Closed Andrei-Aksionov closed 1 month ago
Wow, you already added it 🤯🫶
There is an issue with `hf-transfer` and Python 3.9.19: https://github.com/huggingface/hf_transfer/issues/33
So we need to manually merge the branch.
I see. Can you help with that @carmocca ?
I might be wrong, but it looks like there is a missing pre-built wheel for Python 3.9. So whenever you try to install this package with this Python version, it tries to build a wheel. Since it's written in Rust, it expects the corresponding toolchain. Don't know why there were no issues before, maybe someone from the HF team accidentally deleted the wheel? 🤷
It looks like `hf-transfer==0.1.4` (Nov 6, 2023) can be installed in Python 3.9.19 without problems.
I ran tests in different virtual environments on my local machine (macOS), but then rechecked in a Studio, and it looks like the missing wheel for Python 3.9 is an issue only on macOS, not on Linux. That means the reason for the CI failure is somewhere else.
When I ran the installation in verbose mode, I noticed that uv cannot find a suitable version for `boto3`: it starts with the newest version, checks compatibility, fails, picks an older version, checks compatibility, fails, ...
In a Studio, when I changed the environment to Python 3.9.19 and used the same command (except for the `--system` flag), it ran without issues. Bleeding edge ...
Can I merge this, Andrei? I imagine that these CI issues can be resolved separately.
Yep, go for it. It's only a CI issue. Users should not have any problems.
Something broke with the tokenizer test since the merge: https://github.com/Lightning-AI/litgpt/actions/runs/8654742013/job/23751422665#step:7:1110
Yep, I see. Weird. Now the question is how to debug it. I don't have a Windows machine and don't want to use a virtual machine. I thought that GitHub Codespaces might offer different OSes to choose from, but nope.
I have found the issue: the encoding wasn't specified in the `open` call, which caused the error on a Windows machine. Don't know why none of the CI checks for this PR failed.
@carmocca Maybe the PR with a fix should also add `with open(..., encoding='utf-8')` wherever the encoding is not specified? I'm just not hugely confident with encodings. As I understand it, it should be safe to specify this encoding for all non-binary files.
I agree. It should always be added for Windows, where the default is different
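For reference, a minimal sketch of the pattern discussed above. The helper names (`write_text`, `read_text`, `roundtrip`) are hypothetical and not taken from the litgpt code base; the point is only that passing `encoding="utf-8"` makes text I/O behave the same on every OS.

```python
import tempfile
from pathlib import Path


def write_text(path, text):
    # Without an explicit encoding, open() uses the locale's preferred
    # encoding in text mode, which on Windows is often not UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    # Read back with the same explicit encoding used for writing.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def roundtrip(text):
    # Hypothetical helper: write to a temporary file and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        write_text(path, text)
        return read_text(path)
```

Binary files (opened with `"rb"` / `"wb"`) take no `encoding` argument, so this only applies to text mode.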
Hi there 👋
Google released three variants of the Gemma model for code generation: `2b`, `7b` and `7b-it`.

The `2b` and `7b` variants are useful only for code completion; they require a special prompt and the output is not the best (see examples here). `7b-it` is a much better model: it is more versatile and generates plausible outputs. Thus this PR adds only this model.

If there is a strong intent to also include the `2b` and `7b` variants, I recommend first adding support for a custom prompt (provided as an argument).
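A rough sketch of what "a custom prompt provided as an argument" could look like. This is only an illustration: `build_prompt`, the `{prompt}` placeholder, and the `<PRE>`/`<SUF>` markers are all made up for the example and are not the actual litgpt API or the real Gemma completion format.

```python
def build_prompt(user_input: str, template: str = "{prompt}") -> str:
    # Hypothetical helper: the template must contain a "{prompt}" placeholder.
    if "{prompt}" not in template:
        raise ValueError("template must contain the '{prompt}' placeholder")
    # str.replace is used instead of str.format so that braces inside
    # the user's code snippet are left untouched.
    return template.replace("{prompt}", user_input)
```

With the default template the input is passed through unchanged, so existing models would not be affected.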