Closed mclanza closed 2 months ago
Hi Marie,
We integrated OLMo directly into the transformers library recently, which required changing the checkpoints to use. You will probably want to use `allenai/OLMo-7B-hf` instead, or `allenai/OLMo-1.7-7B-hf` if you would like to use our recently-released improved 7B model. I have updated the README with the new information: https://github.com/allenai/OLMo/pull/589. Sorry for the inconvenience.
Thanks @2015aroras !
I tried it with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf", trust_remote_code=True)
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```

and it hangs after loading checkpoint shards.
I tried it myself and some colleagues tried it too, on different machines, different environments and different networks. Any idea that could help us?
Thanks!
```
python --version && pip freeze
Python 3.9.6
ai2-olmo==0.3.0
antlr4-python3-runtime==4.9.3
boto3==1.34.111
botocore==1.34.111
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
filelock==3.13.4
fsspec==2024.5.0
google-api-core==2.19.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
huggingface-hub==0.21.4
idna==3.7
Jinja2==3.1.4
jmespath==1.0.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
omegaconf==2.3.0
packaging==24.0
pillow==10.3.0
proto-plus==1.23.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.2
rich==13.7.1
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.3
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch==2.2.2
torchaudio==2.2.2
torchvision==0.17.2
tqdm==4.66.4
transformers==4.40.2
typing_extensions==4.11.0
urllib3==1.26.18
```
It hangs here:

```python
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
```
Code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

print('Step 1...')
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
print('Step 2...')
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")
message = ["Language modeling is "]
print('Step 3...')
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
print('Step 4...')
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print('Step 5...')
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```
Result:
The same happens with `allenai/OLMo-7B-hf`, and with/without the argument `trust_remote_code=True`.
The `trust_remote_code=True` argument is not needed for our `-hf` models.
Without looking too deeply, my guess is that this is trying to run on CPUs (since nothing is telling it to run on GPUs). The simplest solution would be to pass `device_map="auto"` to `AutoModelForCausalLM.from_pretrained` (and pip-installing `accelerate` if needed). You may also add a `.to("cuda")` to the end of your inputs to move them to GPUs too. Other than that, you can set `max_new_tokens=1` just to see if things are hanging or the model is just being slow.

Aside: the model might not fit on just 1 GPU.
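Putting those suggestions together, a minimal sketch of the GPU-enabled version might look like this (assuming a CUDA machine with `accelerate` installed; the checkpoint name and sampling arguments are taken from the thread above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-1.7-7B-hf"

# device_map="auto" (requires the `accelerate` package) places the weights on
# available GPUs instead of loading everything onto the CPU.
olmo = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer(["Language modeling is "],
                   return_tensors="pt", return_token_type_ids=False)
# Move the input tensors to the GPU so they match the model's device.
if torch.cuda.is_available():
    inputs = {k: v.to("cuda") for k, v in inputs.items()}

# Start with max_new_tokens=1 to distinguish "hanging" from "just slow",
# then raise it back to 100 once a single token comes back quickly.
response = olmo.generate(**inputs, max_new_tokens=1,
                         do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```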
Exactly! Thanks for your reply! It took much more time (hours) but it finally worked. I'll try your suggestions. Thanks again!
Thanks for the detailed replies. It worked ok now. I'm closing the issue. Thanks again!
🐛 Describe the bug
Following the instructions here https://github.com/allenai/OLMo used to work some days ago. I tried it yesterday and again today, and it throws the following warning/error:

```
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```

and no results are shown as it used to show. This is the code that I'm trying to run.
I also tried with

```python
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)
```

but the same message is shown. I also tried `allenai/olmo-7b` (lowercase) and the same message is shown. What am I missing? What could have changed since the last time it worked?
Thanks!
Versions
```
python --version && pip freeze
Python 3.9.6
ai2-olmo==0.3.0
antlr4-python3-runtime==4.9.3
boto3==1.34.107
botocore==1.34.107
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
filelock==3.13.4
fsspec==2024.5.0
google-api-core==2.19.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
huggingface-hub==0.21.4
idna==3.7
Jinja2==3.1.4
jmespath==1.0.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
omegaconf==2.3.0
packaging==24.0
pillow==10.3.0
proto-plus==1.23.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
regex==2024.5.15
requests==2.31.0
rich==13.7.1
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.3
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch==2.0.0
torchaudio==2.0.1
torchvision==0.15.1
tqdm==4.66.4
transformers==4.40.2
typing_extensions==4.11.0
urllib3==1.26.18
```