Snowflake-Labs / snowflake-arctic

Apache License 2.0

ImportError: cannot import name 'LlamaTokenizer' from 'transformers.models.llama' #17

Closed — AllanOricil closed this 3 months ago

AllanOricil commented 4 months ago

I tried the minimal example from https://huggingface.co/Snowflake/snowflake-arctic-instruct and it did not work. Can you help me fix it?

[screenshot of the ImportError]

I'm using the latest transformers release commit.

[screenshot]

snowflake-arctic-instruct.py

import os
# enable hf_transfer for faster ckpt download
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from deepspeed.linear.config import QuantizationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True
)
quant_config = QuantizationConfig(q_bits=8)

model = AutoModelForCausalLM.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="auto",
    ds_quantization_config=quant_config,
    max_memory={i: "150GiB" for i in range(8)},
    torch_dtype=torch.bfloat16)

content = "5x + 35 = 7x - 60 + 10. Solve for x"
messages = [{"role": "user", "content": content}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))

requirements.txt

annotated-types==0.6.0
certifi==2024.2.2
charset-normalizer==3.3.2
deepspeed==0.14.2
filelock==3.13.4
fsspec==2024.3.1
hf_transfer==0.1.6
hjson==3.1.0
huggingface-hub==0.22.2
idna==3.7
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
packaging==24.0
psutil==5.9.8
py-cpuinfo==9.0.0
pydantic==2.7.1
pydantic_core==2.18.2
pynvml==11.5.0
PyYAML==6.0.1
regex==2024.4.28
requests==2.31.0
safetensors==0.4.3
sympy==1.12
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.2
transformers @ git+https://github.com/huggingface/transformers@9fe3f585bb4ea29f209dc705d269fbe292e1128f
typing_extensions==4.11.0
urllib3==2.2.1
karthik-nexusflow commented 4 months ago

The error can be fixed by installing the correct Snowflake transformers fork:

pip install git+https://github.com/Snowflake-Labs/transformers.git@arctic

and then also installing the following, since Llama's conditional import causes this error:

pip install sentencepiece
pip install tokenizers
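The "conditional" mentioned above is transformers' optional-dependency guard: the LlamaTokenizer symbol is only exported when sentencepiece can be imported, so a missing sentencepiece surfaces as "cannot import name 'LlamaTokenizer'" rather than a clear "please install sentencepiece" message. A minimal sketch of that availability-check pattern (the helper name here is illustrative, not transformers' actual API):

```python
import importlib.util

def is_package_available(name: str) -> bool:
    """Return True if a package can be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None

# transformers guards optional symbols this way: when sentencepiece is missing,
# LlamaTokenizer is never defined in transformers.models.llama, so user code
# sees an ImportError on the tokenizer instead of a missing-dependency hint.
if is_package_available("sentencepiece"):
    print("sentencepiece installed; LlamaTokenizer should import")
else:
    print("sentencepiece missing; expect the ImportError above")
```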
AllanOricil commented 4 months ago

@karthik-nexusflow it would be better to add this information to the model card, because beginners like me may try to install the official Hugging Face transformers package instead of the fork, which leads to this issue.

[screenshot]
jeffra commented 4 months ago

@AllanOricil, with trust_remote_code=True this should be working with public/official transformers>=4.39.0. I am testing this now in a fresh/clean environment with the version you list git+https://github.com/huggingface/transformers@9fe3f585bb4ea29f209dc705d269fbe292e1128f and I'm not able to reproduce this error for some reason :(

When did you download the weights? If you are running in offline mode and downloaded them more than 5 days ago, then trust_remote_code=True won't pick up the fix and might produce this issue. This commit is what should remove the need to install the transformers fork: https://huggingface.co/Snowflake/snowflake-arctic-instruct/commit/f4ca7904b66a80b6f62d6272253ea1e32375ddd6

The core issue confuses me, though: it says you can't import LlamaTokenizer, but that has been available in transformers for a while now, well before Arctic was introduced.

AllanOricil commented 4 months ago

@jeffra I don't even know where to use that trust variable. Is that when I run python3 script.py? I really have no experience with Python, so pardon the noob questions 😅

I just copied the simple example, created a virtual env, installed transformers 4.39.0 and deepspeed 0.14.2, then tried to run the script with Python 3, and it did not work. I got the same error that led me to open this issue.

Then I decided to go to hugging face transformers repo to get the latest release of their package, updated my virtual env with it, tried to run the code again, and again the same issue happened. Then I opened this issue here.

To get the list of dependencies I ran pip freeze.
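For reference, this is pip's freeze subcommand, which pins every installed package in the current environment at its exact version (standard pip usage, nothing project-specific):

```shell
# Write the current environment's pinned package list to requirements.txt
python -m pip freeze > requirements.txt
```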

I also have not downloaded any weights. Isn't that supposed to happen automatically when I run that example code?
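(It is: from_pretrained downloads the checkpoint on first use and caches it locally. Assuming default settings, the cache lives under ~/.cache/huggingface/hub, overridable via the HF_HOME or HF_HUB_CACHE environment variables; a small sketch of how that path is resolved:)

```python
import os

# Default Hugging Face hub cache location; HF_HUB_CACHE overrides it directly,
# HF_HOME moves the whole ~/.cache/huggingface tree.
cache_dir = os.environ.get("HF_HUB_CACHE") or os.path.join(
    os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface")), "hub"
)
print(cache_dir)  # downloaded model snapshots end up under this directory
```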

AllanOricil commented 4 months ago

Another question: can I run this on an M2 Max with 32 GB of RAM in AWS? That was my plan 😀

sfc-gh-jrasley commented 3 months ago

We had another user run into this same issue wrt LlamaTokenizer. It turns out this tokenizer depends on sentencepiece. I've updated the inference requirements.txt in #25 to address this going forward.

@AllanOricil wrt the M2 Max, it's not on our exact roadmap, but I believe support was recently added in llama.cpp! :) https://github.com/ggerganov/llama.cpp/pull/7020

Closing this for now, as I think the main issue is resolved.

AllanOricil commented 3 months ago

I will give it another chance