databricks / dbrx

Code examples and resources for DBRX, a large language model developed by Databricks
https://www.databricks.com/

generate.py : tiktoken.py throws Encoding import error #3

Closed by cpumaxx 3 months ago

cpumaxx commented 3 months ago

After setting everything up locally, both generate.py from the GitHub repo and the minimal Python script on the Hugging Face page throw the same error. I followed all the steps in this repo, and my brand-new venv was populated with "pip install -r requirements.txt".


Traceback (most recent call last):
  File "/media/models/dbrx/generate.py", line 34, in <module>
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/models/dbrx/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 822, in from_pretrained
    return tokenizer_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/models/dbrx/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2086, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/media/models/dbrx/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2325, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/tiktoken.py", line 105, in __init__
    from tiktoken import Encoding  # type: ignore (thirdParty)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: cannot import name 'Encoding' from 'tiktoken' (/media/models/dbrx/tiktoken.py)

cpumaxx commented 3 months ago

Self-resolved by removing extraneous .py files from the directory that generate.py was in.
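
For anyone hitting the same ImportError: the last traceback line shows tiktoken resolving to /media/models/dbrx/tiktoken.py rather than to a path under site-packages, which is consistent with a local file shadowing the installed library. Below is a minimal sketch for checking this, assuming the script's directory comes first on sys.path (the default when running "python generate.py"). The helper names module_origin and find_shadowing_files are hypothetical and not part of this repo.

```python
import importlib.util
import tempfile
from pathlib import Path


def module_origin(name):
    """Return the file Python would load for `name`, without importing it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def find_shadowing_files(directory, module_names):
    """List files or packages in `directory` that share a name with a module."""
    directory = Path(directory)
    hits = []
    for name in module_names:
        # A module can be shadowed by name.py or by a package name/__init__.py.
        for candidate in (directory / f"{name}.py", directory / name / "__init__.py"):
            if candidate.is_file():
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Demo in a throwaway directory: a stray tiktoken.py is reported as a shadow.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "tiktoken.py").write_text("# stray local file\n")
        print(find_shadowing_files(tmp, ["tiktoken", "transformers"]))
    # Where does this interpreter resolve tiktoken from? (None if not installed)
    print(module_origin("tiktoken"))
```

If module_origin("tiktoken") reports a path next to generate.py instead of one under site-packages, renaming or removing that file should let the real package be imported again.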