Open Founce opened 2 months ago
I got the same problem.
@Founce @sori424 Thank you for raising this issue. I looked into the problem, and it is caused by the recent torch 2.4 release. Please use an earlier version of torch (2.2 or 2.3) and it should work. I have updated the README with the specific version details; please refer to the latest README and try again.
If this resolves the issue, I’ll go ahead and close it. Otherwise, feel free to provide further feedback, and I’ll continue to assist.
@zzn-nzz Hi there,
Thanks for your prompt response. I tried implementing your solution but encountered an additional error.
This is the error message I received:
Traceback (most recent call last):
  File "save_reps.py", line 118, in <module>
    main()
  File "save_reps.py", line 113, in main
    save_condition(args.model_name, args.temperature,
  File "save_reps.py", line 40, in save_condition
    llm = LM_nnsight(model_path=paths[model_name])
  File "/RepBelief/LM_hf.py", line 37, in __init__
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
  File "/mambaforge/envs/llm/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 825, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/mambaforge/envs/llm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
  File "/mambaforge/envs/llm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/mambaforge/envs/llm/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 133, in __init__
    super().__init__(
  File "/mambaforge/envs/llm/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
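The exception above is raised while the fast tokenizer file is being parsed, so it may help to look at what the pre-tokenizer section of that file actually contains. The following is a minimal sketch, assuming the model directory holds a file named tokenizer.json with a top-level "pre_tokenizer" entry; the function name pre_tokenizer_types is hypothetical and not part of transformers or tokenizers. If the file lists a pre-tokenizer layout that the installed tokenizers release cannot parse, that may be the source of the "untagged enum" message.

```python
import json
from pathlib import Path


def pre_tokenizer_types(tokenizer_json_path):
    """Return every "type" name found in the pre_tokenizer section of a tokenizer.json file."""
    config = json.loads(Path(tokenizer_json_path).read_text(encoding="utf-8"))
    found = []

    def walk(node):
        # Nested pre-tokenizers (e.g. inside a Sequence) are dicts with a "type" key.
        if isinstance(node, dict):
            if "type" in node:
                found.append(node["type"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    # A missing or null pre_tokenizer section simply yields an empty list.
    walk(config.get("pre_tokenizer"))
    return found
```

Calling pre_tokenizer_types on the tokenizer.json inside the model directory and posting the result alongside the tokenizers version gives a maintainer something concrete to compare against.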
@Founce I have not run into this error when creating a new environment with the listed commands and running the code in it.
Could you check whether upgrading the 'tokenizers' library resolves the problem? (https://github.com/huggingface/transformers/issues/31789)
If not, could you provide more details, e.g., the versions of your installed libraries? Thanks.
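For reporting versions without pasting a full environment dump, a short script can print only the packages involved in this thread. This is a sketch using the standard-library importlib.metadata module (available from Python 3.8); the helper name collect_versions and the chosen package list are assumptions for illustration.

```python
from importlib import metadata


def collect_versions(package_names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages mentioned in this thread; adjust the list as needed.
    report = collect_versions(["torch", "transformers", "tokenizers", "nnsight"])
    for name, version in report.items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```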
@zzn-nzz Thanks for your prompt reply. Following your suggestion, I upgraded the "tokenizers" library from 0.15.2 to 0.20.0, but this produced a version conflict between "transformers" and "tokenizers": ImportError: tokenizers>=0.14,<0.19 is required for a normal functioning of this module, but found tokenizers==0.20.0.
The library versions I am currently using are as follows:
Package Version
------------------- -----------
accelerate 0.34.2
annotated-types 0.7.0
asttokens 2.4.1
backcall 0.2.0
bidict 0.23.1
Brotli 1.0.9
certifi 2024.8.30
charset-normalizer 3.3.2
contourpy 1.1.1
cycler 0.12.1
decorator 5.1.1
diffusers 0.30.3
einops 0.8.0
executing 2.1.0
filelock 3.13.1
fonttools 4.54.1
fsspec 2024.9.0
gmpy2 2.1.2
h11 0.14.0
huggingface-hub 0.25.1
idna 3.7
importlib_metadata 8.5.0
importlib_resources 6.4.5
ipdb 0.13.13
ipython 8.12.3
jedi 0.19.1
Jinja2 3.1.4
kiwisolver 1.4.7
MarkupSafe 2.1.3
matplotlib 3.7.5
matplotlib-inline 0.1.7
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
mpmath 1.3.0
networkx 3.1
nnsight 0.1.18
numpy 1.24.3
packaging 24.1
pandas 2.0.3
parso 0.8.4
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.4.0
pip 24.2
prompt_toolkit 3.0.48
protobuf 5.28.2
psutil 6.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
pydantic 2.9.2
pydantic_core 2.23.4
Pygments 2.18.0
pyparsing 3.1.4
PySocks 1.7.1
python-dateutil 2.9.0.post0
python-engineio 4.9.1
python-socketio 5.11.4
pytz 2024.2
PyYAML 6.0.1
regex 2024.9.11
requests 2.32.3
safetensors 0.4.5
scipy 1.10.1
seaborn 0.13.2
sentencepiece 0.2.0
setuptools 75.1.0
simple-websocket 1.0.0
six 1.16.0
stack-data 0.6.3
sympy 1.13.2
tokenizers 0.15.2
tomli 2.0.1
torch 2.3.1
torchaudio 2.3.1
torchvision 0.18.1
tqdm 4.66.5
traitlets 5.14.3
transformers 4.38.1
triton 2.3.1
typing_extensions 4.11.0
tzdata 2024.2
urllib3 2.2.2
wcwidth 0.2.13
websocket-client 1.8.0
wheel 0.44.0
wsproto 1.2.0
zipp 3.20.2
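The ImportError quoted above states the range that transformers 4.38.1 accepts: tokenizers>=0.14,<0.19. Before upgrading, a candidate version can be checked against such a range. The sketch below is a simplified comparison that only looks at the first three numeric components and is not a full implementation of Python's version-specifier rules; the function name satisfies is hypothetical.

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}


def _as_tuple(version, width=3):
    """Turn '0.15.2' or '2.3.1+cu121' into a fixed-width tuple of ints."""
    numbers = [int(part) for part in re.findall(r"\d+", version.split("+")[0])][:width]
    return tuple(numbers + [0] * (width - len(numbers)))


def satisfies(installed, requirement):
    """Check a version against comma-separated clauses such as '>=0.14,<0.19'."""
    for clause in requirement.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([\w.+]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](_as_tuple(installed), _as_tuple(bound)):
            return False
    return True


print(satisfies("0.20.0", ">=0.14,<0.19"))  # the upgraded version from this thread
print(satisfies("0.15.2", ">=0.14,<0.19"))  # the originally installed version
```

Under this check 0.20.0 falls outside the range while 0.15.2 is inside it, which matches the ImportError: upgrading tokenizers alone cannot work unless transformers is upgraded as well.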
@Founce My tokenizers version is 0.15.2. Could you try installing this specific version?
@zzn-nzz Thank you for your response. Actually, the error I reported above occurred with version 0.15.2. I also tried upgrading it, but that did not work. Could you provide a full list of library versions known to be compatible?
The traceback is identical to the one in my earlier comment and ends with:
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
@Founce Here is the list.
Package Version
--------------------------------- ------------
accelerate 0.34.2
aiobotocore 2.12.3
aiohttp 3.9.5
aioitertools 0.7.1
aiosignal 1.2.0
alabaster 0.7.12
altair 5.0.1
annotated-types 0.7.0
anyio 4.2.0
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astroid 2.14.2
astropy 5.1
asttokens 2.0.5
async-lru 2.0.4
async-timeout 4.0.3
atomicwrites 1.4.0
attrs 23.1.0
Automat 20.2.0
autopep8 2.0.4
Babel 2.11.0
backcall 0.2.0
bcrypt 3.2.0
beautifulsoup4 4.12.3
bidict 0.23.1
binaryornot 0.4.4
black 24.4.2
bleach 4.1.0
blinker 1.6.2
bokeh 2.4.3
botocore 1.34.69
Bottleneck 1.3.7
Brotli 1.0.9
cachetools 5.3.3
certifi 2024.6.2
cffi 1.16.0
chardet 4.0.0
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 2.2.1
colorama 0.4.6
colorcet 3.1.0
comm 0.2.1
constantly 23.10.4
contourpy 1.0.5
cookiecutter 2.6.0
cryptography 42.0.5
cssselect 1.2.0
cycler 0.11.0
cytoolz 0.12.2
dask 2023.4.1
datashader 0.15.1
datashape 0.5.4
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
diff-match-patch 20200713
diffusers 0.30.3
dill 0.3.8
distributed 2023.4.1
docstring-to-markdown 0.11
docutils 0.18.1
einops 0.8.0
entrypoints 0.4
et-xmlfile 1.1.0
exceptiongroup 1.2.0
executing 0.8.3
fastjsonschema 2.16.2
filelock 3.13.1
flake8 7.0.0
Flask 3.0.3
fonttools 4.51.0
frozenlist 1.4.0
fsspec 2024.3.1
gensim 4.3.2
gitdb 4.0.7
GitPython 3.1.37
gmpy2 2.1.2
greenlet 3.0.1
h11 0.14.0
h5py 3.11.0
HeapDict 1.0.1
holoviews 1.17.1
huggingface-hub 0.25.1
hvplot 0.10.0
hyperlink 21.0.0
idna 3.7
imagecodecs 2023.1.23
imageio 2.33.1
imagesize 1.4.1
imbalanced-learn 0.12.3
importlib-metadata 7.0.1
importlib-resources 6.1.1
incremental 22.10.0
inflection 0.5.1
iniconfig 1.1.1
intake 0.7.0
intervaltree 3.1.0
ipdb 0.13.13
ipykernel 6.28.0
ipython 8.12.2
ipython-genutils 0.2.0
ipywidgets 7.8.1
isort 5.13.2
itemadapter 0.3.0
itemloaders 1.1.0
itsdangerous 2.2.0
jaraco.classes 3.2.1
jedi 0.18.1
jeepney 0.7.1
jellyfish 1.0.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
json5 0.9.6
jsonschema 4.19.2
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.6.0
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.0
jupyter_server 2.14.1
jupyter_server_terminals 0.4.4
jupyterlab 4.0.11
jupyterlab-pygments 0.1.2
jupyterlab_server 2.25.1
jupyterlab-widgets 1.0.0
keyring 24.3.1
kiwisolver 1.4.4
lazy_loader 0.4
lazy-object-proxy 1.10.0
lckr_jupyterlab_variableinspector 3.1.0
llvmlite 0.41.0
lmdb 1.4.1
locket 1.0.0
lxml 5.2.1
Markdown 3.4.1
markdown-it-py 2.2.0
MarkupSafe 2.1.3
matplotlib 3.7.2
matplotlib-inline 0.1.6
mccabe 0.7.0
mdurl 0.1.0
mistune 2.0.4
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
more-itertools 10.1.0
mpmath 1.3.0
msgpack 1.0.3
multidict 6.0.4
multipledispatch 0.6.0
mypy 1.10.0
mypy-extensions 1.0.0
nbclient 0.8.0
nbconvert 7.10.0
nbformat 5.9.2
nest-asyncio 1.6.0
networkx 3.1
nltk 3.8.1
nnsight 0.1.18
notebook 7.0.8
notebook_shim 0.2.3
numba 0.58.1
numexpr 2.8.4
numpy 1.24.3
numpydoc 1.5.0
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.1.105
nvidia-nvtx-cu12 12.1.105
openpyxl 3.1.2
overrides 7.4.0
packaging 23.2
pandas 2.0.3
pandocfilters 1.5.0
panel 0.14.3
param 1.13.0
parsel 1.8.1
parso 0.8.3
partd 1.4.1
pathspec 0.10.3
patsy 0.5.6
pexpect 4.8.0
pickleshare 0.7.5
pillow 10.3.0
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
plotly 5.22.0
pluggy 1.0.0
ply 3.11
prometheus-client 0.14.1
prompt-toolkit 3.0.43
Protego 0.1.16
protobuf 3.20.3
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 14.0.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.11.1
pycparser 2.21
pyct 0.5.0
pycurl 7.45.2
pydantic 2.9.2
pydantic_core 2.23.4
pydeck 0.8.0
PyDispatcher 2.0.5
pydocstyle 6.3.0
pyerfa 2.0.0
pyflakes 3.2.0
Pygments 2.15.1
pylint 2.16.2
pylint-venv 3.0.3
pyls-spyder 0.4.0
pyodbc 5.0.1
pyOpenSSL 24.0.0
pyparsing 3.0.9
PyQt5 5.15.10
PyQt5-sip 12.13.0
PyQtWebEngine 5.15.6
PySocks 1.7.1
pytest 7.4.4
python-dateutil 2.9.0.post0
python-engineio 4.9.1
python-json-logger 2.0.7
python-lsp-black 2.0.0
python-lsp-jsonrpc 1.1.2
python-lsp-server 1.10.0
python-slugify 5.0.2
python-snappy 0.6.1
python-socketio 5.11.4
pytoolconfig 1.2.6
pytz 2024.1
pyviz_comms 3.0.2
PyWavelets 1.4.1
pyxdg 0.27
PyYAML 6.0.1
pyzmq 25.1.2
QDarkStyle 3.2.3
qstylizer 0.2.2
QtAwesome 1.2.2
qtconsole 5.5.1
QtPy 2.4.1
queuelib 1.6.2
referencing 0.30.2
regex 2023.10.3
requests 2.32.2
requests-file 1.5.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.3.5
rope 1.12.0
rpds-py 0.10.6
Rtree 1.0.1
s3fs 2024.3.1
safetensors 0.4.5
scikit-image 0.20.0
scikit-learn 1.3.0
scipy 1.9.1
Scrapy 2.11.1
seaborn 0.13.2
SecretStorage 3.3.1
Send2Trash 1.8.2
sentencepiece 0.2.0
service-identity 18.1.0
setuptools 69.5.1
simple-websocket 1.0.0
sip 6.7.12
six 1.16.0
smart-open 5.2.1
smmap 4.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
sortedcontainers 2.4.0
soupsieve 2.5
Sphinx 5.0.2
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
spyder 5.5.1
spyder-kernels 2.5.0
SQLAlchemy 2.0.30
stack-data 0.2.0
statsmodels 0.14.0
streamlit 1.32.0
sympy 1.12
tables 3.8.0
tabulate 0.9.0
tblib 1.7.0
tenacity 8.2.2
terminado 0.17.1
text-unidecode 1.3
textdistance 4.2.1
threadpoolctl 2.2.0
three-merge 0.1.1
tifffile 2023.4.12
tinycss2 1.2.1
tldextract 3.2.0
tokenizers 0.15.2
toml 0.10.2
tomli 2.0.1
tomlkit 0.11.1
toolz 0.12.0
torch 2.3.1+cu121
torchaudio 2.3.1+cu121
torchvision 0.18.1+cu121
tornado 6.4.1
tqdm 4.66.4
traitlets 5.14.3
transformers 4.38.1
triton 2.3.1
Twisted 23.10.0
typing_extensions 4.11.0
tzdata 2023.3
ujson 5.10.0
unicodedata2 15.1.0
Unidecode 1.2.0
urllib3 1.26.19
w3lib 2.1.2
watchdog 4.0.1
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.3
whatthepatch 1.0.2
wheel 0.43.0
widgetsnbextension 3.6.6
wrapt 1.14.1
wsproto 1.2.0
wurlitzer 3.0.2
xarray 2022.11.0
yapf 0.40.2
yarl 1.9.3
zict 3.0.0
zipp 3.17.0
zope.interface 5.4.0
By the way, which model are you using?
@zzn-nzz Thanks for your reply. I will try the version list you provided. The model name is Mistral-7B-Instruct-v0.2.
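With two long version lists in the thread, comparing them by eye is error-prone. The sketch below parses two blocks of "pip list" style text (a package name and a version per line) and reports packages present in both with different versions. The function names are hypothetical, and name normalization (lowercase, with runs of "-", "_" and "." treated as equivalent) is an assumption made so that entries such as importlib_metadata and importlib-metadata line up.

```python
import re


def parse_pip_list(text):
    """Parse 'name version' lines into a dict, skipping the header and separator rows."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        name = re.sub(r"[-_.]+", "-", parts[0]).lower()
        versions[name] = parts[1]
    return versions


def diff_versions(mine, theirs):
    """Return {name: (my_version, their_version)} for shared packages that differ."""
    shared = sorted(set(mine) & set(theirs))
    return {name: (mine[name], theirs[name]) for name in shared if mine[name] != theirs[name]}


if __name__ == "__main__":
    # A few rows taken from the two lists posted in this thread.
    mine = parse_pip_list("Package Version\n------- -------\nasttokens 2.4.1\ntokenizers 0.15.2\n")
    theirs = parse_pip_list("asttokens 2.0.5\ntokenizers 0.15.2\n")
    print(diff_versions(mine, theirs))
```

Note that a plain string comparison also flags pairs such as torch 2.3.1 and 2.3.1+cu121, which differ only in the local build suffix.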
Hi,
I am encountering an issue while trying to run your code. I have followed the setup instructions and installed the necessary packages, but I am receiving a RuntimeError related to duplicate registrations.
Here are the steps I took to set up my environment:
The error is
My PyTorch version is 2.4.1 and my CUDA version is 12.2.
According to the error message, there appears to be a conflict between nnsight and torch. Have you encountered a similar problem, or could you provide a complete list of requirements?