UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Error downloading multi-qa-MiniLM-L6-cos-v1 #2421

Closed moore269 closed 10 months ago

moore269 commented 10 months ago

Code causing the issue:

    model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

Error:

    model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
      File "C:\Users\jo\AppData\Local\anaconda3\envs\default_env\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 87, in __init__
        snapshot_download(model_name_or_path,
      File "C:\Users\jo\AppData\Local\anaconda3\envs\default_env\lib\site-packages\sentence_transformers\util.py", line 442, in snapshot_download
        model_info = _api.model_info(repo_id=repo_id, revision=revision, token=token)
      File "C:\Users\jo\AppData\Local\anaconda3\envs\default_env\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
        return fn(*args, **kwargs)
      File "C:\Users\jo\AppData\Local\anaconda3\envs\default_env\lib\site-packages\huggingface_hub\hf_api.py", line 1680, in model_info
        return ModelInfo(**d)
    TypeError: huggingface_hub.hf_api.ModelInfo() argument after ** must be a mapping, not str

    'EntityName': 'Engine.Initialize', 'DetailedError': 'TypeError: huggingface_hub.hf_api.ModelInfo() argument after ** must be a mapping, not str'

OS: Windows 11

Python version: 3.10.11

pip list of packages absl-py 1.4.0 adal 1.2.7 aiohttp 3.8.5 aiosignal 1.3.1 alembic 1.13.0 antlr4-python3-runtime 4.9.3 anyio 4.2.0 applicationinsights 0.11.10 argcomplete 3.1.1 astroid 2.11.7 async-timeout 4.0.3 attrs 23.1.0 azure-ai-ml 1.12.1 azure-appconfiguration 1.1.1 azure-batch 13.0.0 azure-cli 2.50.0 azure-cli-core 2.50.0 azure-cli-telemetry 1.0.8 azure-common 1.1.28 azure-core 1.28.0 azure-cosmos 3.2.0 azure-data-tables 12.4.0 azure-datalake-store 0.0.53 azure-graphrbac 0.60.0 azure-identity 1.13.0 azure-keyvault 1.1.0 azure-keyvault-administration 4.3.0 azure-keyvault-certificates 4.7.0 azure-keyvault-keys 4.8.0b2 azure-keyvault-secrets 4.7.0 azure-loganalytics 0.1.1 azure-mgmt-advisor 9.0.0 azure-mgmt-apimanagement 4.0.0 azure-mgmt-appconfiguration 3.0.0 azure-mgmt-appcontainers 2.0.0 azure-mgmt-applicationinsights 1.0.0 azure-mgmt-authorization 3.0.0 azure-mgmt-batch 17.0.0 azure-mgmt-batchai 7.0.0b1 azure-mgmt-billing 6.0.0 azure-mgmt-botservice 2.0.0 azure-mgmt-cdn 12.0.0 azure-mgmt-cognitiveservices 13.3.0 azure-mgmt-compute 29.1.0 azure-mgmt-consumption 2.0.0 azure-mgmt-containerinstance 10.1.0 azure-mgmt-containerregistry 10.1.0 azure-mgmt-containerservice 24.0.0 azure-mgmt-core 1.4.0 azure-mgmt-cosmosdb 9.2.0 azure-mgmt-databoxedge 1.0.0 azure-mgmt-datalake-analytics 0.2.1 azure-mgmt-datalake-nspkg 3.0.1 azure-mgmt-datalake-store 0.5.0 azure-mgmt-datamigration 10.0.0 azure-mgmt-devtestlabs 4.0.0 azure-mgmt-dns 8.0.0 azure-mgmt-eventgrid 10.2.0b2 azure-mgmt-eventhub 10.1.0 azure-mgmt-extendedlocation 1.0.0b2 azure-mgmt-hdinsight 9.0.0 azure-mgmt-imagebuilder 1.2.0 azure-mgmt-iotcentral 10.0.0b2 azure-mgmt-iothub 2.3.0 azure-mgmt-iothubprovisioningservices 1.1.0 azure-mgmt-keyvault 10.2.2 azure-mgmt-kusto 0.3.0 azure-mgmt-loganalytics 13.0.0b4 azure-mgmt-managedservices 1.0.0 azure-mgmt-managementgroups 1.0.0 azure-mgmt-maps 2.0.0 azure-mgmt-marketplaceordering 1.1.0 azure-mgmt-media 9.0.0 azure-mgmt-monitor 5.0.1 azure-mgmt-msi 7.0.0 
azure-mgmt-netapp 10.0.0 azure-mgmt-network 25.1.0 azure-mgmt-nspkg 3.0.2 azure-mgmt-policyinsights 1.1.0b4 azure-mgmt-privatedns 1.0.0 azure-mgmt-rdbms 10.2.0b10 azure-mgmt-recoveryservices 2.4.0 azure-mgmt-recoveryservicesbackup 6.0.0 azure-mgmt-redhatopenshift 1.2.0 azure-mgmt-redis 14.1.0 azure-mgmt-relay 0.1.0 azure-mgmt-resource 23.1.0b2 azure-mgmt-search 9.0.0 azure-mgmt-security 3.0.0 azure-mgmt-servicebus 8.2.0 azure-mgmt-servicefabric 1.0.0 azure-mgmt-servicefabricmanagedclusters 1.0.0 azure-mgmt-servicelinker 1.2.0b1 azure-mgmt-signalr 1.1.0 azure-mgmt-sql 4.0.0b10 azure-mgmt-sqlvirtualmachine 1.0.0b5 azure-mgmt-storage 21.0.0 azure-mgmt-synapse 2.1.0b5 azure-mgmt-trafficmanager 1.0.0 azure-mgmt-web 7.0.0 azure-multiapi-storage 1.2.0 azure-nspkg 3.0.2 azure-storage-blob 12.19.0 azure-storage-common 1.4.2 azure-storage-file-datalake 12.14.0 azure-storage-file-share 12.15.0 azure-synapse-accesscontrol 0.5.0 azure-synapse-artifacts 0.15.0 azure-synapse-managedprivateendpoints 0.4.0 azure-synapse-spark 0.2.0 azureml-core 1.54.0.post1 azureml-dataprep 4.11.7 azureml-dataprep-native 38.0.0 azureml-dataprep-rslex 2.18.6 azureml-dataset-runtime 1.52.0 backports.tempfile 1.0 backports.weakref 1.0.post1 banal 1.0.6 bcrypt 4.0.1 blinker 1.7.0 cachetools 5.3.2 cattrs 23.2.3 certifi 2023.5.7 cffi 1.15.1 chardet 3.0.4 charset-normalizer 3.2.0 click 8.1.6 cloudpickle 2.2.1 colorama 0.4.6 contextlib2 21.6.0 cryptography 41.0.7 dataset 1.6.2 datasets 2.14.4 defusedxml 0.7.1 Deprecated 1.2.14 dill 0.3.7 distro 1.8.0 docker 6.1.3 dotnetcore2 3.1.23 editdistance 0.6.2 et-xmlfile 1.1.0 evaluate 0.4.0 exceptiongroup 1.2.0 fabric 2.7.1 faiss-cpu 1.7.4 filelock 3.12.2 filetype 1.2.0 Flask 2.3.3 frozenlist 1.4.0 fsspec 2023.6.0 gitdb 4.0.11 GitPython 3.1.40 google-api-core 2.14.0 google-auth 2.24.0 google-search-results 2.4.1 googleapis-common-protos 1.61.0 greenlet 3.0.1 h11 0.14.0 httpcore 1.0.2 httpx 0.26.0 huggingface-hub 0.16.4 humanfriendly 10.0 idna 3.4 importlib-metadata 
6.9.0 invoke 1.7.3 isodate 0.6.1 isort 5.10.1 itsdangerous 2.1.2 jaraco.classes 3.3.0 javaproperties 0.5.2 jeepney 0.8.0 Jinja2 3.1.2 jmespath 1.0.1 joblib 1.3.1 jsondiff 2.0.0 jsonlines 4.0.0 jsonpickle 3.0.2 jsonschema 4.19.0 jsonschema-specifications 2023.7.1 keyring 24.3.0 knack 0.10.1 lazy-object-proxy 1.6.0 Mako 1.3.0 MarkupSafe 2.1.3 marshmallow 3.20.1 mccabe 0.7.0 more-itertools 10.1.0 mpmath 1.3.0 msal 1.22.0 msal-extensions 1.0.0 msrest 0.7.1 msrestazure 0.6.4 multidict 6.0.4 multiprocess 0.70.15 ndg-httpsclient 0.5.1 networkx 3.1 nltk 3.8.1 numpy 1.23.5 oauthlib 3.2.2 openai 1.8.0 opencensus 0.11.3 opencensus-context 0.1.3 opencensus-ext-azure 1.1.12 openpyxl 3.1.2 packaging 23.1 pandas 1.5.3 paramiko 3.2.0 pathlib2 2.3.7.post1 pathspec 0.12.1 Pillow 10.1.0 pip 23.1.2 pkginfo 1.9.6 platformdirs 2.5.2 portalocker 2.7.0 protobuf 4.25.1 psutil 5.9.5 pyarrow 14.0.2 pyasn1 0.5.1 pyasn1-modules 0.3.0 pycodestyle 2.9.1 pycparser 2.21 pydantic 1.10.13 pydash 6.0.2 pydocstyle 6.1.1 pyflakes 2.5.0 PyGithub 1.59.0 Pygments 2.15.1 PyJWT 2.7.0 pylama 8.4.1 pylint 2.14.5 pymsalruntime 0.13.9 PyNaCl 1.5.0 pyOpenSSL 23.2.0 pyreadline3 3.4.1 PySocks 1.7.1 python-dateutil 2.8.2 python-dotenv 1.0.0 pytz 2023.3 pywin32 306 pywin32-ctypes 0.2.2 PyYAML 6.0.1 redis 5.0.1 referencing 0.30.2 regex 2023.6.3 requests 2.31.0 requests-cache 1.1.0 requests-oauthlib 1.3.1 responses 0.18.0 rouge-score 0.1.2 rpds-py 0.9.2 rsa 4.9 ruamel.yaml 0.17.40 ruamel.yaml.clib 0.2.8 safetensors 0.3.1 scikit-learn 1.3.0 scipy 1.11.1 scp 0.13.6 SecretStorage 3.3.3 semver 2.13.0 sentence-transformers 2.2.2 sentencepiece 0.1.99 setuptools 67.8.0 six 1.16.0 smmap 5.0.1 sniffio 1.3.0 snowballstemmer 2.2.0 SQLAlchemy 1.4.50 sshtunnel 0.1.5 strictyaml 1.7.3 sympy 1.12 tabulate 0.9.0 threadpoolctl 3.2.0 tiktoken 0.5.1 tokenizers 0.13.3 tomli 2.0.1 tomlkit 0.11.1 torch 2.0.1 torchvision 0.15.2 tqdm 4.65.0 transformers 4.31.0 typeguard 2.13.3 typing_extensions 4.8.0 tzdata 2023.3 url-normalize 1.4.3 urllib3 
2.0.3 waitress 2.1.2 websocket-client 1.3.3 Werkzeug 3.0.1 wheel 0.38.4 wrapt 1.14.1 xmltodict 0.13.0 xxhash 3.3.0 yarl 1.9.2 zipp 3.17.0 zss 1.2.0

tomaarsen commented 10 months ago

Hello!

The crash originates in model_info from huggingface_hub, in these lines: https://github.com/huggingface/huggingface_hub/blob/926f6d8c7fe18d0278ad44b6580762c25fdc5795/src/huggingface_hub/hf_api.py#L2090-L2091

Surprisingly, getting the json() for the request results in a string rather than a dictionary for you. Does this crash happen every single time, and does it also occur for other Sentence Transformer models?

cc @Wauplin you may have seen this before?
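
For anyone debugging something similar, here is a minimal sketch of how a decoded JSON body can end up as a str instead of a dict. It uses only the standard library; describe_payload is a hypothetical helper name, and the two sample bodies are made up for illustration, not taken from the actual Hub response.

```python
import json
from collections.abc import Mapping


def describe_payload(payload):
    """Say whether a decoded JSON payload could be unpacked with ** (hypothetical helper)."""
    if isinstance(payload, Mapping):
        return "mapping"
    return type(payload).__name__


# A JSON object decodes to a dict, which ** unpacking accepts.
ok = json.loads('{"modelId": "example/model"}')

# A JSON string literal is also valid JSON, but it decodes to a plain str.
bad = json.loads('"example/model"')

print(describe_payload(ok))
print(describe_payload(bad))
```

Passing the result of response.json() from the failing call to a check like this would show what type actually comes back in the affected environment.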

Wauplin commented 10 months ago

Really weird, I haven't seen that before, no :confused: I'd suggest updating huggingface_hub to a newer version (0.20.2 instead of 0.16.4) with pip install -U huggingface_hub, but I honestly don't know why the server response is not parsed correctly (it would have to be a requests issue, but that seems very unlikely).

tomaarsen commented 10 months ago

I looked for reports of requests returning a string from json(), but I couldn't really find any. You could also try upgrading requests, but I was using the same version as you without issues.

moore269 commented 10 months ago

Thanks for looking into it. I tried multiple Sentence Transformer models, all with the same error. I also just tried updating huggingface_hub and requests, but still got the same error. Let me start with a fresh env and see if that works (maybe one of the other packages installed in my current env is interfering).

moore269 commented 10 months ago

@tomaarsen hmm, it works on a fresh env where I just pip install sentence-transformers. So, something must be going on with my current env. I do need the other packages in that other env for the project I'm working on, so I will dig further.

OS: Windows 11

Python version: 3.10.11

pip list of packages attrs 23.1.0 azureml-dataprep 4.11.7 azureml-dataset-runtime 1.52.0 certifi 2023.11.17 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 2.2.1 colorama 0.4.6 distro 1.8.0 dotnetcore2 3.1.23 filelock 3.13.1 fsspec 2023.12.2 huggingface-hub 0.20.2 idna 3.6 Jinja2 3.1.3 joblib 1.3.2 jsonschema 4.19.0 jsonschema-specifications 2023.7.1 MarkupSafe 2.1.3 mpmath 1.3.0 networkx 3.2.1 nltk 3.8.1 numpy 1.26.3 packaging 23.2 pillow 10.2.0 pip 23.3.1 PyYAML 6.0.1 referencing 0.30.2 regex 2023.12.25 requests 2.31.0 safetensors 0.4.1 scikit-learn 1.4.0 scipy 1.11.4 sentence-transformers 2.2.2 sentencepiece 0.1.99 setuptools 68.2.2 sympy 1.12 threadpoolctl 3.2.0 tokenizers 0.15.0 torch 2.1.2 torchvision 0.16.2 tqdm 4.66.1 transformers 4.36.2 typing_extensions 4.9.0 urllib3 2.1.0 wheel 0.41.2

moore269 commented 10 months ago

Another update here. I realized this issue only shows up when running Python unit tests, specifically when running the unit test from Visual Studio Code. The error doesn't occur when running the same code normally from a Python file, in either of the environments I tested. I'm really not sure why it only occurs in this scenario, but I decided to handle it in my unit test.

tomaarsen commented 10 months ago

That's very interesting. Could it be that some unittest monkeypatches some behaviour from requests to prevent sending real requests?
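
To illustrate what I mean, here is a rough sketch, not a confirmed diagnosis. build_info is a hypothetical stand-in for the pattern in the traceback (unpacking the decoded JSON as keyword arguments), and the fake response is built with unittest.mock. If a patched json() hands back a str, the same kind of TypeError appears:

```python
from unittest import mock


def build_info(response):
    """Hypothetical stand-in: unpack the decoded JSON body as keyword arguments."""
    d = response.json()
    return dict(**d)


# Simulate a test double whose json() returns a str instead of a dict.
fake_response = mock.Mock()
fake_response.json.return_value = "mocked body"

try:
    build_info(fake_response)
    error = None
except TypeError as exc:
    # Unpacking a str with ** is rejected because it is not a mapping.
    error = str(exc)

print(error)
```

If something like this is in play, checking inside the failing test whether the requests call is a mock object, or whether a request-intercepting fixture is active, would be a reasonable next step.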

Wauplin commented 10 months ago

Or that the unit tests are running in a different Python environment? Depending on your setup, pytest ... and python -m pytest might not use the same env.
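
One way to check this is to print the interpreter path and the relevant package versions from both a normal script and from inside a test, then compare. A small sketch using only the standard library; environment_report is a hypothetical helper name and the package list is just the three that seem relevant here:

```python
import sys
from importlib import metadata


def environment_report(packages=("huggingface-hub", "requests", "sentence-transformers")):
    """Collect the interpreter path and installed versions of a few packages."""
    report = {"executable": sys.executable, "python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the environment that is running this code.
            report[name] = None
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the executable or the versions differ between the two runs, the test runner is using another environment than the one you expect.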

tomaarsen commented 10 months ago

I'll close this for now, as it seems related to @moore269's testing environment rather than SentenceTransformers itself.