Closed · ASDpaper closed this issue 1 year ago
Yes! I'm running into the same issue.
Sentence-transformers models do not load with the usual AutoTokenizer.from_pretrained(); they instead need a different class (SentenceTransformer(model_name)).
So I think the code base needs to be updated to incorporate that change. The latest version of Hugging Face doesn't support that.
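For anyone comparing the two APIs being contrasted here, a minimal sketch may help (the function names are illustrative, and the deferred imports mean nothing is downloaded just by defining these):

```python
# Sketch of the two loading paths the comment contrasts. The model id is the
# one from this thread. Both APIs fetch weights from the Hugging Face Hub,
# so the imports are deferred and nothing is downloaded at definition time.
MODEL = "sentence-transformers/all-MiniLM-L6-v2"

def embed_with_transformers(texts):
    # Plain transformers: tokenizer and model are loaded separately, and
    # pooling token embeddings into sentence embeddings is done by hand.
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModel.from_pretrained(MODEL)
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return model(**batch).last_hidden_state.mean(dim=1)

def embed_with_sentence_transformers(texts):
    # sentence-transformers: one class wraps tokenization, pooling, encoding.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(MODEL).encode(texts)
```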
Same here. It doesn't look like an isolated issue.
The fix is:
In src/models.py, you can replace this line:
minilm_model_name = "sentence-transformers/all-MiniLM-L6-v2"
with this:
minilm_model_name = "obrizum/all-MiniLM-L6-v2"
and then in your terminal run:
semantra --model minilm
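The workaround above boils down to a fallback pattern: try the canonical repo id first and switch to a mirror when resolution fails. A hedged sketch of that pattern (the helper and stub names are illustrative, not Semantra's actual code; the loader is injected so it can be exercised without network access):

```python
# Generic model-id fallback pattern. transformers raises OSError when a
# repo id cannot be resolved on the Hub, so that is what we catch here.
def load_with_fallback(primary, mirror, loader):
    try:
        return primary, loader(primary)
    except OSError:
        return mirror, loader(mirror)

# Stub loader simulating the outage described in this thread: the canonical
# sentence-transformers repos 404, while the mirror still resolves.
def stub_loader(name):
    if name.startswith("sentence-transformers/"):
        raise OSError(f"{name} is not a valid model identifier")
    return f"<model {name}>"

name, model = load_with_fallback(
    "sentence-transformers/all-MiniLM-L6-v2",
    "obrizum/all-MiniLM-L6-v2",
    stub_loader,
)
print(name)  # obrizum/all-MiniLM-L6-v2
```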
Thank you very much!
See https://github.com/freedmand/semantra/issues/32#issuecomment-1540149884
This is not an issue with Semantra, but rather with the services that host the models. The default model should work if you try again later, once the status pages for Hugging Face and GitHub show that they are operational.
In src/models.py, you can replace this line:
minilm_model_name = "sentence-transformers/all-MiniLM-L6-v2"
with this:
minilm_model_name = "obrizum/all-MiniLM-L6-v2"
For future reference, you can do this without any code changes by passing --transformer-model. See https://github.com/freedmand/semantra/blob/main/docs/guide_models.md#using-custom-models
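As a sketch of how such a flag behaves (Semantra itself uses click, per the traceback later in this thread, but the option definition below is illustrative, not Semantra's actual CLI code):

```python
# Minimal click command with a --transformer-model style option; the default
# and the echoed message are illustrative. CliRunner lets us exercise it
# in-process without installing or running anything on the real command line.
import click
from click.testing import CliRunner

@click.command()
@click.option(
    "--transformer-model",
    default="sentence-transformers/all-MiniLM-L6-v2",
    help="Hugging Face repo id to load instead of the built-in default.",
)
def cli(transformer_model):
    click.echo(f"loading model: {transformer_model}")

result = CliRunner().invoke(cli, ["--transformer-model", "obrizum/all-MiniLM-L6-v2"])
print(result.output)  # loading model: obrizum/all-MiniLM-L6-v2
```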
Hello, I am encountering an error while trying to use the model "sentence-transformers/all-mpnet-base-v2" in a script.
(base) PS F:\Download> semantra hamlet.pdf
Traceback (most recent call last):
  File "E:\software\Anaconnda\lib\site-packages\huggingface_hub\utils\_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "E:\software\Anaconnda\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\software\Anaconnda\lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "E:\software\Anaconnda\lib\site-packages\huggingface_hub\utils\_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "E:\software\Anaconnda\lib\site-packages\huggingface_hub\file_download.py", line 1195, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "E:\software\Anaconnda\lib\site-packages\huggingface_hub\utils\_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "E:\software\Anaconnda\lib\site-packages\huggingface_hub\file_download.py", line 1541, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "E:\software\Anaconnda\lib\site-packages\huggingface_hub\utils\_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-64599c33-4e9b1c4a0839489551a6eee6)
Repository Not Found for url: https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/tokenizer_config.json. Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\software\Anaconnda\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\software\Anaconnda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\software\Anaconnda\Scripts\semantra.exe\__main__.py", line 7, in <module>
  File "E:\software\Anaconnda\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "E:\software\Anaconnda\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "E:\software\Anaconnda\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "E:\software\Anaconnda\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "E:\software\Anaconnda\lib\site-packages\semantra\semantra.py", line 598, in main
    model: BaseModel = model_config["get_model"]()
  File "E:\software\Anaconnda\lib\site-packages\semantra\models.py", line 334, in <lambda>
    "get_model": lambda: TransformerModel(model_name=mpnet_model_name),
  File "E:\software\Anaconnda\lib\site-packages\semantra\models.py", line 166, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "E:\software\Anaconnda\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 642, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "E:\software\Anaconnda\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 486, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "E:\software\Anaconnda\lib\site-packages\transformers\utils\hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: sentence-transformers/all-mpnet-base-v2 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token, or log in with huggingface-cli login and pass use_auth_token=True.

I have attempted to resolve the issue by ensuring I am using the correct model identifier and checking my internet access. I have also tried logging in with the Hugging Face CLI before running the script. However, the error persists. Any assistance in resolving this issue would be greatly appreciated.
Environment information: