Closed wassimsuleiman closed 2 years ago
I managed to unzip the bin file and got a pkl file and data directories. How do I go on from here? I tried to load it with tagger = SequenceTagger.load('archive/data.pkl') and got:
UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
Any idea?
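For context, here is a minimal sketch of where this error message comes from, using only the standard pickle module. It rests on an assumption: that data.pkl inside the unzipped archive refers to tensor data stored outside the pickle stream (the data directories) through persistent ids, so the pkl file cannot be loaded on its own. FakeStorage, ExternalPickler and ExternalUnpickler are hypothetical names for illustration and are not part of flair or torch.

```python
import io
import pickle


class FakeStorage:
    """Hypothetical stand-in for tensor data kept outside the pickle stream."""

    def __init__(self, key):
        self.key = key


class ExternalPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Store only a reference for FakeStorage objects, not the object itself.
        if isinstance(obj, FakeStorage):
            return ("storage", obj.key)
        return None  # everything else is pickled normally


class ExternalUnpickler(pickle.Unpickler):
    def __init__(self, file, storages):
        super().__init__(file)
        self.storages = storages

    def persistent_load(self, pid):
        # Resolve the reference back to the externally stored object.
        _kind, key = pid
        return self.storages[key]


def dump_with_external_storage(obj):
    buffer = io.BytesIO()
    ExternalPickler(buffer).dump(obj)
    return buffer.getvalue()


def try_plain_load(payload):
    """Return the error text if a plain pickle.loads fails, else None."""
    try:
        pickle.loads(payload)
    except pickle.UnpicklingError as error:
        return str(error)
    return None
```

A plain pickle.loads on such a payload fails because nothing tells it how to resolve the references, while the matching unpickler succeeds. If the assumption holds, that is why the extracted data.pkl is not a loadable model file by itself.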
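Hi @wassimsuleiman,
I recently faced a similar problem on a machine without internet access and found the following workaround:
1. Find a machine that has unrestricted internet access and download the model using the following code. You can find this command on the Hugging Face hub website by clicking the "</> Use in flair" button.
    from flair.models import SequenceTagger
    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
2. Next, find the directory in which the flair models are stored on your machine. The default is ~/.flair. You can check it with the following code:
    import flair
    print(flair.cache_root)
3. Navigate to that folder and copy the files to the same directory on the machine without internet access.
4. In my case, additional cached files (tokenizer, sentencepiece model, ...) from the underlying transformer model were required to finally load the SequenceTagger, so I copied them as well. The files were located at ~/.cache/huggingface/ on my computer with internet access.
5. On the machine without internet access, set the following environment variables before trying to load the model:
    import os
    os.environ['TRANSFORMERS_OFFLINE'] = '1'
    os.environ['HF_DATASETS_OFFLINE'] = '1'
6. Load the model using the code from step 1.
Note: If you simply want to store the model for later offline usage on the same computer, you can skip steps 2 to 4.
Maybe somebody can contribute a more elegant solution, but for now this at least works.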
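The offline part of the workaround (the last two steps) could be wrapped roughly like this. This is only a sketch: enable_offline_mode and load_tagger_offline are hypothetical helper names, and importing flair only after the flags are set is a precaution I am assuming is useful, in case the libraries read the flags at import time.

```python
import os

# The two environment variables named in the workaround above.
OFFLINE_FLAGS = ("TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE")


def enable_offline_mode(environ=None):
    """Set the offline flags to '1' in the given mapping (default: os.environ)."""
    if environ is None:
        environ = os.environ
    for flag in OFFLINE_FLAGS:
        environ[flag] = "1"
    return environ


def load_tagger_offline(model_name="flair/ner-english-ontonotes-large"):
    """Load a tagger whose files were already copied into the local caches."""
    enable_offline_mode()
    # Assumption: import flair only after the flags are set.
    from flair.models import SequenceTagger

    return SequenceTagger.load(model_name)
```

Calling load_tagger_offline() requires flair to be installed and the cache folders from the earlier steps to be in place.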
Hey @pg020196, you can use the current master branch. It will then work out of the box if you re-download the model.
Hi @helpmefindaname, thank you for your reply! Sorry for needing another clarification: if I understand your comment correctly, you are referring to the master branch of the flair project and not to the master branch of the Hugging Face model, aren't you? Currently, I am using the latest package available through pip (flair 0.10).
Do I still have to initially download the model using SequenceTagger.load(), or does this allow me to download the pytorch_model.bin directly from the Hugging Face hub (https://huggingface.co/flair/ner-english-ontonotes-large/tree/main)? If I understand correctly, the current .load() method also creates a config.json file, which is missing on the Hugging Face hub. Is this file still required when using the latest version from your master branch?
In my case, I cannot use the SequenceTagger.load() function on my target device to initially download the model since it does not have any internet access. Therefore, I have to download all the required files beforehand and copy them to a local directory on my target device.
Thank you!
Hi @pg020196, yes, I am referring to the master branch of flair. It doesn't matter how you download the model; however, you have to load it once and save it again. When saving, flair internally stores a zip file containing the config/vocab/... files and uses them the next time the model is loaded.
As you cannot do that on your target device, you need to install flair on a different device and run the load/save once there.
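A rough sketch of that load/save round trip, split into the part for the connected machine and the part for the target device. The helper names (export_tagger, load_exported_tagger) and the injectable loader argument are hypothetical and only there for illustration; the flair calls used are SequenceTagger.load and tagger.save, as they appear elsewhere in this thread.

```python
from pathlib import Path


def _flair_loader(model_id):
    # Imported lazily so this module can be read without flair installed.
    from flair.models import SequenceTagger

    return SequenceTagger.load(model_id)


def export_tagger(model_id, target_file, loader=_flair_loader):
    """On the machine WITH internet: load the model once and save it to one file."""
    target = Path(target_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    tagger = loader(model_id)
    tagger.save(str(target))  # must be a file path, not a directory
    return target


def load_exported_tagger(model_file):
    """On the target device: load the copied file instead of a hub identifier."""
    from flair.models import SequenceTagger

    return SequenceTagger.load(str(model_file))
```

For example, export_tagger("flair/ner-english-ontonotes-large", "export/tagger_model.pt") on the connected machine, then copy the file over and call load_exported_tagger on the target device. Whether the saved file is fully self-contained depends on the flair version, as discussed in this thread.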
Hi @helpmefindaname, thank you for clarifying and answering my further questions. I just tried your approach and it works like a charm!
Hello @helpmefindaname @pg020196 , could you please mention the load and save steps which need to be followed once the model is downloaded?
Hi @pg020196,
I am following the steps below:
from flair.models import SequenceTagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
tagger.save('path/to/directory/')
but I am getting this error: IsADirectoryError: [Errno 21] Is a directory: 'path/to/directory/'
Is this not the right way to load and save?
Hi @ksachdeva11,
when loading or saving the model locally, I think you have to specify the path to the file and not to the directory, e.g.
tagger.save('path/to/directory/tagger_model.pt')
tagger = SequenceTagger.load('path/to/directory/tagger_model.pt')
When loading the model with SequenceTagger.load("flair/ner-english-ontonotes-large"), the string value is used as an identifier for the model on the model hub and not as a directory. See here
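To avoid the IsADirectoryError, a small guard can turn a directory-style target into a file path before calling tagger.save. This is only a sketch: model_file_path is a hypothetical helper, and the default file name is simply the one used in the example above.

```python
from pathlib import Path


def model_file_path(target, default_name="tagger_model.pt"):
    """Return a file path, appending default_name if target looks like a directory."""
    path = Path(target)
    # A trailing separator or an existing directory means "directory", not "file".
    if str(target).endswith(("/", "\\")) or path.is_dir():
        return path / default_name
    return path
```

With this, tagger.save(str(model_file_path('path/to/directory/'))) would write to path/to/directory/tagger_model.pt, provided the directory exists.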
thanks @pg020196 ... getting this error now while saving the model..
tagger.save('path/to/directory/tagger_model.pt')
~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/flair/embeddings/base.py in _tokenizer_bytes(self)
344 files = list(self.tokenizer.save_pretrained(temp_dir))
345 if self.tokenizer.is_fast:
--> 346 vocab_files = self.tokenizer.slow_tokenizer_class.vocab_files_names.values()
347 files = [f for f in files if all(v not in f for v in vocab_files)]
348 zip_data = BytesIO()
AttributeError: 'NoneType' object has no attribute 'vocab_files_names'
@ksachdeva11 sorry, unfortunately I can't help you with this issue. Did you make sure that the model was fully and correctly loaded, meaning that you can make predictions? If not, I would recommend checking out the examples; they are really useful. Otherwise, the only thing I can point out is that I tried all of the above steps with python 3.9, torch 1.11.0 and flair 0.11. Maybe there are some dependency issues with other versions. Additionally, you could try another model and see whether the issue still exists, to narrow down the options.
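To compare an environment against the versions mentioned above, the installed package versions can be listed with the standard library. A minimal sketch; installed_versions is a hypothetical helper, and including transformers in the default list is my own assumption about which packages are worth checking.

```python
from importlib import metadata


def installed_versions(packages=("flair", "torch", "transformers")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for package, version in installed_versions().items():
        print(f"{package}: {version or 'not installed'}")
```

Running this on both the working and the failing machine makes version differences easy to spot.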
@pg020196 updating the flair version to 0.11 helped. Thanks a lot!
Hi @wassimsuleiman,
I recently faced a similar problem on a machine without internet access and found the following workaround:
1. Find a machine that has unrestricted internet access and download the model using the following code. You can find this command on the Hugging Face hub website by clicking the "</> Use in flair" button.
    from flair.models import SequenceTagger
    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
2. Next, find the directory in which the flair models are stored on your machine. The default is ~/.flair. You can check it with the following code:
    import flair
    print(flair.cache_root)
3. Navigate to that folder and copy the files to the same directory on the machine without internet access.
4. In my case, additional cached files (tokenizer, sentencepiece model, ...) from the underlying transformer model were required to finally load the SequenceTagger, so I copied them as well. The files were located at ~/.cache/huggingface/ on my computer with internet access.
5. On the machine without internet access, set the following environment variables before trying to load the model:
    import os
    os.environ['TRANSFORMERS_OFFLINE'] = '1'
    os.environ['HF_DATASETS_OFFLINE'] = '1'
6. Load the model using the code from step 1.
Note: If you simply want to store the model for later offline usage on the same computer, you can skip steps 2 to 4.
Maybe somebody can contribute a more elegant solution, but for now this at least works.
It's still not working; it keeps trying to connect even though I have all the necessary files on the PC.
I downloaded the model from Hugging Face: https://huggingface.co/flair/ner-english-ontonotes-large/tree/main
I am trying to load the SequenceTagger with tagger = SequenceTagger.load('./pytorch_model.bin')
and I get: ValueError: Connection error. Any idea?