flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

I am not able to load a model from my local disk after downloading from huggingface #2626

Closed wassimsuleiman closed 2 years ago

wassimsuleiman commented 2 years ago

I downloaded the model from huggingface: https://huggingface.co/flair/ner-english-ontonotes-large/tree/main

I am trying to load the SequenceTagger with tagger = SequenceTagger.load('./pytorch_model.bin')

I got: ValueError: Connection error. Any idea?

wassimsuleiman commented 2 years ago

I managed to unzip the bin file; it contained a pkl file and data directories. How do I go on from here? I tried to load tagger = SequenceTagger.load('archive/data.pkl') and got:

UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

Any idea?

pg020196 commented 2 years ago

Hi @wassimsuleiman,

I recently faced a similar problem on a machine without internet access and found the following workaround:

  1. Find a machine that has unrestricted internet access and download the model using the following code. You can find this command on the huggingface hub website by clicking the button </> Use in flair:

    from flair.models import SequenceTagger
    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

  2. Next, find the directory in which the flair models are stored on your machine. You can do so by using the following code (the default is ~/.flair):

    import flair
    print(flair.cache_root)

  3. Navigate to that folder and copy the files to the same directory on the machine without internet access.

  4. In my case, additional cached files (tokenizer, sentencepiece model, ...) from the underlying transformer model were required to finally load the SequenceTagger, so I also copied them. The files were located at ~/.cache/huggingface/ on my computer with internet access.

  5. On the machine without internet access, set the following environment variables before trying to load the model:

    import os
    os.environ['TRANSFORMERS_OFFLINE'] = '1'
    os.environ['HF_DATASETS_OFFLINE'] = '1'

  6. Load the model using the code from step 1.

Note: If you simply want to store the model for later offline usage on the same computer, you can skip steps 2 to 4. A combined sketch of these steps follows below.

Maybe somebody can contribute a more elegant solution, but for now this at least works.
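
Putting the steps together, a rough end-to-end sketch (assuming the default cache locations; adjust the paths for your setup):

    # --- on the machine WITH internet access ---
    from flair.models import SequenceTagger
    import flair

    # step 1: download the model once so all files land in the local caches
    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

    # step 2: print the flair cache directory (default ~/.flair); copy it,
    # together with ~/.cache/huggingface/, to the offline machine
    print(flair.cache_root)

    # --- on the machine WITHOUT internet access (in a fresh Python session) ---
    import os
    # set offline mode before importing flair / transformers
    os.environ['TRANSFORMERS_OFFLINE'] = '1'
    os.environ['HF_DATASETS_OFFLINE'] = '1'

    from flair.models import SequenceTagger
    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")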
helpmefindaname commented 2 years ago

Hey @pg020196, you can use the current master branch; then it will work out of the box if you redownload the model.

pg020196 commented 2 years ago

Hi @helpmefindaname, thank you for your reply! Sorry for another need for clarification on my side: if I understand your comment correctly, you are referring to the master branch of the flair project and not to the master branch of the Hugging Face model, aren't you? Currently, I am using the latest package available through pip (flair 0.10).

Do I still have to initially download the model using SequenceTagger.load(), or does this allow me to download the pytorch_model.bin directly from the huggingface hub (https://huggingface.co/flair/ner-english-ontonotes-large/tree/main)? If I understand correctly, the current .load() method also creates a config.json file which is missing from the huggingface hub. Is this file still required when using the latest version from your master branch?

In my case, I cannot use the SequenceTagger.load() function on my target device to initially download the model since it does not have any internet access and, therefore, I have to download all the required files beforehand and copy them to a local directory on my target device.

Thank you!

helpmefindaname commented 2 years ago

Hi @pg020196, yes, I am referring to the master branch of flair. It doesn't matter how you download the model; however, you have to load and save it again once. When saving, it will internally store a zip file containing the config/vocab/... files and use them when loading it the next time.

As you cannot do that on your target device, you need to install flair on a different device and run the load/save once there.
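
For illustration, a minimal sketch of that one-time load/save round trip (the local filename here is just an example):

    from flair.models import SequenceTagger

    # load once from the model hub (needs internet access) ...
    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

    # ... and save it again; the re-saved file bundles the config/vocab/... files,
    # so it can be loaded later without contacting the hub
    tagger.save("ner-english-ontonotes-large.pt")

    # later, e.g. on the offline machine after copying the file:
    tagger = SequenceTagger.load("ner-english-ontonotes-large.pt")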

pg020196 commented 2 years ago

Hi @helpmefindaname, thank you for clarifying and answering my further questions. I just tried your approach and it works like a charm!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ksachdeva11 commented 2 years ago

> Hi @pg020196 yes I refer to the master branch of flair. It doesn't matter how you download the model, however you have to once load and save it again. When saving, it will internally store a zip file containing the config/vocab/... files and use them when loading it the next time.
>
> As you cannot do that on your target device, you need to install flair on a different device and run the load/save once there.

Hello @helpmefindaname @pg020196, could you please mention the load and save steps that need to be followed once the model is downloaded?

pg020196 commented 2 years ago

Hi @ksachdeva11,

To save the model, you can simply use the .save() function on the model instance. Looking at the example here, you can call tagger.save(filepath).

Loading a model is shown here; e.g., loading a SequenceTagger can be done by calling SequenceTagger.load(filepath).

ksachdeva11 commented 2 years ago

Hi @pg020196 ,

I am following the steps below:

  1. From the machine with internet access:

    from flair.models import SequenceTagger
    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

  2. Then trying to save the model on the same machine using:

    tagger.save('path/to/directory/')

but I am getting this error: IsADirectoryError: [Errno 21] Is a directory: 'path/to/directory/'

Is this not the right way to load and save?

pg020196 commented 2 years ago

Hi @ksachdeva11, when loading or saving the model locally, I think you have to specify the path to the file and not to the directory, e.g.:

    tagger.save('path/to/directory/tagger_model.pt')
    tagger = SequenceTagger.load('path/to/directory/tagger_model.pt')

When loading the model with SequenceTagger.load("flair/ner-english-ontonotes-large"), the string value is used as an identifier for the model on the model hub, not as a directory. See here.

ksachdeva11 commented 2 years ago

Thanks @pg020196 ... now I am getting this error while saving the model:

tagger.save('path/to/directory/tagger_model.pt')

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/flair/embeddings/base.py in _tokenizer_bytes(self)
    344             files = list(self.tokenizer.save_pretrained(temp_dir))
    345             if self.tokenizer.is_fast:
--> 346                 vocab_files = self.tokenizer.slow_tokenizer_class.vocab_files_names.values()
    347                 files = [f for f in files if all(v not in f for v in vocab_files)]
    348             zip_data = BytesIO()

AttributeError: 'NoneType' object has no attribute 'vocab_files_names'

pg020196 commented 2 years ago

@ksachdeva11 sorry, unfortunately I can't help you with this issue. Did you make sure that the model was fully and correctly loaded, meaning that you can make predictions? If not, I would recommend checking out the examples; they are really useful. Otherwise, the only thing I can point out is that I tried all of the above steps with Python 3.9, torch 1.11.0 and flair 0.11. Maybe there are some dependency issues with other versions. Additionally, you could try another model and see if the issue still exists to narrow down the options.
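
A quick way to check that the model is fully usable (a minimal sketch; the example sentence is arbitrary):

    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

    # run NER over a small example sentence
    sentence = Sentence("George Washington went to Washington.")
    tagger.predict(sentence)

    # print the predicted entity spans; if this works, the model loaded correctly
    for entity in sentence.get_spans("ner"):
        print(entity)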

ksachdeva11 commented 2 years ago

@pg020196 updating the flair version to 0.11 helped. Thanks a lot!

skwskwskwskw commented 9 months ago

> (quote of @pg020196's offline workaround from above)

It's still not working; it kept trying to connect even though I have all the necessary files on the PC.