adobe / NLP-Cube

Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
http://opensource.adobe.com/NLP-Cube/index.html
Apache License 2.0

Problem with model loading #140

Closed sorinsfirlogea closed 10 months ago

sorinsfirlogea commented 10 months ago

I have successfully installed NLP-Cube on an Ubuntu 20.04 box running Python 3.9. I copied the example from the manifest file:

from cube.api import Cube
cube=Cube(verbose=True)
cube.load("en")
text="All the faith he had had, had had no effect on the outcome of his life."
sentences=cube(text)
for sentence in sentences:
  for entry in sentence:
    print(str(entry.index)+"\t"+entry.word+"\t"+entry.lemma+"\t"+entry.upos+"\t"+entry.xpos+"\t"+entry.attrs+"\t"+str(entry.head)+"\t"+str(entry.label)+"\t"+entry.space_after)
  print("")

On the first call it downloaded the English model (~/.nlpcube/3.0/en). When I run the script, I consistently encounter the following error:

root@sfirlogea:~# ./mycube.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/cube/api.py", line 192, in _load
    return CubeObj('{0}/{1}'.format(lang_path, lang), device=device, lang=lang)
  File "/usr/local/lib/python3.9/dist-packages/cube/api.py", line 71, in __init__
    self._tokenizer_collate = TokenCollateFTLanguasito(encodings,
  File "/usr/local/lib/python3.9/dist-packages/cube/networks/utils_tokenizer.py", line 128, in __init__
    self._lm_helper = LMHelperFT(device=lm_device, model=parts[1])
  File "/usr/local/lib/python3.9/dist-packages/cube/networks/lm.py", line 48, in __init__
    self._fasttext = fasttext.load_model(filename)
  File "/usr/local/lib/python3.9/dist-packages/fasttext/FastText.py", line 441, in load_model
    return _FastText(model_path=path)
  File "/usr/local/lib/python3.9/dist-packages/fasttext/FastText.py", line 98, in __init__
    self.f.loadModel(model_path)
MemoryError: std::bad_alloc

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/./mycube.py", line 8, in <module>
    cube.load("en")
  File "/usr/local/lib/python3.9/dist-packages/cube/api.py", line 208, in load
    self._instance = _load(lang, device)
  File "/usr/local/lib/python3.9/dist-packages/cube/api.py", line 194, in _load
    raise Exception("There was a problem retrieving this language. Either it is unsupported or your Internet "
Exception: There was a problem retrieving this language. Either it is unsupported or your Internet connection is down.

To check for supported languages, visit https://github.com/adobe/NLP-Cube/

It is hard to maintain models for all UD Treebanks. This is why we are only including a handful of languages with the official distribution. However, we can include additional languages upon request.

To make a request for supporting a new language please create an issue on GitHub

I have checked and the model is in place. The version of fasttext is 0.9.2.

I would appreciate some help to get over this error. Thank you in advance.

tiberiu44 commented 10 months ago

Hey @sorinsfirlogea ,

Thank you for bringing this to our attention. I just checked everything end-to-end and I don't see any issues with the servers. Can you try deleting the English fastText model under ~/.fasttext? There might have been an issue with the download. Also, can you check whether you have sufficient system memory? The fastText object uses about 6 GB of RAM, and NLP-Cube uses 4-6 GB more. Let me know if this helps.
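
Before deleting anything, one way to rule out a truncated download is to list the cached files and their sizes. This is only a sketch: the cache locations ~/.nlpcube and ~/.fasttext are taken from this thread, and the exact file names inside them are not specified here, so treat the output as a rough sanity check (the full English fastText .bin is several GB; anything much smaller suggests a broken download).

```python
# List cached model files and their sizes so a suspiciously small
# (truncated) download stands out before re-downloading.
import os

def list_cache(path):
    """Yield (relative_path, size_bytes) for every file under path, if it exists."""
    path = os.path.expanduser(path)
    if not os.path.isdir(path):
        return
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            yield os.path.relpath(full, path), os.path.getsize(full)

for cache in ("~/.nlpcube", "~/.fasttext"):
    print(cache)
    for rel, size in list_cache(cache):
        print(f"  {rel}: {size / 1e9:.2f} GB")
```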

There is however a small issue with the script itself. We need to push a change into cube, because the Document object is currently not iterable. You can use this simplified version instead:

from cube.api import Cube
cube=Cube(verbose=True)
cube.load("en")
text="All the faith he had had, had had no effect on the outcome of his life."
sentences=cube(text)
print(sentences)

sorinsfirlogea commented 10 months ago

It may be a memory problem, then. I don't have that amount of RAM on my box, which might be the explanation. Maybe you would consider adding a warning notice to the NLP-Cube manifest about the hardware requirements; it would be helpful for those curious to try it. Thanks for the quick reply.

tiberiu44 commented 10 months ago

Yes, you are right about this. You can give it a shot in Google Colab; it should have enough RAM for this.
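
If you do try Colab, it's worth confirming the runtime's RAM before loading the model. A minimal sketch, assuming a Linux runtime (which Colab provides); the ~12 GB threshold is just the sum of the estimates above (fastText ~6 GB plus 4-6 GB for NLP-Cube), not an official requirement:

```python
# Report available RAM by parsing /proc/meminfo (Linux only).
def available_ram_gb(meminfo_text):
    """Return MemAvailable (reported in kB) converted to GB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 ** 2)
    return None

try:
    with open("/proc/meminfo") as f:
        avail = available_ram_gb(f.read())
    if avail is None:
        print("Could not find MemAvailable in /proc/meminfo")
    elif avail < 12:
        print(f"Only {avail:.1f} GB available; cube.load('en') may fail with bad_alloc")
    else:
        print(f"Available RAM: {avail:.1f} GB, which should be enough")
except FileNotFoundError:
    print("/proc/meminfo not found (non-Linux system)")
```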