CAMeL-Lab / camel_tools

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
MIT License

[QUESTION] There seems to be a problem with the .pretrained() function #31

Closed JumanaMSA closed 3 years ago

JumanaMSA commented 3 years ago

I keep getting the following error:

Traceback (most recent call last):
  File "/Users/Jumana/Desktop/s.py", line 2, in <module>
    d=DialectIdentifier.pretrained()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/camel_tools/dialectid/__init__.py", line 616, in pretrained
    'No pretrained model for current Python version found.')
camel_tools.dialectid.PretrainedModelError: No pretrained model for current Python version found.

I followed what's in these links:

https://github.com/CAMeL-Lab/camel_tools#install-using-pip
https://github.com/CAMeL-Lab/camel_tools#installing-data

My Code:

from camel_tools.dialectid import DialectIdentifier
d=DialectIdentifier.pretrained()

OS: macOS
Python version: 3.7.6
CAMeL Tools version: 1.0.1
CAMeL Tools installation source: pip

Thanks.

owo commented 3 years ago

Hi @JumanaMSA ,

Can you please run ls -R ~/.camel_tools on your terminal (or ls -R $CAMELTOOLS_DATA if you installed the data in a custom directory) and share the output?

Can you also run the following in Python code and share its output?

import sys
print('{}{}'.format(sys.version_info.major, sys.version_info.minor))

Thanks.

JumanaMSA commented 3 years ago

The output of ls -R ~/.camel_tools:

Jumana@jumana desktop % ls -R ~/.camel_tools
data

The output of print('{}{}'.format(sys.version_info.major, sys.version_info.minor)):

37

owo commented 3 years ago

Hi @JumanaMSA ,

Can you please rerun the terminal command and share all the output?

The output should look like the sample below. If it doesn't, then either the data was not installed correctly or you installed the light data set rather than the full one. You need the full data set installed in order to use dialectid.

data

/Users/owo/.camel_tools/data:
dialectid          disambig_mle       morphology_db      ner                sentiment_analysis

/Users/owo/.camel_tools/data/dialectid:
default

/Users/owo/.camel_tools/data/dialectid/default:
did_pretrained_36.dill did_pretrained_37.dill did_pretrained_38.dill lm

/Users/owo/.camel_tools/data/dialectid/default/lm:
char word

/Users/owo/.camel_tools/data/dialectid/default/lm/char:
ALE.arpa ALX.arpa ASW.arpa BAS.arpa BEN.arpa DAM.arpa FES.arpa JER.arpa MOS.arpa MUS.arpa RIY.arpa SAN.arpa TRI.arpa
ALG.arpa AMM.arpa BAG.arpa BEI.arpa CAI.arpa DOH.arpa JED.arpa KHA.arpa MSA.arpa RAB.arpa SAL.arpa SFX.arpa TUN.arpa

/Users/owo/.camel_tools/data/dialectid/default/lm/word:
ALE.arpa ALX.arpa ASW.arpa BAS.arpa BEN.arpa DAM.arpa FES.arpa JER.arpa MOS.arpa MUS.arpa RIY.arpa SAN.arpa TRI.arpa
ALG.arpa AMM.arpa BAG.arpa BEI.arpa CAI.arpa DOH.arpa JED.arpa KHA.arpa MSA.arpa RAB.arpa SAL.arpa SFX.arpa TUN.arpa

/Users/owo/.camel_tools/data/disambig_mle:
calima-egy-r13 calima-msa-r13

/Users/owo/.camel_tools/data/disambig_mle/calima-egy-r13:
LICENSE    model.json

/Users/owo/.camel_tools/data/disambig_mle/calima-msa-r13:
LICENSE    model.json

/Users/owo/.camel_tools/data/morphology_db:
calima-egy-r13 calima-msa-r13

/Users/owo/.camel_tools/data/morphology_db/calima-egy-r13:
LICENSE       morphology.db

/Users/owo/.camel_tools/data/morphology_db/calima-msa-r13:
LICENSE       morphology.db

/Users/owo/.camel_tools/data/ner:
arabert

/Users/owo/.camel_tools/data/ner/arabert:
LICENSE                 pytorch_model.bin       tokenizer_config.json   vocab.txt
config.json             special_tokens_map.json training_args.bin

/Users/owo/.camel_tools/data/sentiment_analysis:
arabert mbert

/Users/owo/.camel_tools/data/sentiment_analysis/arabert:
LICENSE                 pytorch_model.bin       tokenizer_config.json   vocab.txt
config.json             special_tokens_map.json training_args.bin

/Users/owo/.camel_tools/data/sentiment_analysis/mbert:
LICENSE                 pytorch_model.bin       tokenizer_config.json   vocab.txt
config.json             special_tokens_map.json training_args.bin
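
The check behind this exchange can also be scripted. The sketch below is only an illustration and not part of camel_tools: it assumes the layout shown in the sample listing above (`<root>/data/dialectid/default/did_pretrained_<major><minor>.dill`), with `<root>` taken from `CAMELTOOLS_DATA` when that variable is set and `~/.camel_tools` otherwise. The helper name `expected_model_path` is hypothetical.

```python
import os
import sys
from pathlib import Path


def expected_model_path(root=None, version_info=None):
    """Return where the pretrained dialectid model for this Python is expected.

    Assumption: the directory layout matches the sample listing in this
    thread. This is a hypothetical helper, not a camel_tools API.
    """
    if root is None:
        # Custom data directory if set, otherwise the default location.
        root = os.environ.get('CAMELTOOLS_DATA',
                              str(Path.home() / '.camel_tools'))
    if version_info is None:
        version_info = sys.version_info
    # Same major+minor tag as the diagnostic print earlier in the thread.
    tag = '{}{}'.format(version_info[0], version_info[1])
    file_name = 'did_pretrained_{}.dill'.format(tag)
    return Path(root) / 'data' / 'dialectid' / 'default' / file_name


if __name__ == '__main__':
    path = expected_model_path()
    print(path, 'exists' if path.exists() else 'MISSING')
```

If the file is reported as missing, either the data root is not the one that was installed to, or the installed data has no model for the running Python version.
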

JumanaMSA commented 3 years ago

Ah, I just noticed that I have been installing it using pip instead of pip3 (I have both Python 2 and Python 3 installed on the same device). It's all working now :D Thank you so much, and sorry for the inconvenience.
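
For anyone hitting the same pip/pip3 mix-up, a quick way to see which interpreter is running a script and whether that interpreter can find the package is sketched below. It uses only the standard library; the helper name `describe_install` is hypothetical and nothing here is a camel_tools API.

```python
import importlib.util
import sys


def describe_install(package='camel_tools'):
    """Report the running interpreter and where it finds `package`, if at all."""
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None
    return {
        'interpreter': sys.executable,
        'version': '{}.{}'.format(sys.version_info.major,
                                  sys.version_info.minor),
        # None means this interpreter cannot import the package.
        'package_location': location,
    }


print(describe_install())
```

If `package_location` comes back as `None`, the package was probably installed for a different interpreter. Running pip as `python3 -m pip install camel-tools` ties the install to the interpreter named on the command line.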

owo commented 3 years ago

No problem :)

WaelMohammedAbed commented 3 years ago

Hi, I am having the same problem. I tried to run the Colab notebook and got the error "PretrainedModelError: No pretrained model for current Python version found." after running this part:

`from camel_tools.dialectid import DialectIdentifier

did = DialectIdentifier.pretrained()`

pip and python version: pip 21.1.3 from /usr/local/lib/python3.7/dist-packages/pip (python 3.7)

The output of running this command "" is:

`/gdrive/MyDrive/camel_tools:
data/

/gdrive/MyDrive/camel_tools/data:
dialectid/ disambig_mle/ morphology_db/ ner/ sentiment_analysis/

/gdrive/MyDrive/camel_tools/data/dialectid:
default/

/gdrive/MyDrive/camel_tools/data/dialectid/default:
did_pretrained_36.dill did_pretrained_38.dill lm/
did_pretrained_37.dill did_pretrained_39.dill

/gdrive/MyDrive/camel_tools/data/dialectid/default/lm:
char/ word/

/gdrive/MyDrive/camel_tools/data/dialectid/default/lm/char:
ALE.arpa ASW.arpa BEN.arpa FES.arpa MOS.arpa RIY.arpa TRI.arpa
ALG.arpa BAG.arpa CAI.arpa JED.arpa MSA.arpa SAL.arpa TUN.arpa
ALX.arpa BAS.arpa DAM.arpa JER.arpa MUS.arpa SAN.arpa
AMM.arpa BEI.arpa DOH.arpa KHA.arpa RAB.arpa SFX.arpa

/gdrive/MyDrive/camel_tools/data/dialectid/default/lm/word:
ALE.arpa ASW.arpa BEN.arpa FES.arpa MOS.arpa RIY.arpa TRI.arpa
ALG.arpa BAG.arpa CAI.arpa JED.arpa MSA.arpa SAL.arpa TUN.arpa
ALX.arpa BAS.arpa DAM.arpa JER.arpa MUS.arpa SAN.arpa
AMM.arpa BEI.arpa DOH.arpa KHA.arpa RAB.arpa SFX.arpa

/gdrive/MyDrive/camel_tools/data/disambig_mle:
calima-egy-r13/ calima-msa-r13/

/gdrive/MyDrive/camel_tools/data/disambig_mle/calima-egy-r13:
LICENSE model.json

/gdrive/MyDrive/camel_tools/data/disambig_mle/calima-msa-r13:
LICENSE model.json

/gdrive/MyDrive/camel_tools/data/morphology_db:
calima-egy-r13/ calima-msa-r13/

/gdrive/MyDrive/camel_tools/data/morphology_db/calima-egy-r13:
LICENSE morphology.db

/gdrive/MyDrive/camel_tools/data/morphology_db/calima-msa-r13:
LICENSE morphology.db

/gdrive/MyDrive/camel_tools/data/ner:
arabert/

/gdrive/MyDrive/camel_tools/data/ner/arabert:
config.json pytorch_model.bin tokenizer_config.json vocab.txt
LICENSE special_tokens_map.json training_args.bin

/gdrive/MyDrive/camel_tools/data/sentiment_analysis:
arabert/ mbert/

/gdrive/MyDrive/camel_tools/data/sentiment_analysis/arabert:
config.json pytorch_model.bin tokenizer_config.json vocab.txt
LICENSE special_tokens_map.json training_args.bin

/gdrive/MyDrive/camel_tools/data/sentiment_analysis/mbert:
config.json pytorch_model.bin tokenizer_config.json vocab.txt
LICENSE special_tokens_map.json training_args.bin`

And the output of import sys print('{}{}'.format(sys.version_info.major, sys.version_info.minor)) is:

37

Currently I am only interested in the Dialect Identification feature as I want to classify Iraqi dialect from different tweets.

owo commented 3 years ago

@WaelMohammedAbed It seems you've installed the data correctly. You just need to run the following before using camel_tools (or this code block in the Guided Tour):

from google.colab import drive
import os

drive.mount('/gdrive')
os.environ['CAMELTOOLS_DATA'] = '/gdrive/MyDrive/camel_tools'

This points camel_tools to the data located on the drive.