boudinfl / pke

Python Keyphrase Extraction module
GNU General Public License v3.0
1.57k stars, 291 forks

KeyError: 'hinglish' #200

Closed upasana-mittal closed 2 years ago

upasana-mittal commented 2 years ago

I am getting this error while importing pke:

get_alpha_2 = lambda l: LANGUAGE_CODE_BY_NAME[l]
KeyError: 'hinglish'

  File "/app/model/src/analysis/AnalysisService.py", line 6, in <module>
    from pke.unsupervised import TextRank, TopicRank, SingleRank
  File "/usr/local/lib/python3.7/site-packages/pke/__init__.py", line 5, in <module>
    from pke.base import LoadFile
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 31, in <module>
    lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 31, in <dictcomp>
    lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 29, in <lambda>
    get_alpha_2 = lambda l: LANGUAGE_CODE_BY_NAME[l]
KeyError: 'hinglish'

atabas commented 2 years ago

I'm getting the same error...does anyone know what's wrong?

ajithb073 commented 2 years ago

Reason for the KeyError: pke relies on NLTK's stopword corpus for its list of languages, and at import time it maps each stopword file name to a language code. pke's "langcodes.py" has no language code for 'hinglish', so the lookup fails.
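To make the failure mode concrete, here is a minimal sketch that reproduces it without pke or NLTK installed. The two-entry `LANGUAGE_CODE_BY_NAME` table and the `stopword_fileids` list are stand-ins for the objects named in the traceback, and `build_lang_stopwords_tolerant` is a hypothetical variant for illustration, not necessarily how pke resolved the issue.

```python
# Stand-in for pke's language-name -> language-code table (hypothetical subset).
LANGUAGE_CODE_BY_NAME = {"english": "en", "french": "fr"}

# Stand-in for the stopword file names NLTK exposes (stopwords._fileids in the traceback).
stopword_fileids = ["english", "french", "hinglish"]


def build_lang_stopwords(fileids, codes):
    """Strict mapping, as in the traceback: raises KeyError on an unknown name."""
    return {codes[name]: name for name in fileids}


def build_lang_stopwords_tolerant(fileids, codes):
    """Hypothetical tolerant variant: skip names that have no language code."""
    return {codes[name]: name for name in fileids if name in codes}


try:
    build_lang_stopwords(stopword_fileids, LANGUAGE_CODE_BY_NAME)
    strict_error = None
except KeyError as exc:
    # The missing key is the offending stopword file name.
    strict_error = exc.args[0]

tolerant = build_lang_stopwords_tolerant(stopword_fileids, LANGUAGE_CODE_BY_NAME)
```

The strict version fails as soon as NLTK ships a stopword file whose name is missing from the table, which is why the error appears at import time rather than when a document is loaded.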

Solution: the "nltk_data" folder is located in your home directory. Inside nltk_data/corpora/stopwords there is a file named 'hinglish'. Remove that file from the folder and the error goes away.
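The removal can also be scripted. The sketch below uses only the standard library; `remove_stopword_file` is a hypothetical helper, and it assumes the `corpora/stopwords/<name>` layout described above.

```python
from pathlib import Path


def remove_stopword_file(search_dirs, name="hinglish"):
    """Delete corpora/stopwords/<name> under each directory.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for base in search_dirs:
        target = Path(base) / "corpora" / "stopwords" / name
        if target.is_file():
            target.unlink()
            removed.append(target)
    return removed
```

With NLTK installed, `nltk.data.path` holds the directories NLTK searches for its data, so calling `remove_stopword_file(nltk.data.path)` should cover the home directory as well as any system-wide locations.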

aradhana298 commented 2 years ago

Where can I find the "nltk_data" folder in Colab?

hammadmukhtar21 commented 2 years ago

> where to get "nltk_data" folder in colab?

Check the path that NLTK downloads to. It is normally stored under the /root/ directory. You can reach the root directory from the pane on the left side of Colab by clicking on "..." (more options). It is visible beside the sample folder.


talhaanwarch commented 2 years ago

You can simply run !rm /root/nltk_data/corpora/stopwords/hinglish

Btw, removing the file did not work for me.

Btw, I did not face the issue with the latest version.

upasana-mittal commented 2 years ago

I had the issue because I was installing from a specific commit hash. Since I switched to installing from the full git repository, it works fine. No more errors.

pip install git+https://github.com/boudinfl/pke.git

ygorg commented 2 years ago

As said earlier in the thread, please update to the latest version. If you are using pke with an unsupported language, please provide custom stopwords through the stoplist argument, like so:

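import pke

# Stopwords for the unsupported language
shadok_stoplist = ['ga', 'zo']
# One list of (token, POS tag) pairs per sentence,
# obtained via a custom POS tagging tool or manual annotation
preprocessed_document = [
    [('ga', 'DET'), ('bu', 'NOUN'), ('zo', 'AUX'), ('meu', 'ADJ'), ('.', 'PUNCT')]
]
e = pke.unsupervised.MultipartiteRank()
# Pass the custom stoplist explicitly; normalization=None disables word normalization
e.load_document(
    preprocessed_document, language='shadok',
    stoplist=shadok_stoplist, normalization=None)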
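If the custom stopwords live in a text file rather than in code, a small loader can build the list passed as `stoplist`. This is only a sketch: `load_stoplist` is a hypothetical helper, and the file format (one word per line, `#` for comments) is an assumption, not something pke prescribes.

```python
from pathlib import Path


def load_stoplist(path):
    """Read one stopword per line and return a de-duplicated, lowercased list.

    Blank lines and lines starting with '#' are ignored; the order of first
    occurrence is preserved.
    """
    seen = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):
            seen.setdefault(word, None)
    return list(seen)
```

The returned list can then be used in place of `shadok_stoplist` in the call above.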