PacktPublishing / Graph-Machine-Learning

Graph Machine Learning, published by Packt
MIT License

NLTK corpora reuters is not loaded even after download #3

Closed: amir1m closed this issue 3 years ago.

amir1m commented 3 years ago

Hello, I am trying to execute examples from Graph-Machine-Learning/Chapter07/01_nlp_graph_creation.ipynb in Google Colab.

At line #5, corpus = pd.DataFrame([..]), I get the following error:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/nltk/corpus/util.py in __load(self)
     79             except LookupError as e:
---> 80                 try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
     81                 except LookupError: raise e

LookupError: 
**********************************************************************
  Resource reuters not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('reuters')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************

During handling of the above exception, another exception occurred:

LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674 
    675 

LookupError: 
**********************************************************************
  Resource reuters not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('reuters')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************

Even after following the instructions and running nltk.download('reuters'), I still get the same error. The Reuters corpus has been downloaded to /root/:

~/nltk_data/corpora# ls reuters.zip
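One way to narrow this down is to check the directories NLTK reports searching and see whether each one holds the corpus as an extracted reuters folder, as a reuters.zip archive, or not at all. The sketch below uses only the Python standard library. The directory list is copied from the error message, and `locate_corpus` is a hypothetical helper, not part of NLTK. It makes it easy to confirm whether only reuters.zip is present with no extracted reuters folder next to it.

```python
import os

# Directories listed under "Searched in:" in the LookupError above
# (the duplicate '/usr/lib/nltk_data' entry is listed only once here).
SEARCHED_DIRS = [
    "/root/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "/usr/nltk_data",
]


def locate_corpus(name, search_dirs=SEARCHED_DIRS):
    """Hypothetical helper: for each search directory, report whether
    corpora/<name> exists as an extracted folder and/or as <name>.zip."""
    report = {}
    for base in search_dirs:
        corpora = os.path.join(base, "corpora")
        report[base] = {
            "folder": os.path.isdir(os.path.join(corpora, name)),
            "zip": os.path.isfile(os.path.join(corpora, name + ".zip")),
        }
    return report


if __name__ == "__main__":
    # Print one line per searched directory for the reuters corpus.
    for base, found in locate_corpus("reuters").items():
        print(f"{base}: folder={found['folder']} zip={found['zip']}")
```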

Could you please help me?

Thanks,

deusebio commented 3 years ago

Hi @amir1m,

I have been looking into this, and I believe that after downloading you also need to unzip the archive's contents. Can you try unzipping the archive by adding a cell with this command:

!unzip /root/nltk_data/corpora/reuters.zip -d /root/nltk_data/corpora/.

and then re-run line #5 of the notebook?
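If the !unzip shell escape is not available, or you prefer to stay in Python, the same extraction can be sketched with the standard-library zipfile module. The default paths are copied from the unzip command above, and `extract_corpus` is a hypothetical helper name. This assumes, as is usual for NLTK data packages, that the archive's top-level folder is reuters/, so extracting into corpora/ should produce corpora/reuters/.

```python
import zipfile


def extract_corpus(zip_path="/root/nltk_data/corpora/reuters.zip",
                   dest_dir="/root/nltk_data/corpora"):
    """Hypothetical helper mirroring `unzip <zip_path> -d <dest_dir>`:
    extract every member of the archive into dest_dir and return the
    list of member names that were extracted."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

After calling extract_corpus() in a Colab cell, re-running line #5 is the natural next check.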