Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
8.78k stars, 718 forks

Facing punkt error for PY3_tab #3597

Open Asif-droid opened 1 month ago

Asif-droid commented 1 month ago

After installing all the required packages, I get a file-not-found error. This is my code:

from unstructured.partition.auto import partition

elements = partition(filename="/content/drive/MyDrive/unstructured/Copy of PDF-1_Page-8.jpg")
print("\n\n".join([str(el) for el in elements]))

Error:

OSError: No such file or directory: '/root/nltk_data/tokenizers/punkt/PY3_tab'

I'm using Google Colab. This is what I found inside the punkt directory:

ls /root/nltk_data/tokenizers/punkt/

Output:

czech.pickle    finnish.pickle  malayalam.pickle   README          turkish.pickle
danish.pickle   french.pickle   norwegian.pickle   russian.pickle
dutch.pickle    german.pickle   polish.pickle      slovene.pickle
english.pickle  greek.pickle    portuguese.pickle  spanish.pickle
estonian.pickle italian.pickle  PY3                swedish.pickle

How can I solve this?

rrcgat commented 3 days ago

https://github.com/nltk/nltk/issues/3305

Try upgrading nltk to 3.9.1.
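For anyone who wants to confirm which nltk version their Colab runtime has before upgrading, here is a minimal sketch. The helper names (`parse_version`, `needs_upgrade`, `installed_nltk_version`) are made up for illustration and are not part of nltk or unstructured; the 3.9.1 threshold is simply the version suggested above.

```python
from importlib import metadata

# Version suggested in this thread; treated here as the minimum to aim for.
MIN_NLTK = (3, 9, 1)


def parse_version(text):
    """Turn '3.9.1' into (3, 9, 1).

    Only the leading digits of each of the first three components are kept,
    so a suffix such as 'b1' or 'rc1' is ignored (a simplification).
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed, minimum=MIN_NLTK):
    """Return True when the installed version is older than the minimum."""
    return parse_version(installed) < minimum


def installed_nltk_version():
    """Return the installed nltk version string, or None if nltk is absent."""
    try:
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_nltk_version()
    if version is None:
        print("nltk is not installed in this environment")
    elif needs_upgrade(version):
        print(f"nltk {version} is older than 3.9.1; "
              "try: pip install --upgrade 'nltk>=3.9.1'")
    else:
        print(f"nltk {version} already meets the suggested minimum")
```

In Colab the upgrade itself would be run in a cell as `!pip install --upgrade "nltk>=3.9.1"`. If nltk was already imported in the session, restarting the runtime afterwards is probably needed for the new version to load. Recent nltk releases appear to read the tokenizer from a `punkt_tab` resource rather than the old `punkt` pickles, so running `nltk.download("punkt_tab")` after the upgrade may also be required.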