aedocw / epub2tts-edge

epub2tts-edge uses Microsoft Edge cloud-based TTS to create a full-featured audiobook (m4b) from an epub or text file.
GNU General Public License v3.0

LookupError NLTK Resource punkt_tab not found #35

Closed nateProjects closed 1 month ago

nateProjects commented 1 month ago

Hiya! I'm getting a 'LookupError' - Resource punkt_tab not found. Please use the NLTK Downloader to obtain the resource:

I have imported it and it is in /Users/nat/nltk_data and it is looking there, but still not finding it.
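
A quick way to double-check what is actually on disk, offered as a minimal sketch rather than anything from epub2tts-edge itself: the helper below (the name `locate_punkt_tab` is hypothetical) walks a list of search directories and reports which ones contain the `tokenizers/punkt_tab/english` folder named in the error. It only looks for an unpacked directory, which is an assumption; it does not inspect zip archives.

```python
from pathlib import Path


def locate_punkt_tab(search_dirs, relative="tokenizers/punkt_tab/english"):
    """Return the full paths under search_dirs where the resource directory exists.

    search_dirs: iterable of base directories, e.g. the entries of nltk.data.path.
    relative:    resource location relative to each base directory (taken from
                 the "Attempted to load" line of the error message).
    """
    hits = []
    for base in search_dirs:
        candidate = Path(base) / relative
        # Only an unpacked directory counts here; zipped resources are ignored.
        if candidate.is_dir():
            hits.append(str(candidate))
    return hits
```

With NLTK installed, calling `locate_punkt_tab(nltk.data.path)` and printing `nltk.__version__` alongside the result should show whether the files are missing or whether the installed NLTK is simply not picking them up.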

I'm on macOS 15.0.1 (M1, Arm). Thanks!

nat@Nathans-MacBook-Air epub2tts-edge % epub2tts-edge 9781849354738_epub.txt --cover 9781849354738_epub.png
Namespace(sourcefile='9781849354738_epub.txt', speaker='en-US-AndrewNeural', cover='9781849354738_epub.png')
Traceback (most recent call last):
  File "/Users/nat/.pyenv/versions/3.12.4/bin/epub2tts-edge", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/nat/.pyenv/versions/3.12.4/lib/python3.12/site-packages/epub2tts_edge/epub2tts_edge.py", line 421, in main
    book_contents, book_title, book_author, chapter_titles = get_book(args.sourcefile)
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nat/.pyenv/versions/3.12.4/lib/python3.12/site-packages/epub2tts_edge/epub2tts_edge.py", line 189, in get_book
    sentences = sent_tokenize(line)
                ^^^^^^^^^^^^^^^^^^^
  File "/Users/nat/.pyenv/versions/3.12.4/lib/python3.12/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
    tokenizer = _get_punkt_tokenizer(language)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nat/.pyenv/versions/3.12.4/lib/python3.12/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
    return PunktTokenizer(language)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nat/.pyenv/versions/3.12.4/lib/python3.12/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
    self.load_lang(lang)
  File "/Users/nat/.pyenv/versions/3.12.4/lib/python3.12/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
    lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nat/.pyenv/versions/3.12.4/lib/python3.12/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/Users/nat/nltk_data'

aedocw commented 1 month ago

About to push up a fix for this :)

aedocw commented 1 month ago

NOT FIXED :(

I was able to reproduce it; I merged and closed too quickly...

aedocw commented 1 month ago

I ran pip install --upgrade nltk and it's fixed. Unfortunately, I think you'll have to do that yourself; it can't be done from inside epub2tts-edge.

Please try that and let me know if it resolves the issue.
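
For anyone who also needs the data download after upgrading, here is a minimal sketch of a check-then-download step. It does not replace the pip upgrade above, and it is not the code used in epub2tts-edge. The function name `ensure_resource` and its `find`/`download` parameters are hypothetical; by default they fall back to `nltk.data.find` and `nltk.download`, and the resource path is the one shown in the error message.

```python
def ensure_resource(resource_path="tokenizers/punkt_tab/english/",
                    package="punkt_tab",
                    find=None,
                    download=None):
    """Make sure an NLTK resource is available, downloading it if needed.

    Returns True if a download was triggered, False if the resource was
    already present. `find` and `download` can be injected for testing;
    when omitted they default to nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        find = find or nltk.data.find
        download = download or nltk.download

    try:
        find(resource_path)       # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(package, quiet=True)
        find(resource_path)       # re-check; still raises if the download did not help
        return True
```

Running `ensure_resource()` once before tokenizing would fetch `punkt_tab` on first use. If the second `find` still raises, the problem is more likely the installed NLTK version than the data, which matches what happened in this thread.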

nateProjects commented 1 month ago

That worked perfectly! Thanks so much! I'll recommend this to anyone I think could use it, and I certainly will use it myself! Cheers! Nate

aedocw commented 1 month ago

Excellent, glad that sorted it, and glad you're enjoying the script!