aedocw / epub2tts-edge

epub2tts-edge uses Microsoft Edge's cloud-based TTS to create a full-featured M4B audiobook from an EPUB or text file
GNU General Public License v3.0

Having issues with a book in Spanish #24

Closed: siba2893 closed this issue 4 months ago

siba2893 commented 5 months ago
[image not preserved]
aedocw commented 5 months ago

Looks like, for some reason, NLTK did not download punkt (I think it's supposed to do that automatically on the first run).

Try starting the Python interpreter (run python) and entering:

```python
import nltk
nltk.download("punkt")
```

If that runs without errors, type exit() to leave Python, then try running epub2tts-edge again.

aedocw commented 5 months ago

Have you had a chance to try this again? I suspect it's not a bug, as I have not been able to reproduce it in a clean environment, but if you can reproduce it reliably, please let me know.

erfansamandarian commented 4 months ago

@aedocw, would it be bad form to just put import nltk; nltk.download("punkt") somewhere in setup.py?

aedocw commented 4 months ago

I'm reluctant to add code for an issue that is likely not easily reproduced. It could have been a temporary networking problem on the user's end, or something else odd in their environment. If this is a common thing that multiple people are hitting, it would be worth finding a reliable way to trigger the download on its own, but generally NLTK is supposed to handle it.

erfansamandarian commented 4 months ago

For what it's worth, I've gotten the same message every time I've installed on a system without an nltk_data folder in my home directory. This didn't happen with epub2tts, as far as I remember.

aedocw commented 4 months ago

Ah ok, that is interesting! Makes me think I should spend a little more time trying to recreate that in clean environments then.

prydom commented 4 months ago

I set up a fresh distrobox and Python virtual environment yesterday (since I don't use a Debian-based distro) and encountered the same error.

Note that running the code suggested in the error message created a directory tree in ~/nltk_data without asking where I wanted it saved. If it could be configured to save the tokenizer models in the Python lib directory instead (meaning they would end up in the virtual environment tree), that would be helpful.

```
~ $ tree nltk_data
nltk_data
└── tokenizers
    ├── punkt
    │   ├── PY3
    │   │   ├── README
    │   │   ├── czech.pickle
[...]
```
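One possible way to get the behaviour asked for here, sketched under assumptions: inside an active virtual environment, sys.prefix points at the venv root, and as far as I know NLTK's default search path includes <sys.prefix>/nltk_data. Downloading into that directory with nltk.download's download_dir argument should therefore keep the models inside the venv while still letting NLTK find them. The helper names in_virtualenv, venv_nltk_data_dir and download_punkt_into_venv are hypothetical and not part of epub2tts-edge; nltk is imported inside the download function so the path helpers work even without NLTK installed.

```python
import os
import sys


def in_virtualenv() -> bool:
    # venv/virtualenv set sys.prefix to the environment root, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def venv_nltk_data_dir() -> str:
    # <sys.prefix>/nltk_data lives inside the venv tree and is (as far as
    # I know) one of the directories NLTK searches by default.
    return os.path.join(sys.prefix, "nltk_data")


def download_punkt_into_venv() -> bool:
    if not in_virtualenv():
        raise RuntimeError("no virtual environment is active")
    import nltk  # deferred so the helpers above work without NLTK installed

    target = venv_nltk_data_dir()
    os.makedirs(target, exist_ok=True)
    # download_dir overrides the downloader's default choice of directory
    # (which was ~/nltk_data in the report above).
    return nltk.download("punkt", download_dir=target)
```

Calling download_punkt_into_venv() once from inside the activated environment should leave ~/nltk_data untouched; deleting the venv then also removes the models.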
aedocw commented 4 months ago

I'll add this, which should handle the fetching:

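```python
import nltk

def ensure_punkt():
    # Look for the punkt models on NLTK's search path (including NLTK_DATA);
    # download them only if they are missing.
    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt")
```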

As for storing that data elsewhere, I'll add instructions to the README explaining how to use export NLTK_DATA="your/path/to/nltk_data" to specify where the NLTK data lives.
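A possible variant of ensure_punkt that follows NLTK_DATA even when that directory does not exist yet, offered as a sketch rather than what the project ships. As I read NLTK's downloader source, its default target is the first directory on the search path that already exists and is writable, so a fresh NLTK_DATA path could be skipped in favour of ~/nltk_data. Creating the directory and passing download_dir explicitly avoids relying on that. The helper nltk_data_dir_from_env is hypothetical.

```python
import os


def nltk_data_dir_from_env(environ=None):
    """Return the first directory listed in NLTK_DATA, or None if unset/empty.

    Like NLTK itself, treat NLTK_DATA as a list separated by os.pathsep
    (":" on Linux/macOS, ";" on Windows) and ignore empty entries.
    """
    environ = os.environ if environ is None else environ
    entries = [p for p in environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    return entries[0] if entries else None


def ensure_punkt():
    import nltk  # deferred so the helper above has no NLTK dependency

    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        target = nltk_data_dir_from_env()
        if target is not None:
            # Make sure the NLTK_DATA directory exists before downloading into it.
            os.makedirs(target, exist_ok=True)
        # download_dir=None falls back to NLTK's default download directory.
        nltk.download("punkt", download_dir=target)
```

With NLTK_DATA exported as described above, the first call should download into that directory, and later calls should find the models there without touching the network.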

aedocw commented 4 months ago

This was resolved with https://github.com/aedocw/epub2tts-edge/pull/31