siba2893 closed 4 months ago
Looks like, for some reason, NLTK did not download punkt (I think it's supposed to do that automatically on the first run).
Try launching Python (with `python`) and running:

```python
import nltk
nltk.download("punkt")
```

If that runs without error, type `exit()` to get out of Python, and try running epub2tts-edge again.
Have you had a chance to try this again? I suspect it's not a bug, because I have not been able to reproduce it in a clean environment. If you are able to reproduce it reliably, please let me know.
@aedocw, would it be bad form to just do an `import nltk; nltk.download("punkt")` somewhere in setup.py?
I'm reluctant to add code to address an issue that is likely not easily reproduced. It could have been a temporary networking problem on the user's end, or something else odd in their environment. If multiple people are hitting this, it would be worth finding a reliable way to trigger the download on its own, but generally NLTK is supposed to handle it.
For what it's worth, I've gotten the same message every time I've tried to install on a system without the nltk folder in my home directory. As far as I remember, this didn't happen with epub2tts.
Ah, OK, that is interesting! That makes me think I should spend a little more time trying to reproduce it in clean environments.
Yesterday I set up a fresh distrobox and Python virtual environment (I don't use a Debian-based distro) and encountered the same error.
Note that running the code suggested in the error message created a directory tree in `~/nltk_data` without asking where I wanted it saved. It would be helpful if that could be configured to save the tokenizer models in the Python lib directory instead, so that they would end up in the virtual environment tree.
```
~ $ tree nltk_data
nltk_data
└── tokenizers
    ├── punkt
    │   ├── PY3
    │   │   ├── README
    │   │   ├── czech.pickle
[...]
```
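Below is a minimal sketch of the venv-local option. It assumes two things: that the `download_dir` parameter of `nltk.download()` is used to pick the target, and that `<sys.prefix>/nltk_data` is among the locations NLTK searches by default. The second holds in the NLTK versions I've looked at, but check `nltk.data.path` on yours. `venv_nltk_data_dir` and `download_punkt_into_venv` are hypothetical helper names. They are not part of epub2tts-edge.

```python
import os
import sys


def venv_nltk_data_dir() -> str:
    """Return an nltk_data directory inside the active Python environment.

    In a virtual environment sys.prefix is the venv root, so files placed
    here live in the venv tree instead of ~/nltk_data.
    """
    return os.path.join(sys.prefix, "nltk_data")


def download_punkt_into_venv() -> str:
    """Hypothetical helper: fetch punkt into the venv-local directory."""
    import nltk  # imported lazily so this module loads even without NLTK

    target = venv_nltk_data_dir()
    os.makedirs(target, exist_ok=True)
    # download_dir overrides NLTK's own choice of download location
    nltk.download("punkt", download_dir=target)
    # Make sure the current process also searches the new directory
    if target not in nltk.data.path:
        nltk.data.path.append(target)
    return target
```

Calling `download_punkt_into_venv()` once after creating the venv should leave `~/nltk_data` untouched. Deleting the venv then removes the models as well.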
I'll add this, which should handle the fetching:

```python
import nltk


def ensure_punkt(self):
    # Fetch the punkt tokenizer models only if NLTK can't already find them
    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt")
```
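For reuse outside a class, the same find-then-download check can be written as a standalone function. This is a sketch, not the code that landed in the PR. `ensure_nltk_resource` is a hypothetical name, and the optional `find`/`download` arguments exist only so the logic can be exercised without network access. When they are omitted, the function uses `nltk.data.find` and `nltk.download`, as in the method above.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    resource: str = "tokenizers/punkt",
    package: str = "punkt",
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> bool:
    """Download `package` only if `resource` can't be found.

    Returns True if a download was attempted, False if the resource was present.
    """
    if find is None or download is None:
        import nltk  # lazy import: only needed when the real NLTK calls are used

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource)  # raises LookupError if not on any search path
        return False
    except LookupError:
        download(package)
        return True
```

Because the resource path and package name are parameters, the same helper covers other NLTK data packages without another copy of the try/except.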
As for storing that data elsewhere, I'll add some instructions to the README showing how to use `export NLTK_DATA="your/path/to/nltk_data"` to specify where the NLTK data lives.
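The variable can also be set from Python, as long as that happens before `nltk` is first imported. NLTK reads `NLTK_DATA` (an `os.pathsep`-separated list) when its `nltk.data` module loads. After that point, appending to `nltk.data.path` is the way to add a directory. Below is a minimal sketch. `with_nltk_data` is a hypothetical helper that returns an environment mapping with a directory placed first in `NLTK_DATA` and any existing entries kept.

```python
import os
from typing import Dict, Mapping, Optional


def with_nltk_data(path: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with `path` first in NLTK_DATA."""
    env = dict(os.environ if base is None else base)
    # Keep any directories already listed, dropping empties and duplicates of `path`
    existing = [p for p in env.get("NLTK_DATA", "").split(os.pathsep) if p and p != path]
    env["NLTK_DATA"] = os.pathsep.join([path] + existing)
    return env
```

There are two ways to use it:
- Call `os.environ.update(with_nltk_data("your/path/to/nltk_data"))` at the top of a script, before `import nltk`.
- Pass `env=with_nltk_data("your/path/to/nltk_data")` to `subprocess.run` when launching epub2tts-edge from another program.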
This was resolved with https://github.com/aedocw/epub2tts-edge/pull/31