Closed Fluder-Paradyne closed 1 year ago
Yes, you can set the NLTK_DATA env var in your Dockerfile to the directory where the NLTK data should be downloaded, then do something like:
https://github.com/Unstructured-IO/unstructured/blob/331c7fa/Dockerfile#L45-L46
Got it thanks
I found that every time I use it, it tries to download the data again. How can I make it use the already-downloaded data instead of downloading it each time?
If you are using Docker, add these lines to your Dockerfile:
RUN python3 -c "import nltk; nltk.download('punkt')" && \
    python3 -c "import nltk; nltk.download('averaged_perceptron_tagger')"
Or just run the same commands in your terminal:
python3 -c "import nltk; nltk.download('punkt')" && \
    python3 -c "import nltk; nltk.download('averaged_perceptron_tagger')"
This downloads to an NLTK data folder on your local machine, which should be reused on later runs. If it is not reused, set the NLTK_DATA env var to the path of the downloaded folder.
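If you want to confirm that the data is really being reused, here is a minimal sketch that checks the data folder before downloading anything. It assumes the usual NLTK layout, where punkt ends up under tokenizers/ and averaged_perceptron_tagger under taggers/ inside the data folder. The names REQUIRED, missing_packages and ensure_packages, and the fallback path /opt/nltk_data, are made up for illustration and are not part of nltk or unstructured.

```python
import os
from pathlib import Path

# Where each package from this thread lives inside an NLTK data folder
# (assumed standard layout; adjust if your packages differ).
REQUIRED = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
}


def missing_packages(data_dir):
    """Return the package names that are not yet present under data_dir."""
    root = Path(data_dir)
    missing = []
    for name, rel in REQUIRED.items():
        unpacked = root / rel            # unzipped package folder
        zipped = root / (rel + ".zip")   # NLTK can also read the zip directly
        if not unpacked.is_dir() and not zipped.is_file():
            missing.append(name)
    return missing


def ensure_packages(data_dir):
    """Download only the missing packages into data_dir (needs network)."""
    import nltk  # imported lazily so the check above works without nltk

    for name in missing_packages(data_dir):
        nltk.download(name, download_dir=str(data_dir))


if __name__ == "__main__":
    # /opt/nltk_data is a hypothetical fallback; normally NLTK_DATA is set.
    data_dir = os.environ.get("NLTK_DATA", "/opt/nltk_data")
    ensure_packages(data_dir)
```

Run it once at build time (for example from a RUN step) with NLTK_DATA set. At runtime, missing_packages should return an empty list, and nothing needs to be fetched.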
Thank you, adding the NLTK_DATA env var works.
Describe the bug
When the internet connection is slow I get this error. I want my application to run offline.
To Reproduce
Expected behavior
Should be able to read the file.
Environment Info
Debian 11 (Docker image python:3.10-slim-bullseye)
Additional context
If you can point out what to download and where to place the files, I think I can do it in the Docker build step itself so that the download doesn't have to start every time.
thanks