Open TaylorN15 opened 1 month ago
This sounds like an IT problem unrelated to the framework. If you're behind a firewall, how do you expect to download any NLTK packages?
I would recommend you Dockerize and cache the dependencies, building your container somewhere with internet access.
ENV NLTK_DATA=/usr/share/nltk_data
RUN mkdir -p $NLTK_DATA && chmod -R 777 $NLTK_DATA
RUN python -m nltk.downloader -d $NLTK_DATA stopwords punkt averaged_perceptron_tagger
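To confirm the image actually contains the data before deploying, a small check along these lines could run as a final build step. This is a minimal sketch, not part of the framework: the function name `missing_nltk_packages` is hypothetical, and the relative paths assume the usual layout that `nltk.downloader` writes for these three packages.

```python
from pathlib import Path

# Assumed on-disk layout written by nltk.downloader for the three
# packages named in the Dockerfile snippet above.
EXPECTED = {
    "stopwords": "corpora/stopwords",
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
}


def missing_nltk_packages(data_dir):
    """Return the names of expected packages not found under data_dir."""
    root = Path(data_dir)
    missing = []
    for name, rel in EXPECTED.items():
        unpacked = (root / rel).is_dir()
        zipped = (root / (rel + ".zip")).is_file()
        # Accept either the unzipped directory or the archive itself.
        if not (unpacked or zipped):
            missing.append(name)
    return missing
```

Calling `missing_nltk_packages("/usr/share/nltk_data")` inside the container and failing the build on a non-empty result would catch a download that silently failed.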
We already have a firewall exception for the NLTK packages, since they are downloaded from GitHub. That exception was already in place to allow certain Python packages and Oryx builds to work.
I'm just saying that someone else may encounter this same issue, as most IT departments won't allow access to an unknown public S3 bucket.
Describe the bug
Since the change was made to no longer use nltk.download(), my application cannot download the required NLTK packages. The application is behind a firewall and we are only allowed to exempt specific traffic, and a public S3 bucket is proving difficult to get approved.
I get an error when it attempts to download the packages:
<urlopen error [Errno 104] Connection reset by peer>
To Reproduce
Use a partitioner that requires NLTK

Expected behavior
NLTK package download doesn't fail

Additional context
Perhaps there is a way to include the required NLTK packages or pre-download them before the application is zipped and deployed?
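
One way the pre-download idea could look, as a rough sketch rather than anything the framework provides: fetch the packages on a build machine that has internet access, ship the resulting directory inside the zip, and point NLTK at it on startup. The function names `predownload` and `configure_nltk_data` are hypothetical. `nltk.download(..., download_dir=...)`, `nltk.data.path`, and the `NLTK_DATA` environment variable are standard NLTK. Whether the framework skips its own download once the data is already present is an assumption that would need to be verified.

```python
import os
from pathlib import Path

# Package names taken from the Dockerfile snippet earlier in this thread.
PACKAGES = ("stopwords", "punkt", "averaged_perceptron_tagger")


def predownload(target_dir, packages=PACKAGES):
    """Build step: run where outbound access is allowed, before zipping."""
    import nltk  # imported lazily so the runtime helper works without it

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for name in packages:
        if not nltk.download(name, download_dir=str(target_dir), quiet=True):
            raise RuntimeError(f"failed to download NLTK package: {name}")


def configure_nltk_data(bundle_dir):
    """Runtime step: make NLTK look in the bundled directory first."""
    data_dir = Path(bundle_dir).resolve()
    if not data_dir.is_dir():
        raise FileNotFoundError(f"bundled NLTK data not found: {data_dir}")
    # NLTK reads NLTK_DATA when nltk.data is first imported.
    os.environ["NLTK_DATA"] = str(data_dir)
    try:
        import nltk
    except ImportError:
        return data_dir
    # If nltk is already imported, also update its search path directly.
    if str(data_dir) not in nltk.data.path:
        nltk.data.path.insert(0, str(data_dir))
    return data_dir
```

The application would call `configure_nltk_data` with the path of the bundled directory before importing any partitioner, so that lookups resolve locally instead of reaching out through the firewall.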