QilongChan closed this issue 5 years ago.
If we add all models (`nltk.download('all')`) instead of only `punkt` (`nltk.download('punkt')`), which is used in the example in #3370, the Docker image is about 33% bigger (12 GB vs. 9 GB), so we should evaluate whether each model is needed before adding it.
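One way to keep the image lean is to install an explicit whitelist of NLTK packages at build time instead of calling `nltk.download('all')`. The following is a minimal sketch, assuming the image build can run a Python step. `install_nltk_data` and `NLTK_PACKAGES` are hypothetical names. The only NLTK call is `nltk.download`, using its `download_dir` and `quiet` parameters. The downloader is injectable so the helper can be exercised without network access.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical whitelist for the image: only the Punkt sentence-tokenizer
# models discussed in this thread. Grow it one package at a time.
NLTK_PACKAGES = ["punkt"]


def install_nltk_data(
    packages: Iterable[str],
    download_dir: str,
    downloader: Optional[Callable[..., bool]] = None,
) -> List[str]:
    """Download each NLTK package into download_dir; return the ids that failed."""
    if downloader is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        downloader = nltk.download
    failed = []
    for package in packages:
        # nltk.download(...) returns True on success and False on failure
        # (with its defaults halt_on_error=True, raise_on_error=False).
        if not downloader(package, download_dir=download_dir, quiet=True):
            failed.append(package)
    return failed
```

In a build step this could be called as `install_nltk_data(NLTK_PACKAGES, "/usr/local/share/nltk_data")`. That directory is one of the locations NLTK searches by default on Linux. Any other target must be added to NLTK's search path, for example through the `NLTK_DATA` environment variable. Recent NLTK releases load the Punkt models from the `punkt_tab` package rather than `punkt`, so the package id should be checked against the NLTK version installed in the image.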
@AlexCatarino, will `punkt` alone serve 90%+ of user needs? If not, what combination of dependencies will get us to 90%+? Otherwise, we'll be back here in two months adding another dependency =)
While we're here, we should also add OpenNLP for C# algorithms.
Of the data I've downloaded locally, the corpora (text data sets) take nearly 90% of the space (2.8 GB of 3.2 GB). As for building models, since that data is mostly unrelated to finance, I think `punkt` alone would be enough if someone really wants to use this package to build an algorithm.
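To check where the space goes before deciding what to ship, a stdlib-only sketch can total the size of each top-level category in an NLTK data directory. NLTK groups its data into folders such as `corpora/` and `tokenizers/`. `nltk_data_breakdown` and `share` are hypothetical helper names. The fallback location `~/nltk_data` is also an assumption, so point the script at wherever the data was actually downloaded.

```python
import os
from typing import Dict


def nltk_data_breakdown(root: str) -> Dict[str, int]:
    """Return total bytes per top-level category (corpora, tokenizers, ...) under root."""
    sizes: Dict[str, int] = {}
    if not os.path.isdir(root):
        return sizes
    for entry in os.scandir(root):
        # Only category folders count; loose files at the top level are ignored
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        # Walk the category folder and add up every regular file's size
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes


def share(sizes: Dict[str, int], category: str) -> float:
    """Fraction of the total taken by one category, e.g. 'corpora'."""
    total = sum(sizes.values())
    return sizes.get(category, 0) / total if total else 0.0


if __name__ == "__main__":
    # Assumption: use the first NLTK_DATA entry, else the per-user ~/nltk_data
    root = os.environ.get("NLTK_DATA", "").split(os.pathsep)[0] or os.path.expanduser("~/nltk_data")
    breakdown = nltk_data_breakdown(root)
    for name, size in sorted(breakdown.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {size / 1e9:6.2f} GB")
    print(f"corpora share: {share(breakdown, 'corpora'):.1%}")
```

Consider the figures reported above: 2.8 GB of corpora out of 3.2 GB. `share(..., "corpora")` would come out at 0.875, or 87.5%.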
Expected Behavior
NLTK API methods need data dependencies (listed below), which are installed with these commands:
Details of the dependencies:
Actual Behavior
Potential Solution
Reproducing the Problem
System Information
Checklist
`master` branch