PsychoinformaticsLab / pliers

Automated feature extraction in Python
https://pliers.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

wishlist: an official docker image of pliers #345

Open yarikoptic opened 5 years ago

yarikoptic commented 5 years ago

There is already a Dockerfile, so it should be easy to build and publish a pliers image on Docker Hub; I think it is just a matter of flipping a switch. Ideally there would then be tagged versions corresponding to releases (once the next release comes). ATM there is already https://hub.docker.com/r/kaczmarj/pliers/tags, since I guess @kaczmarj just builds images for all of his repos and he has a clone of pliers ;-)

yarikoptic commented 5 years ago

Maybe there is no official Docker image because of restrictive licenses on some of the installed/bundled/downloaded components, e.g. the data fetched in this step?

Step 9/11 : RUN python -m pliers.support.download
 ---> Running in 850656a5a419
[nltk_data] Downloading package punkt to /home/pliers/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package maxent_treebank_pos_tagger to
[nltk_data]     /home/pliers/nltk_data...
[nltk_data]   Unzipping taggers/maxent_treebank_pos_tagger.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/pliers/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/pliers/nltk_data...

kaczmarj commented 5 years ago

ha - i guess i do tend to build docker images for the projects i fork..

i don't know which licenses these data use, but the data are here. maybe we could also forgo this download step, if it would otherwise prevent us from making an official docker image.

kaczmarj commented 5 years ago

oops, i meant to include the link to the data http://www.nltk.org/nltk_data/

we might also consider not downloading the data in the docker image, if the licenses are restrictive. the error message raised when the data are missing gives the command to get them.
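
For illustration, here is a rough sketch of what deferring the download to run time could look like, so that the image itself ships without the NLTK data. It is not how pliers does it: the helper name fetch_nltk_data and the injectable download argument are hypothetical, and the package names are simply copied from the build log earlier in this thread. The only library call assumed is nltk.download, which returns a truthy value on success.

```python
# Hypothetical helper: fetch the NLTK data at run time instead of at image build time.
# Package names are copied from the build log in this thread.
NLTK_PACKAGES = [
    "punkt",
    "maxent_treebank_pos_tagger",
    "averaged_perceptron_tagger",
    "vader_lexicon",
]


def fetch_nltk_data(packages=NLTK_PACKAGES, download=None):
    """Try to download each package and return the names that failed."""
    if download is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk

        download = nltk.download
    failed = []
    for name in packages:
        # nltk.download returns False when a package could not be fetched.
        if not download(name):
            failed.append(name)
    return failed
```

A user of the image would then call fetch_nltk_data() once inside the running container (for example from an entrypoint script) and can review the data licenses before doing so.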