datasciencecampus / pygrams

Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence
https://datasciencecampus.github.io/pygrams

Dockerfile for project #357

Closed: salmaniqbal closed this 4 years ago

salmaniqbal commented 4 years ago

If you would like your project to run in Docker, please take a look at this PR. I have tested some scenarios and it seems to work, but I have not tested all of the functionality when running in Docker. The PR does not affect any existing functionality.

This PR adds:

//TODO: Currently, you have to build the Docker image with the data file already in the '/data' folder and then run it. Instructions need to be added for running the Docker image with a mounted volume, so that users don't have to build the image with the data file in it and can supply it at run time instead. This reduces build time and the size of the resulting image.
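A minimal sketch of the missing run-time instructions, assuming the image reads its input from `/data` inside the container (as the TODO implies) and writes its results to `/outputs`. Both container paths are assumptions that depend on the Dockerfile's working directory, and `outputs` is a hypothetical host folder name:

```shell
# Bind-mount host folders into the container at run time, so the data file
# does not have to be baked into the image.
# NOTE: /data and /outputs are assumed container paths; match them to the Dockerfile.
docker run --rm \
  -v "$(pwd)/data:/data" \
  -v "$(pwd)/outputs:/outputs" \
  pygrams
```

`--rm` removes the container when it exits; anything written to the mounted output folder stays on the host.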

thanasions commented 4 years ago

Thanks for the PR Salman. We will review shortly. Best wishes, T.

codecov[bot] commented 4 years ago

Codecov Report

Merging #357 into develop will not change coverage. The diff coverage is n/a.

```diff
@@           Coverage Diff            @@
##           develop     #357   +/-   ##
========================================
  Coverage    54.75%   54.75%           
========================================
  Files           42       42           
  Lines         3514     3514           
========================================
  Hits          1924     1924           
  Misses        1590     1590
```
salmaniqbal commented 4 years ago

@thanasions it probably does not do everything you need, but it is a starting point.

salmaniqbal commented 4 years ago

@IanGrimstead, good point. Let me modify the Docker image, test it and send you another PR.

salmaniqbal commented 4 years ago

@IanGrimstead, the requirements.txt file has been removed and the Dockerfile updated. It seems to work okay. The image is currently 4.59 GB. When I get a few moments, I'll add a .dockerignore file so that the contents of the data and output folders are not copied into the image. As you know, we can supply them instead by mounting a volume, which also gives users a way to retrieve the output files from inside the container.
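A minimal `.dockerignore` sketch for that plan, assuming the folders are named `data` and `outputs` at the project root (adjust to the actual names) and that the Dockerfile does not `COPY` them explicitly. Excluding them shrinks both the build context and the image; the data is then supplied at run time through a bind mount:

```
# .dockerignore (hypothetical): keep input data and generated outputs
# out of the build context, and therefore out of the image.
data
outputs
```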

mshodge commented 4 years ago

Thanks @salmaniqbal, all seems to work great on my machine. I've also created a Docker image on docker.io and uploaded it to the datasciencecampus account. Could you add the following to the installation README:


### Setup using Docker

Ensure that [docker](https://docs.docker.com/v17.09/engine/installation/) is installed on your machine.

#### To build your own Docker image

Navigate to the root directory of the project and build the Docker image:

`docker build -t pygrams .`

`-t` - tags the image 

`pygrams` - image tag 

`.` - build context (the current directory, which contains the Dockerfile)

#### To use a pre-built Docker image

The latest version of pyGrams has been added to docker.io at [https://hub.docker.com/r/datasciencecampus/pygrams](https://hub.docker.com/r/datasciencecampus/pygrams). To use this:

`docker pull datasciencecampus/pygrams`

#### To run the Docker image

Run the built Docker image using

`docker run pygrams`

or, if you pulled the pre-built image, using

`docker run datasciencecampus/pygrams`
salmaniqbal commented 4 years ago

Nice work @mshodge, I've updated the instructions.

mshodge commented 4 years ago

Thanks @salmaniqbal all merged!