RMI-PACTA / 2dii-DataWarehouse

Change Dockerfile to base on Alpine #10

Open AlexAxthelm opened 4 years ago

AlexAxthelm commented 4 years ago

We should change the Dockerfile to base on Alpine Linux (using python:3.8.0-alpine) rather than Debian (python:3.8.0-buster), because the Alpine image is smaller and therefore faster to download on CI/CD machines. More importantly, the Alpine image is more secure: according to Snyk, python:3.8.0-alpine has no known vulnerabilities, compared with many in the Debian base image.

Not only will this change improve performance and security, it will also make it easier to extend the base image using multi-stage builds. For example, we could add the tools for running pgTAP to the app image as an extension, allowing us to initialize the database and run the tests in a single container, rather than the current pattern of initializing the database in one container and then starting another to run the tests.
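A rough sketch of what that multi-stage layout might look like. The stage names, the `requirements.txt` path, and the Alpine package names are assumptions for illustration, not what the repository currently contains; the extra packages in particular would need checking against the Alpine release the base image ships.

```dockerfile
# Stage 1: the regular app image (hypothetical layout).
FROM python:3.8.0-alpine AS app
WORKDIR /app
# Assumes dependencies are listed in requirements.txt at the repo root.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Stage 2: extends the app image with tooling for running pgTAP tests.
FROM app AS test
# pg_prove is published on CPAN as TAP::Parser::SourceHandler::pgTAP.
# Package names below are assumed; -T skips the module's own test suite.
RUN apk add --no-cache postgresql-client perl perl-utils make \
    && cpan -T TAP::Parser::SourceHandler::pgTAP
```

With a layout like this, `docker build --target app .` would produce the plain app image and `docker build --target test .` the test-enabled one, so both come from the same Dockerfile.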

The major challenge is that some of the pip packages we use (notably pandas) have dependencies that will require building from source. To avoid long compile times on CI/CD machines, the suggested course of action is to publish a Docker image to Docker Hub with our dependencies already compiled (either through a GitHub Action, using Docker Hub Automated Builds, or by publishing from a developer's local machine). Then we can reconfigure the docker-compose files to use 2dii/dw_app:latest, or some similar construction.
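As a minimal sketch of the pre-compiled dependency image, assuming our requirements live in a `requirements.txt` file and that the `build-base` toolchain is enough to compile them (both are assumptions; some packages may need additional `-dev` libraries):

```dockerfile
FROM python:3.8.0-alpine

# C++ runtime that compiled extensions such as pandas link against;
# installed on its own so it survives removal of the build toolchain.
RUN apk add --no-cache libstdc++

# Assumes dependencies are listed in requirements.txt.
COPY requirements.txt /tmp/requirements.txt

# Install the compiler toolchain, build the wheels, then drop the toolchain
# in the same layer so it does not bloat the published image.
RUN apk add --no-cache --virtual .build-deps build-base \
    && pip install --no-cache-dir -r /tmp/requirements.txt \
    && apk del .build-deps
```

The image could then be published with `docker build -t 2dii/dw_app:latest .` followed by `docker push 2dii/dw_app:latest`, and the compose files pointed at it through the service's `image:` key. The slow compile then happens only when the dependency image is rebuilt, not on every CI run.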

evan-2deg commented 4 years ago

Would any of these links or solutions work?

--Docker Build https://hub.docker.com/r/nickgryg/alpine-pandas

--Dockerfile with pandas installed already https://github.com/amancevice/docker-pandas

--Docker for Alpine 3.7 https://stackoverflow.com/a/54934050

Alternatively, can we just go through the lengthy process of installing pandas once as listed here and committing it for our own purposes going forward?

--Note that it is available as an apk on the edge branch under testing for alpine: https://pkgs.alpinelinux.org/packages?name=*pandas&branch=edge