Helsinki-NLP / Opus-MT

Open neural machine translation models and web services
MIT License

Minimise the size of the Docker image #38

Closed ianroberts closed 3 years ago

ianroberts commented 3 years ago

The current Dockerfile in Opus-MT leaves a large amount of unnecessary data in the image: the full Marian source tree, multiple copies of all the Marian binaries, and a large number of apt packages that are needed to build the tools but not to run them. The result is an extremely large final image (in the region of 10 GB, not including any models). That isn't much of a concern if you're just building the image locally as part of a docker-compose setup, but if you want to push the image to a registry for use by others (or to run it under Kubernetes), it pays to keep the image as small as possible.

This PR uses a multi-stage Docker build to minimise the final image size. The build container is still large, which is unavoidable, but for the final container the Dockerfile now starts over from a "slim" Python base image and copies across only what is actually required at runtime: the marian-server binary, the Python packages and a few other dependencies. This brings the final image size down to under 600 MB (again, not including any models).
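
For anyone unfamiliar with the pattern, here is a rough sketch of the shape of a multi-stage build. It is not the Dockerfile from this PR: the base image tags, the package list, the `requirements.txt` file name and the paths `/build/marian/build/marian-server` and `/install` are all assumptions made for illustration, and the Marian build itself is left as a placeholder, so the sketch will not build until that step is filled in.

```dockerfile
# Stage 1: build stage. It is large, but nothing in it reaches the final
# image unless it is copied across explicitly in stage 2.
FROM python:3.8 AS builder

# Build-only toolchain (assumed and incomplete package list).
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential cmake git \
 && rm -rf /var/lib/apt/lists/*

# Placeholder: fetch and compile Marian here so that the server binary ends
# up at /build/marian/build/marian-server (hypothetical path).

# Install the Python dependencies into a separate prefix so they can be
# copied as one directory tree (requirements.txt is an assumed file name).
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --prefix=/install -r /tmp/requirements.txt

# Stage 2: runtime stage, starting over from a slim base image.
FROM python:3.8-slim

# Only what is needed at runtime crosses the stage boundary.
COPY --from=builder /build/marian/build/marian-server /usr/local/bin/marian-server
COPY --from=builder /install /usr/local

# Placeholder: install any shared libraries that marian-server links
# against dynamically; which ones depends on how Marian was configured.
```

The source tree, the compiler toolchain and the intermediate build outputs stay behind in the `builder` stage, so the size of the final image is determined only by the slim base plus whatever the `COPY --from=builder` lines bring across. Running `docker build --target builder .` stops at the first stage, which can help when debugging the build itself.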