Self-Contained: Your service or job should be as self-contained as possible. This means we want to make sure as much as possible is included in the image build, i.e. the Dockerfile. I noticed that you did not include your source files in the image via a COPY command.
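A minimal sketch of baking the source into the image. It assumes a `requirements.txt` at the repo root and uses a hypothetical `/app` working directory; neither is confirmed by the current setup.

```dockerfile
FROM python:3.8-alpine

# Hypothetical working directory for the service code
WORKDIR /app

# Copy only the dependency list first so this layer stays cached
# when only the source changes (requirements.txt is an assumption)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake the source into the image instead of cloning it at runtime
COPY . .
```

Pairing this with a `.dockerignore` (for example excluding `.git` and notebook checkpoints) keeps the `COPY . .` layer small.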
Runtime Config: Any runtime-specific config that does not have an appropriate default should go into a docker-compose.yml, with appropriate instructions so that others or CI/CD can leverage it.
Image Footprint: Typically for PROD, we try as much as feasible to use alpine-based images due to their low resource footprint. This service is simple enough that it is a good candidate for this. For instance, ubuntu:20.04 is at a base of 27.27MB (before any additional layers, including Python, are added), whereas python:3.8-alpine is at a base of 15.88MB (with python and pip preinstalled). That is roughly a 42% reduction in base image size, and the gap only widens once Python is installed on top of the Ubuntu image.
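For illustration, the base-image swap is essentially a one-line change. The commented-out Ubuntu lines are an assumed approximation of the current setup, not the actual Dockerfile.

```dockerfile
# Before (assumed): general-purpose base; Python has to be added on top,
# which grows the image beyond the 27.27MB base
# FROM ubuntu:20.04
# RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
#     && rm -rf /var/lib/apt/lists/*

# After: Alpine base with python and pip preinstalled (15.88MB base)
FROM python:3.8-alpine
```

Any OS-level packages then come from `apk add --no-cache ...` rather than `apt-get`, since Alpine uses its own package manager.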
Dev dependencies: Since it seems like you are interested in developing with Jupyter notebooks (.ipynb), I would recommend using build args to swap your base image at build time, i.e. use our datajoint/djlab image for dev and python:3.8-alpine for PROD. That way we have some control over what is available in the image depending on the need. However, sometimes it might make more sense to split it into multiple Dockerfiles.
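A sketch of selecting the base image with a build arg. `BASE_IMAGE` is a hypothetical arg name, and the later steps assume the same `requirements.txt` layout as above.

```dockerfile
# Default to the lean PROD base; override with --build-arg for dev.
# An ARG declared before FROM is only in scope for the FROM line itself.
ARG BASE_IMAGE=python:3.8-alpine
FROM ${BASE_IMAGE}

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```

A plain `docker build .` would produce the PROD image, while the dev variant would be built with something like `docker build --build-arg BASE_IMAGE=datajoint/djlab:<tag> -t myservice:dev .`, where `<tag>` and `myservice` are placeholders. Once the two bases need different OS-level setup (apk on Alpine vs. apt on a Debian-based image), that is usually the point where separate Dockerfiles become cleaner.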
Dependencies: Generally, you should include some justification in comments in your Dockerfile for needing certain dependencies. This is good practice, both for you to remember why each one was needed and for KT (knowledge transfer) to others who might work on this. That said, git doesn't seem to be needed as a dependency if you simply COPY the source into the build. Also, can you share why google-auth-httplib2 and google-auth-oauthlib are needed? Does it work without both or either? I did a quick test on imports and it passes, though I suspect they are only needed when actually invoking an OAuth flow. Best to include as few dependencies as possible.
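A sketch of what justification comments could look like. The stated reasons for the two google-auth packages are assumptions to be confirmed, not established requirements of this service.

```dockerfile
FROM python:3.8-alpine

# --- Dependency justifications (illustrative; confirm before keeping) ---
# git: intentionally NOT installed. The source is COPY'd in below,
#   so nothing needs to be cloned.
# google-auth-oauthlib: assumed to be needed only if the service runs an
#   OAuth user-consent flow; remove it if the job runs fine without it.
# google-auth-httplib2: assumed to be needed only if google-auth is used
#   with an httplib2 transport; remove it if not.
RUN pip install --no-cache-dir google-auth-oauthlib google-auth-httplib2

WORKDIR /app
COPY . .
```

The comments sit on their own lines above the `RUN` because Docker only treats a `#` at the start of a line as a comment; a `#` later in an instruction is passed through as an argument.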
Default ENTRYPOINT/CMD: As much as it makes sense, you should try to include the expected command and/or an entrypoint so that it is clear how the service/job is initiated in the image.
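A minimal sketch of declaring how the job starts. `main.py` is a hypothetical entry script; the real module name may differ.

```dockerfile
FROM python:3.8-alpine
WORKDIR /app
COPY . .

# Exec form: python runs as PID 1 without a /bin/sh -c wrapper,
# so it receives signals directly
# ENTRYPOINT fixes the executable that always runs
ENTRYPOINT ["python"]
# CMD supplies default arguments; `docker run <image> other.py` replaces them
CMD ["main.py"]
```

This also documents intent when an orchestrator launches the container: in Kubernetes, a container's `command` overrides ENTRYPOINT and `args` overrides CMD, so the image defaults apply whenever the pod spec leaves those fields unset.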
This is handled by K8s, where the source is downloaded at runtime from the main branch, though I need to look up the specific command to clone a very specific version. This basically gets around the problem that on DockerHub we are limited to only 1 private repo.
I don't use Docker Compose, since development with K8s is the main workflow; I can create one later.
Hmm, good point. I got lazy and just went with Ubuntu:20.04, but like you said, it is a bit overkill and wasteful in terms of resources. I will make a pull request later today to change that.
I only use Jupyter as an option to quickly prototype code and debug the main code base. The final deployment and Jupyter both use the same Docker image included in this repo; for Jupyter, you install it at runtime if you need it.
I use K8s to clone the repo at runtime instead of including it in the Docker image, which is public. As for those two dependencies, I only installed them because Google recommended it. Chances are they are not needed, as you mention. I will test it on my side to confirm, then remove them. Google's docs aren't exactly the best...
This is also handled by K8s; basically, K8s replaces Docker Compose, and the Docker image just becomes a dependencies image.