Add a docker container (and docker-compose file) to run the model/notebooks in a containerized environment

leothomas commented 5 months ago

To make it easier to run the model/notebooks without having to manage installing dependencies across various machines/environments, I've added a micromamba-based Dockerfile which will create the conda environment with the specified libraries.

I've also added a docker-compose file to specify the build-time and run-time arguments for exposing the jupyter lab port and mounting the current directory as a volume into the docker container. This will allow users to modify any of the model/notebook code locally, without having to re-build the image.

By default the docker image starts by running jupyter lab, but this can be overridden either in the docker-compose file or on the command line with any other python or bash command.

The platform=linux/amd64 build and run-time args enable the image to be built on Mac M1 while maintaining compatibility with Linux.
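As a rough illustration of the setup described above, the snippet below writes a minimal compose file to a scratch filename. This is a sketch, not the file in this PR: the service name, port, mount path, and ENV_NAME value are taken from the commands in this comment, while the scratch filename and the exact layout of keys are assumptions.

```shell
# Write a hypothetical compose file to a scratch name so nothing real is overwritten.
compose_file="docker-compose.example.yml"

cat > "$compose_file" <<'EOF'
services:
  claymodel:
    build: .
    # Request linux/amd64 so the image also works on Apple Silicon hosts.
    platform: linux/amd64
    ports:
      # Expose the jupyter lab port on the host.
      - "8888:8888"
    volumes:
      # Mount the current directory so local edits need no rebuild.
      - .:/model
    environment:
      - ENV_NAME=claymodel
EOF

echo "wrote $compose_file"
```

Because the working directory is mounted at /model, edits to notebooks or model code on the host should show up in the running container without rebuilding the image.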

The container can be run with docker-compose up, or with docker-compose run claymodel <command>, where <command> is any command that overrides the jupyter lab startup.
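To make the override concrete, here is a small wrapper sketch around docker-compose run. The function name clay_run and the DRY_RUN switch are hypothetical conveniences, not part of this PR; only the claymodel service name comes from the command above.

```shell
# Hypothetical helper: run a one-off command in the "claymodel" service
# instead of the default jupyter lab startup.
clay_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry-run mode: print the command that would be executed.
    echo docker-compose run --rm claymodel "$@"
  else
    # --rm removes the one-off container once the command exits.
    docker-compose run --rm claymodel "$@"
  fi
}

# Preview the command without needing docker on the machine.
DRY_RUN=1 clay_run python --version
```

With DRY_RUN unset, the same call would start a throwaway container for the service and run python --version in place of jupyter lab.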

The container can also be built directly (bypassing the need for docker-compose) with:

docker build . -t clay --platform linux/amd64

and then run with:

docker run --rm -it -v $(pwd):/model -p 8888:8888 -e ENV_NAME=claymodel --platform linux/amd64 clay:latest

chuckwondo commented 4 months ago

Thanks @leothomas! Just some small comments for now. Do you think we should also add a .dockerignore file to keep the Docker image a pure virtual environment? Also, what are your thoughts on setting up some CI to push pre-built containers to a docker registry (can be done in a separate PR)?

Absolutely add a .dockerignore file to this PR. You can probably start with much (or all) of what's in .gitignore.

However, keep in mind that there are cases where you must be more explicit in .dockerignore.

For example, you cannot use only __pycache__/ in .dockerignore because that will ignore only a top level __pycache__/ directory. To ignore such a directory at all levels, you must use **/__pycache__/ in .dockerignore. Another good candidate for this is to add **/.ipynb_checkpoints/ since there are notebooks in the docs directory.
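A minimal sketch of the deny-list patterns mentioned above, written to a scratch file rather than the real .dockerignore (the scratch filename is an assumption for illustration only):

```shell
# Write the example patterns to a scratch file so an existing .dockerignore is untouched.
ignore_file=".dockerignore.example"

cat > "$ignore_file" <<'EOF'
# Ignore bytecode caches at every directory level, not just the top level.
**/__pycache__/
# Ignore notebook checkpoint directories wherever they appear.
**/.ipynb_checkpoints/
EOF

cat "$ignore_file"
```

The leading **/ is what makes each pattern match at any depth; without it, only a directory at the root of the build context would be ignored.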

Alternatively, as I mentioned to you ages ago, I tend to "invert" my use of .dockerignore to make it act more like an "allow" list rather than a "deny" list, which I find to be safer and clearer. For example, consider making a .dockerignore that ignores everything and then does not ignore the things you want to "allow":

*
!environment.yml
!conda-lock.yml
!**/*.py
!**/*.ipynb
!**/*.sh

leothomas commented 4 months ago

Awesome! Thank you both! I've addressed some of the requested changes:

Regenerating the conda-lock.yml
Installing the micromamba environment from conda-lock.yml rather than environment.yml
Removing the pyarrow dependency
Adding a .dockerignore which ignores everything by default and only allows the files specifically needed
Updating the README with relevant info on the docker image/container and how to run it
Updating the jupyter lab flags to avoid the unnecessary usage of --allow-root and address some warning/debugging logs

Do y'all think it would be valuable to try to see if I can get this to run with a cuda docker image to enable the pytorch-cuda installation?

yellowcap commented 1 month ago

Putting some life into this to see if we can get this to work for binder and friends.

yellowcap commented 1 month ago

Got this to work locally. Docker-compose and Jupyter notebook working fine with the image built from the Dockerfile. 🌮

@leothomas have a look and let me know if you think the small changes make sense to you. Happy to merge this in after some review. Not sure how to use the functionality in things like binder but would be nice to use the dockerized version for those if that unbreaks the deployments!

yellowcap commented 1 month ago

@chuckwondo re-requested your review to unblock this. I think with v1 it's now working ok. If you have some time try it out 🐋

yeelauren commented 1 month ago

Hey just wanted to flag - I noticed some of the paths for the notebook are broken in docker as well:

I think it should be two directories up or explicit instructions on changing your workspace directory paths.

Clay-foundation / model

Add a docker container (and docker-compose file) to run the model/notebooks in a containerized environment #166