chuckbelisle opened this issue 1 year ago
Tracing the base images:

| Base Image | Source | Extras |
|---|---|---|
| jupyter/datascience-notebook:ed2908bbb62e | https://github.com/jupyter/docker-stacks/blob/main/images/datascience-notebook/Dockerfile | Repo structure has recently been refactored; contains a chain of images that should be traced; https://github.com/jupyter/docker-stacks/wiki/aarch64-base-notebook-ed2908bbb62e; image size: 849 MB |
| rocker/geospatial:4.2.1@sha256:5caca36b8962233f8636540b7c349d3f493f09e864b6e278cb46946ccf60d4d2 | https://github.com/rocker-org/rocker-versioned2/blob/master/dockerfiles/geospatial_4.2.1.Dockerfile | Contains a chain of images that should be traced. |
| k8scc01covidacr.azurecr.io/sas4c:0.0.3 | https://github.com/govcloud/docker-sas4c/blob/master/Dockerfile | CentOS is discontinued; the image should be refactored. |
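To trace the image chains noted in the table, one option is to pull the parent image out of each Dockerfile's `FROM` line and repeat on the parent's Dockerfile. Below is a minimal sketch, assuming the parent may be selected through `ARG` defaults (e.g. `FROM $BASE_CONTAINER`). The function names and the sample Dockerfile text are hypothetical and not taken from any of the linked repos.

```python
import re

# "ARG NAME=default" (quotes around the default are stripped below).
ARG_RE = re.compile(r"^\s*ARG\s+(\w+)=(\S+)", re.IGNORECASE)
# "FROM [--flag=value ...] image [AS stage]"; captures only the image reference.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)
# "$NAME" or "${NAME}"; modifiers such as ${NAME:-x} are not supported.
VAR_RE = re.compile(r"\$\{?(\w+)\}?")


def _expand(value: str, args: dict[str, str]) -> str:
    """Substitute known ARG defaults; unknown variables are left untouched."""
    return VAR_RE.sub(lambda m: args.get(m.group(1), m.group(0)), value)


def parent_images(dockerfile_text: str) -> list[str]:
    """Return the image named by each FROM line, in order, with ARG defaults expanded."""
    args: dict[str, str] = {}
    parents: list[str] = []
    for line in dockerfile_text.splitlines():
        if m := ARG_RE.match(line):
            args[m.group(1)] = _expand(m.group(2).strip("\"'"), args)
        elif m := FROM_RE.match(line):
            parents.append(_expand(m.group(1), args))
    return parents


# Hypothetical Dockerfile text in an ARG-driven style; not copied from any repo.
example = """\
ARG REGISTRY=quay.io
ARG OWNER=jupyter
ARG BASE_CONTAINER=$REGISTRY/$OWNER/scipy-notebook
FROM $BASE_CONTAINER
"""
print(parent_images(example))
```

This is deliberately simple. It ignores `--build-arg` overrides and Docker's rule that only ARGs declared before the first `FROM` are visible to `FROM` lines, so any chain it reports should still be checked by hand against the real Dockerfiles.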
It looks like the two best sources of notebook images to start from are:

- Jupyter: https://github.com/jupyter/docker-stacks/tree/main/images
- Kubeflow: https://github.com/kubeflow/kubeflow/tree/master/components/example-notebook-servers
These are fairly lightweight, but they split into different specialties and use a good matrix/inheritance system to organize them. We should compare and contrast these options, including our current images.
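To compare inheritance layouts side by side, one approach is to record each image's direct parent and derive the full lineage and the child list from that map. This is a minimal sketch; the image names below are hypothetical placeholders, not the actual tags used by Jupyter, Kubeflow, or our current images.

```python
from typing import Optional

# Hypothetical image names; each maps to its direct parent (None = upstream root).
PARENTS: dict[str, Optional[str]] = {
    "base": None,
    "minimal": "base",
    "scipy": "minimal",
    "r": "minimal",
    "datascience": "scipy",
}


def lineage(image: str, parents: dict[str, Optional[str]]) -> list[str]:
    """Walk from an image up to its root, e.g. datascience -> scipy -> minimal -> base."""
    chain = [image]
    while (parent := parents.get(chain[-1])) is not None:
        if parent in chain:
            raise ValueError(f"inheritance cycle at {parent!r}")
        chain.append(parent)
    return chain


def children(parents: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Invert the parent map so each image lists the images built directly on it."""
    tree: dict[str, list[str]] = {name: [] for name in parents}
    for name, parent in parents.items():
        if parent is not None:
            tree.setdefault(parent, []).append(name)
    return tree


for name in PARENTS:
    print(" -> ".join(lineage(name, PARENTS)))
```

Filling in the same kind of map for each candidate source makes it easier to see where each specialty branches off and how deep each chain gets.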
Docker's Dockerfile best-practices guide will also be useful to review: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
We can store build artifacts (e.g. Trivy scan reports) for 90 days and open issues from those files. https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow
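To turn a stored scan into an issue, the report first needs to be summarized. The following is a minimal sketch that assumes the artifact is a Trivy report produced with `--format json` (top-level `ArtifactName` and `Results`, each result carrying an optional `Vulnerabilities` list). The function name and severity threshold are hypothetical. Actually filing the issue, for example with `gh issue create --body-file`, is left out.

```python
import json
from collections import Counter

# Severity labels as they appear in Trivy's JSON output, most severe first.
SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "UNKNOWN"]


def trivy_issue_body(report_json: str, min_severity: str = "HIGH") -> str:
    """Render a Markdown issue body from a Trivy JSON report."""
    report = json.loads(report_json)
    # Keep every severity at or above the threshold.
    wanted = set(SEVERITIES[: SEVERITIES.index(min_severity) + 1])
    counts: Counter = Counter()
    rows: list[str] = []
    for result in report.get("Results", []):
        # "Vulnerabilities" is omitted (or null) for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            counts[severity] += 1
            if severity in wanted:
                rows.append(
                    f"| {vuln.get('VulnerabilityID', '?')} | {vuln.get('PkgName', '?')} "
                    f"| {vuln.get('InstalledVersion', '?')} | {vuln.get('FixedVersion') or 'none'} "
                    f"| {severity} |"
                )
    summary = ", ".join(f"{s}: {counts[s]}" for s in SEVERITIES if counts[s]) or "no findings"
    lines = [f"Trivy scan of `{report.get('ArtifactName', 'unknown')}`: {summary}"]
    if rows:
        lines += ["", "| ID | Package | Installed | Fixed in | Severity |", "|---|---|---|---|---|", *rows]
    return "\n".join(lines)


# Made-up sample data; the CVE IDs and package names are placeholders.
sample = json.dumps({
    "ArtifactName": "example/notebook:latest",
    "Results": [{"Target": "example", "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample",
         "InstalledVersion": "1.0", "FixedVersion": "1.1", "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother",
         "InstalledVersion": "2.0", "Severity": "LOW"},
    ]}],
})
print(trivy_issue_body(sample))
```

The summary line counts every finding, but only those at or above `min_severity` are listed. This keeps generated issues short while still showing the overall picture.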
Let's go over our current images and go back to the drawing board.
We need the following:
Note: These images should stay as bare-bones as possible, and users should create their own virtual environments to avoid package conflicts.
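Here is a minimal sketch of the per-user environment the note describes, using Python's standard-library `venv` module. The function name `create_user_env` and any paths passed to it are hypothetical.

```python
import os
import venv
from pathlib import Path


def create_user_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create an isolated virtual environment and return its Python interpreter path."""
    path = Path(env_dir).expanduser()
    # system_site_packages=False hides the image's own site-packages from the env,
    # so packages a user installs there cannot conflict with the image's packages.
    venv.create(path, system_site_packages=False, with_pip=with_pip)
    if os.name == "nt":
        return path / "Scripts" / "python.exe"
    return path / "bin" / "python"
```

For example, `create_user_env("~/envs/my-project")` returns the environment's interpreter. A user can run it with `-m pip install ...` for project dependencies without touching the base image. To make the environment selectable as a notebook kernel, it would also need to be registered (for instance with `python -m ipykernel install --user --name <env-name>` after installing `ipykernel` into it), which this sketch does not do.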