StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

[Epic] Re-design images #1822

Open chuckbelisle opened 1 year ago

chuckbelisle commented 1 year ago

Let's go over our current images and go back to the drawing board.

We need the following:

Note: These images should stay as barebones as possible and have users create their own virtual environment in order to avoid package conflicts.
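
As a rough illustration of the workflow the note describes, here is a minimal sketch of a user creating their own virtual environment on top of a barebones image, using only Python's standard-library `venv` module. The helper name `create_user_env` and the directory name are hypothetical, and the images may end up recommending conda or another tool instead.

```python
import tempfile
import venv
from pathlib import Path


def create_user_env(env_dir: Path) -> Path:
    """Create an isolated virtual environment at env_dir and return its path."""
    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # when pip should be bootstrapped into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # A temporary directory stands in for the user's home or project folder.
    with tempfile.TemporaryDirectory() as tmp:
        env = create_user_env(Path(tmp) / "my-project-env")
        print("environment created:", (env / "pyvenv.cfg").is_file())
```

Packages installed into such an environment stay out of the image's base interpreter, which is the package-conflict isolation the note is after.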

Souheil-Yazji commented 1 year ago

Base Images

Looking at the output directory:

| Image Name | Base Image |
| --- | --- |
| docker-stacks-datascience-notebook | jupyter/datascience-notebook:ed2908bbb62e |
| jupyterlab-cpu | jupyter/datascience-notebook:ed2908bbb62e |
| jupyterlab-pytorch | jupyter/datascience-notebook:ed2908bbb62e |
| jupyterlab-tensorflow | jupyter/datascience-notebook:ed2908bbb62e |
| remote-desktop | rocker/geospatial:4.2.1@sha256:5caca36b8962233f8636540b7c349d3f493f09e864b6e278cb46946ccf60d4d2 |
| rstudio | jupyter/datascience-notebook:ed2908bbb62e |
| sas | jupyter/datascience-notebook:ed2908bbb62e & k8scc01covidacr.azurecr.io/sas4c:0.0.3 |

Tracing the Base Images:

| Base Image | Source | Extras |
| --- | --- | --- |
| jupyter/datascience-notebook:ed2908bbb62e | https://github.com/jupyter/docker-stacks/blob/main/images/datascience-notebook/Dockerfile | Repo structure has recently been refactored; contains a chain of images, should be traced; https://github.com/jupyter/docker-stacks/wiki/aarch64-base-notebook-ed2908bbb62e; image size: 849MB |
| rocker/geospatial:4.2.1@sha256:5caca36b8962233f8636540b7c349d3f493f09e864b6e278cb46946ccf60d4d2 | https://github.com/rocker-org/rocker-versioned2/blob/master/dockerfiles/geospatial_4.2.1.Dockerfile | Contains a chain of images, should be traced |
| k8scc01covidacr.azurecr.io/sas4c:0.0.3 | https://github.com/govcloud/docker-sas4c/blob/master/Dockerfile | CentOS is discontinued, should be refactored. |

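
Since several of these base images are themselves built on a chain of images, one way to trace them is to follow the `FROM` instruction of each Dockerfile. The sketch below is a rough illustration only: `base_images` and `trace_chain` are hypothetical helpers, the `example/...` image names are placeholders rather than real images, and the parser does not resolve `ARG` substitutions or multi-stage aliases, which real Dockerfiles may use.

```python
import re

# Matches the image named by a FROM instruction, skipping an optional --platform flag.
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE | re.MULTILINE
)


def base_images(dockerfile_text: str) -> list[str]:
    """Return the image referenced by each FROM instruction, in order."""
    return FROM_RE.findall(dockerfile_text)


def trace_chain(image: str, dockerfiles: dict[str, str]) -> list[str]:
    """Follow the last FROM of each known Dockerfile until an unknown image is reached."""
    chain = [image]
    seen = {image}
    while image in dockerfiles:
        bases = base_images(dockerfiles[image])
        if not bases:
            break
        image = bases[-1]  # the final stage determines the resulting image
        if image in seen:  # guard against a cyclic mapping
            break
        chain.append(image)
        seen.add(image)
    return chain


if __name__ == "__main__":
    # Placeholder Dockerfile contents keyed by the image they build (hypothetical).
    dockerfiles = {
        "example/notebook:1": "FROM example/scientific:1\nRUN echo notebook\n",
        "example/scientific:1": "FROM example/minimal:1\nRUN echo scientific\n",
        "example/minimal:1": "FROM example/os:22.04\n",
    }
    print(" -> ".join(trace_chain("example/notebook:1", dockerfiles)))
```

Running the same walk over the Dockerfiles linked in the table would give the full inheritance chain for each base image, provided the `ARG`-based `FROM` lines are resolved first.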
vexingly commented 1 year ago

Looks like the two best sources of notebook images that we can start from are:

Jupyter: https://github.com/jupyter/docker-stacks/tree/main/images

Kubeflow: https://github.com/kubeflow/kubeflow/tree/master/components/example-notebook-servers

These are fairly lightweight, but they do split into different specialties and use a good matrix / inheritance organization system. Perhaps we should compare and contrast these options (including our current images).

Also this will be useful to review of course: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

Souheil-Yazji commented 1 week ago

We can store build artifacts (e.g. Trivy scans) for 90 days and consume those files into issues. https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow
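
To make the "consume those files into issues" step more concrete, here is a minimal sketch that turns a stored scan report into text suitable for an issue body. It assumes the artifact is a Trivy JSON report with `ArtifactName`, `Results`, `Vulnerabilities`, and `Severity` fields; the function name `summarize_trivy_report` and the sample data are hypothetical, and downloading the artifact and opening the issue are left out.

```python
import json
from collections import Counter


def summarize_trivy_report(report: dict) -> str:
    """Render severity counts from a Trivy-style JSON report as issue text."""
    counts = Counter()
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1

    lines = [f"Trivy scan for {report.get('ArtifactName', 'unknown image')}"]
    for severity in ("CRITICAL", "HIGH", "MEDIUM", "LOW", "UNKNOWN"):
        lines.append(f"- {severity}: {counts[severity]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample standing in for a downloaded workflow artifact.
    sample = json.loads(
        """
        {
          "ArtifactName": "example/notebook:1",
          "Results": [
            {"Target": "os-packages", "Vulnerabilities": [
              {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
              {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
            ]},
            {"Target": "python-packages", "Vulnerabilities": null}
          ]
        }
        """
    )
    print(summarize_trivy_report(sample))
```

Keeping the summary as plain text means the same output can be pasted into an issue by hand or posted by a later workflow step.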

Souheil-Yazji commented 1 day ago

https://github.com/StatCan/aaw/issues/1991