Minimal required workspace software

alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io

BSD 3-Clause "New" or "Revised" License

Minimal required workspace software #1574

Closed jemrobinson closed 3 months ago

jemrobinson commented 1 year ago

:white_check_mark: Checklist

[x] I have searched open and closed issues for duplicates.
[x] This is a request for a new feature in the Data Safe Haven or an upgrade to an existing feature.
[x] The feature is still missing in the latest version.
[x] I have read through the documentation.
[x] This isn't an open-ended question (open a discussion if it is).

:strawberry: Suggested change

Currently the workspaces have very little installed: R, Python and libraries to interact with databases. What is a minimal useful workspace? Ideally it would be small enough that we don't need to pre-build it (either with the current scripts or with bureau).

:steam_locomotive: How could this be done?

JimMadge commented 1 year ago

I would really like to do this with bureau and start using it sooner rather than later.

I would also like to keep bureau useful outside of DSH, probably with a smaller DSVM-style image. The pragmatic solution might be to add a new SKU to bureau, DSH_DSVM, which adds the desktop, GPU drivers and extra software. That way it can always be migrated elsewhere if we want to, and we don't have to worry about integrating configuration management into the Pulumi code just yet.

jemrobinson commented 1 year ago

I've actually been wondering whether we need a pre-built image at all. If we do use a pre-built image it should definitely be built with bureau, but is a VM with just Python, R and a few other tools good enough? The advantage of a VM that's built on the fly is that it's easier to test and should always be up to date with the rest of the code.

JimMadge commented 1 year ago

I think the issue with building on demand is it will waste time and energy.

It depends how far the SRD image is from the image it is built on. If we start from a headless image and install X/Wayland, a desktop environment, GPU drivers and libraries, programming languages and graphical programs, I would guess it would take around half an hour. At least for Mousehole, it was always the majority of the deployment time.

Another way would be to use the bureau workflows to build images on a regular basis (say weekly) and allow for manually dispatching a build.

jemrobinson commented 1 year ago

Another possibility is to use a Docker container as the source. I was sceptical but @manics says that this works for him.

manics commented 1 year ago

Here's an example of an Ubuntu 22.04 MATE desktop with VNC running in Docker: https://github.com/manics/jupyter-guacamole/

Run `docker run -it --rm -p 5901:5901 -eLOCALHOST=no ghcr.io/manics/ubuntu-mate-vnc:main` and connect to `localhost:5901` with your VNC client.

JimMadge commented 6 months ago

A good-enough-for-now approach, while waiting on Bureau:

- An Ansible playbook with the desired state for workspaces
- Ansible runner, ansible-pull or similar to apply the configuration to each deployed workspace

JimMadge commented 5 months ago

Ansible pull is appealing to me.

We could have a cron job or systemd timer run the playbooks regularly, which would enforce the desired state. It would also be possible to update deployed workspaces by pushing changes to the playbook.
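As a rough sketch of the timer idea, the Python below builds an `ansible-pull` command line and renders a matching systemd service and timer unit as text. The repository URL, playbook name, unit descriptions, hourly schedule and the `/usr/bin/ansible-pull` path are all assumptions made for illustration; none of them come from the Data Safe Haven code.

```python
import shlex


def ansible_pull_command(repo_url, playbook="workspace.yml", checkout="main",
                         executable="/usr/bin/ansible-pull"):
    """Build an ansible-pull invocation as an argument list.

    The executable path, playbook name and branch are assumptions.
    """
    return [
        executable,
        "--url", repo_url,           # repository holding the playbook
        "--checkout", checkout,      # branch, tag or commit to apply
        "--inventory", "localhost,", # only configure the local machine
        playbook,
    ]


def render_units(command, on_calendar="hourly"):
    """Return (service_text, timer_text) for a oneshot service and its timer."""
    service = "\n".join([
        "[Unit]",
        "Description=Apply workspace desired state with ansible-pull",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={shlex.join(command)}",
        "",
    ])
    timer = "\n".join([
        "[Unit]",
        "Description=Run the workspace configuration regularly",
        "",
        "[Timer]",
        f"OnCalendar={on_calendar}",
        "Persistent=true",  # catch up on runs missed while powered off
        "",
        "[Install]",
        "WantedBy=timers.target",
        "",
    ])
    return service, timer


if __name__ == "__main__":
    # Hypothetical repository URL inside the environment.
    cmd = ansible_pull_command("https://git.example.internal/workspaces.git")
    service_text, timer_text = render_units(cmd)
    print(service_text)
    print(timer_text)
```

Because the playbook is reapplied on every timer run, drift on a workspace would be corrected at the next run, and a pushed change would reach all workspaces within one timer interval.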

Ansible pull pulls from a git repository. We could skip that and pull from somewhere else instead. It would be good to keep the playbook in something like a git repo or blob inside the TRE. Admins could push to that from outside, and workspaces could fetch from inside.

@craddm @jemrobinson thoughts?

I think maybe the sensible solution for now is to create a container inside each SRE with the playbook in it, plus a regular script to fetch that and run Ansible.

jemrobinson commented 5 months ago

Mounting an Azure container into a VM is relatively easy (doesn't need credentials to mount as NFSv3). Connecting to a container to pull a file will be more complicated (likely to need a ManagedIdentity for the VM resource and to give that identity permissions on the container).

JimMadge commented 5 months ago

Pushing to a container that is mounted seems like a good solution then.
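A minimal sketch of the workspace side of that approach, assuming the container is already mounted at some path on the VM. It copies the playbook out of the mount and applies it to the local machine with `ansible-playbook`. The mount path, working directory and playbook name are hypothetical, and the `run` parameter exists only so the command can be checked without Ansible installed.

```python
import shutil
import subprocess
from pathlib import Path


def playbook_command(playbook):
    """Build an ansible-playbook invocation that targets only this machine."""
    return [
        "ansible-playbook",
        "--connection", "local",
        "--inventory", "localhost,",
        str(playbook),
    ]


def fetch_and_apply(mounted_dir, work_dir, name="workspace.yml",
                    run=subprocess.run):
    """Copy the playbook from the mounted container, then apply it.

    Working from a local copy means a push that lands mid-run cannot
    change the file Ansible is reading.
    """
    source = Path(mounted_dir) / name
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    local_copy = work / name
    shutil.copy2(source, local_copy)
    command = playbook_command(local_copy)
    run(command, check=True)  # raises CalledProcessError if the play fails
    return command


if __name__ == "__main__":
    # Hypothetical paths: where the container is mounted and where to work.
    fetch_and_apply("/mnt/desired-state", "/var/lib/workspace-config")
```

A cron job or systemd timer could call this script on a schedule, so admins would only need to push a new playbook to the container.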

jemrobinson commented 5 months ago

I was thinking about this earlier today and I think it might also work to pull from a non-mounted container that is locked down from public access but available anonymously for private access.
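For comparison, a sketch of pulling from a non-mounted container could look like the following. It assumes the container permits anonymous blob reads, that network rules restrict access to the private network, and that the standard public-cloud blob endpoint is in use. The account, container and blob names are hypothetical, and the `opener` parameter is there only to make the function easy to exercise offline.

```python
import urllib.request
from pathlib import Path


def blob_url(account, container, blob):
    """Build the HTTPS URL of a blob (public-cloud endpoint assumed)."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}"


def download_playbook(url, destination, opener=urllib.request.urlopen):
    """Fetch the playbook without credentials and write it to disk.

    Returns the number of bytes written.
    """
    with opener(url, timeout=30) as response:
        data = response.read()
    Path(destination).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    # Hypothetical storage account, container and blob names.
    url = blob_url("examplesreaccount", "desired-state", "workspace.yml")
    download_playbook(url, "/var/lib/workspace-config/workspace.yml")
```

This would avoid both the mount and the managed identity, at the cost of relying entirely on network rules to keep the container private.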