Barts-Life-Science / AzureTRE

An accelerator to help organizations build Trusted Research Environments on Azure.
https://microsoft.github.io/AzureTRE
MIT License

Mapping user identities into workspaces (virtual machines) #80

Closed TonyWildish-BH closed 2 weeks ago

TonyWildish-BH commented 3 months ago

Is your feature request related to a problem? Please describe.

When users create VMs in their workspaces, they get a random username, different in each machine. Yet the UID is always the same, so the workspace is essentially single-user.

So, if they create git repositories, for example, there is no way of knowing who, in a project, has really committed code to the repo. Likewise, in the shared storage, there's nothing to distinguish which user in the project has created a particular file or directory.

Describe the solution you'd like

User accounts should be mapped from their Entra ID properties into the workspace, so there's at least something to clearly identify who's who. This can be done by maintaining some sort of configuration object in the workspace which can be accessed by all the VMs at configuration time. Also, the VM will need to know which user clicked the 'Create' button!

TonyWildish-BH commented 3 weeks ago

Building on the discussion this morning in the standup, here's how I see this need:

What I'd like is:

  1. That each user, when they create a VM, is dropped into that VM with a username that reflects their real name, from Entra ID.
  2. That each VM has all the user accounts mapped to it, consistently.
  3. That each user be able to log in to any of the VMs in their project - e.g., a project may decide to have one development machine each and one shared GPU machine for ML projects.
  4. That all user accounts can be mapped into Gitea, with admin/user role matching their admin/user status in the project.
  5. That the shared storage be user-aware, so files can have proper ownership, and simple file protections can be put in place.

Item 1 requires that the VM, during boot, knows who created it. This is the first important step. Either the username is fed to the boot process as a config variable, or the VM has to be able to contact some service to find out who created it. This is the main requirement, and the only one that needs to be satisfied now.
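For the config-variable route, a minimal sketch of how a boot script might turn the creator's Entra ID display name into a login name. The variable name `TRE_VM_OWNER_NAME`, the underscore separator, and the 32-character cap are assumptions for illustration, not anything AzureTRE defines today.

```python
import os
import re
import unicodedata

MAX_LEN = 32  # assumed cap; a common limit for Linux login names
OWNER_VAR = "TRE_VM_OWNER_NAME"  # hypothetical config variable set at VM creation


def posix_username(display_name):
    """Derive a conservative login name from a display name."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", display_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, drop apostrophes, collapse every other run of
    # non-alphanumeric characters into a single underscore.
    cleaned = ascii_name.lower().replace("'", "")
    cleaned = re.sub(r"[^a-z0-9]+", "_", cleaned).strip("_")
    if not cleaned:
        raise ValueError("display name contains no usable characters")
    # Login names should not start with a digit.
    if cleaned[0].isdigit():
        cleaned = "u" + cleaned
    return cleaned[:MAX_LEN].rstrip("_")


def owner_username(env=os.environ):
    """Read the creator's display name from the (hypothetical) config variable."""
    return posix_username(env[OWNER_VAR])
```

With this sketch, `owner_username({"TRE_VM_OWNER_NAME": "Tony Wildish"})` gives `tony_wildish`. Two different people can still collapse to the same name, so a real implementation would need a tie-break rule.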

Item 2 requires a map of usernames, either from CosmosDB or from the project Key Vault. I prefer the second option because a) it's cleaner, since the VM doesn't need to contact resources outside the workspace resource group, and b) we have similar requirements to contact the Key Vault to pull out things like Gitea or MySQL credentials when those services are in use. This would be nice to have for the MVP and is required for Production.

The data could come from the Data Portal, pushed into the store when the workspace is created, or it can be pulled from the role mapping in Azure, after the workspace is created. The second might be easier, more contained, and can be a CLI tool.
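As a rough sketch of what such a CLI tool could maintain, the function below merges the current project members into an existing usermap so that a UID, once assigned, never changes. The field names (`object_id`, `username`, `admin`), the starting UID, and the idea of storing the result as one JSON document are assumptions for illustration; fetching the role assignments from Azure and writing to the vault are left out.

```python
import json

BASE_UID = 20000  # assumed starting UID, chosen to stay clear of local accounts


def update_user_map(existing, members):
    """Merge project members into the usermap, keyed by Entra object ID.

    existing: dict of object_id -> {"username", "uid", "admin"}
    members:  list of {"object_id", "username", "admin"} from the role mapping
    """
    user_map = {oid: dict(entry) for oid, entry in existing.items()}
    highest = max((e["uid"] for e in user_map.values()), default=BASE_UID - 1)
    next_uid = highest + 1
    # Sort so that the same input always produces the same assignment.
    for member in sorted(members, key=lambda m: m["object_id"]):
        oid = member["object_id"]
        if oid in user_map:
            # The role may change over time; the UID must not.
            user_map[oid]["admin"] = member["admin"]
            continue
        user_map[oid] = {
            "username": member["username"],
            "uid": next_uid,
            "admin": member["admin"],
        }
        next_uid += 1
    # Users who left the project stay in the map so their UID is never reused.
    return user_map


def serialise(user_map):
    """Stable JSON text, suitable for storing as a single secret value."""
    return json.dumps(user_map, sort_keys=True)
```

Keeping departed users in the map is a deliberate choice in this sketch: files they own on shared storage keep a UID that still resolves to the right person.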

Item 3 requires Guacamole to be fundamentally reconfigured. Item 4, likewise, requires significant reconfiguration of Gitea. Item 5 is also a chunk of work, figuring out how to mount the shared storage in a way that is user-aware. None of these will be worked on until the MVP is out; there are more important things to do.

TonyWildish-BH commented 3 weeks ago

There is a third way of passing a usermap into a booting virtual machine, which Bioku and I discussed: pass it to the Docker image for the VM template when the Resource Processor executes it. On reflection, this requires more surgery in the SDE core, which I don't like. Passing it via the workspace Key Vault is cleaner: the VMs only need a role that allows them to pull from the vault, and that is then a generic tool we can use in other ways.
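To illustrate the VM side of the Key Vault route, here is a sketch of what a boot script could do once it has the usermap as JSON text. It only builds the `useradd` command lines rather than running them, and it assumes a map shaped as object ID to `{"username", "uid", "admin"}`; that shape, and the choice of `/bin/bash`, are assumptions for illustration. Pulling the secret from the vault is not shown.

```python
import json
import shlex


def useradd_commands(user_map_json):
    """Build one useradd command line per user in the usermap."""
    user_map = json.loads(user_map_json)
    commands = []
    # Order by UID so every VM creates the accounts in the same sequence.
    for entry in sorted(user_map.values(), key=lambda e: e["uid"]):
        argv = [
            "useradd",
            "--create-home",
            "--uid", str(entry["uid"]),
            "--shell", "/bin/bash",
            entry["username"],
        ]
        # shlex.join quotes any argument that would be unsafe in a shell.
        commands.append(shlex.join(argv))
    return commands
```

Because every VM applies the same map, a user ends up with the same username and UID on each machine, which is what file ownership on shared storage would eventually depend on.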

akolensky commented 2 weeks ago

This has now been parked.