MAAP-Project / Community

Issue for MAAP (Zenhub)

Project workspaces vs. user workspaces #519

Open lauraduncanson opened 2 years ago

lauraduncanson commented 2 years ago

Is your feature request related to a problem? Please describe.

Right now all of my data is visible in any of 'my' workspaces, and dps_output data is available only in my workspaces. This creates a few areas of confusion / concern:

1) Accessing DPS outputs. When people on my projects want to access my dps_output data, they can only do so within a workspace I started and shared with them. For the boreal project our workaround has been that the entire project team works within workspaces I created and runs DPS jobs with my username, so the outputs go to my dps_output directory. This causes some frustration because the 'my-private-bucket' in these workspaces disappears frequently (usually about once a day), AND because it's not possible to track which individual actually ran a job; they all just fall under my runs.

2) Sharing data only within one project. Because all of my workspaces have ALL of my data across ALL of my projects, and I have to share a workspace to give anybody working on ANY project access to the dps_output data, literally anybody on any of my projects can see all the stuff in ALL projects. It would be much better if I could share the data/code related to a project with ONLY the people working on that project (project:project, not all_projects:all_projects). It would make things cleaner and simpler. E.g. boreal folks don't need to access biomass harmonization stuff, and I don't want biomass harmonization folks accessing all of the in-development boreal code before it is clean, so I can't share any workspace with anybody on that project.

Describe the solution you'd like

I want to start a workspace with both a project ID/name and my username, and have data copied only to workspaces with both of those tags (so boreal data/code stays in boreal workspaces, harmonization stays there, gedi vis stays in a gedi vis workspace, etc.).

I also want a merged dps_output for each project ID/name rather than one in a user-specific folder. So instead of 'my-private-bucket' it could be 'project-private-bucket/dps_output', and the subdirectories could be tagged by algo name like they already are, but with a sub-directory per username beneath each. It would help us keep track of who is running what.
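To make the proposed layout concrete, here is a minimal sketch of how an output path could be built and how the job owner could be read back from it. Only the 'project-private-bucket/dps_output' root and the algo-name-then-username ordering come from the request above; the function names and the `example_algo` algorithm name are hypothetical, and this is not how DPS currently writes outputs.

```python
from pathlib import PurePosixPath

# Root taken from the proposal above; everything else here is hypothetical.
PROJECT_ROOT = PurePosixPath("project-private-bucket") / "dps_output"


def _segment(label: str, value: str) -> str:
    """Reject empty names and names that would escape their directory level."""
    if not value or "/" in value or value in (".", ".."):
        raise ValueError(f"{label} must be a single non-empty path segment: {value!r}")
    return value


def project_output_dir(algorithm: str, username: str) -> PurePosixPath:
    """Return <root>/<algorithm>/<username>, the layout proposed in this issue."""
    return PROJECT_ROOT / _segment("algorithm", algorithm) / _segment("username", username)


def job_owner(output_path: PurePosixPath) -> str:
    """Recover who ran a job from a path inside the proposed layout."""
    relative = output_path.relative_to(PROJECT_ROOT)  # ValueError if outside the root
    if len(relative.parts) < 2:
        raise ValueError(f"path has no username level: {output_path}")
    return relative.parts[1]


if __name__ == "__main__":
    out_dir = project_output_dir("example_algo", "lauraduncanson")
    print(out_dir)
    print(job_owner(out_dir / "some_run" / "output.tif"))
```

With the username as a fixed level under each algorithm, anyone on the project could tell who ran a job from the path alone, without every run landing under a single account.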

Describe alternatives you've considered

See our workarounds above.


gchang commented 2 years ago

For 1) the best solution is to properly implement a group feature for creating ad-hoc groups within the MAAP platform. This would entail some complexity, as we'd need not only to modify the user data model to support it, but also to create new interfaces that let people manage group membership.

A possible intermediate solution is to tell DPS to put data into the "my-public-bucket" directory, which is available to everyone on the platform via other users' "shared-buckets" folders. This does mean that the data will be public within the MAAP platform.

For 2) One of the reasons we went with a persistent file system for the home directory is to preserve data just in case the workspace gets shut down. We've had (and sometimes still have) workspaces that are shut down by the system due to memory load issues. With the old model of ephemeral file systems, data would just disappear when the workspace was restarted.

The best solution would be to 1) choose whether to use a cross-workspace file system or a single-workspace file system and 2) find a way to persist files across workspace stoppages and delete them only when the workspace is destroyed.
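The lifecycle rule in point 2 can be sketched as a small model: stopping a workspace leaves its files alone, and only destroying it deletes them. This is an illustration of the intended behaviour, not the platform's implementation; the `Workspace` class, its methods, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical single-workspace file system with stop/destroy semantics."""

    name: str
    running: bool = False
    destroyed: bool = False
    files: set = field(default_factory=set)

    def start(self) -> None:
        if self.destroyed:
            raise RuntimeError(f"workspace {self.name!r} has been destroyed")
        self.running = True

    def stop(self) -> None:
        # A stop (including one forced by the system) must not touch the files.
        self.running = False

    def destroy(self) -> None:
        # Destruction is the only event that deletes the workspace's files.
        self.running = False
        self.destroyed = True
        self.files.clear()


if __name__ == "__main__":
    ws = Workspace("boreal")
    ws.start()
    ws.files.add("notebook.ipynb")
    ws.stop()
    ws.start()
    print(sorted(ws.files))  # files survive the stop/start cycle
    ws.destroy()
    print(sorted(ws.files))  # nothing left after destroy
```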

rtapella commented 2 years ago