chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

[Implementation] Curators want collection revisions to be automated to increase their velocity #6929

Closed nayib-jose-gloria closed 5 months ago

nayib-jose-gloria commented 6 months ago

Discovery ticket:


Before scheduling this work, I propose waiting until we stand up the JupyterHub cluster we plan to host in our AWS account to give curators remote curation access. That work may alleviate the pain points behind this request enough that implementing this feature is no longer as pressing.

Edit--added context: Curators requested semi-automation because downloading and uploading large datasets takes a long time and consumes significant local resources, even when the change being processed is relatively small. Remote semi-automation of dataset transformations would have to be tightly limited and parametrized, or else undergo a rigorous security review, because it is risky to allow even authenticated users to submit arbitrary code snippets to run against parts of our corpus. It is preferable to first see whether the JupyterHub solution reduces curation times for large datasets enough to make semi-automation unnecessary.

This solution is an evolution of the solution proposed in https://app.zenhub.com/workspaces/single-cell-5e2a191dad828d52cc78b028/issues/gh/chanzuckerberg/single-cell/145, which focuses on uns-only updates (a simpler problem that we already have the infrastructure to accommodate).
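
To make "limited/parametrized" concrete, here is a rough sketch of what a restricted uns-only update could look like, as opposed to accepting arbitrary code. Everything in it is hypothetical and for illustration only: the `UnsUpdate` type, the `ALLOWED_UNS_KEYS` allowlist and its contents, the length limit, and the function names are assumptions, not part of the portal's API or of any agreed design. A plain dict stands in for a dataset's `uns` mapping.

```python
from dataclasses import dataclass

# Hypothetical allowlist of uns keys a curator may change remotely.
# The keys listed here are illustrative assumptions, not a portal policy.
ALLOWED_UNS_KEYS = {"title", "default_embedding"}

# Hypothetical upper bound on value length, to keep inputs small and simple.
MAX_VALUE_LENGTH = 200


@dataclass(frozen=True)
class UnsUpdate:
    """A single parametrized change: set one allowlisted uns key to a string."""
    key: str
    value: str


def validate_update(update: UnsUpdate) -> None:
    """Reject anything outside the narrow, pre-approved parameter space."""
    if update.key not in ALLOWED_UNS_KEYS:
        raise ValueError(f"uns key not allowed: {update.key!r}")
    if not isinstance(update.value, str):
        raise ValueError("value must be a string")
    if len(update.value) > MAX_VALUE_LENGTH:
        raise ValueError("value too long")


def apply_uns_update(uns: dict, update: UnsUpdate) -> dict:
    """Return a copy of `uns` with the validated update applied.

    No user-supplied code is executed; the caller only chooses a key and a value.
    """
    validate_update(update)
    updated = dict(uns)  # shallow copy so the input mapping is left untouched
    updated[update.key] = update.value
    return updated
```

The point of this shape is that a request carries only data (a key and a value) that is checked against a fixed allowlist, which is a much smaller surface to review than arbitrary snippets run against the corpus.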

Estimate: 5-6 weeks x 2 engineers

Proposed Approach (align with a tech spec first):

brianraymor commented 5 months ago

Closing since full scripting of revisions has been downscoped to title renames and mitigation with a curator JupyterHub.