chanzuckerberg / single-cell

A collection of documents that reflect various design decisions that have been made for the cellxgene project.
MIT License

Create 'Metadata-Update' AWS Infra #580

Closed: nayib-jose-gloria closed this issue 11 months ago

nayib-jose-gloria commented 1 year ago

Create a Lambda function (or another AWS job infrastructure type, if better suited) with less memory provisioned than the typical dataset DownloadValidate step. Look into the memory required to load our largest supported datasets in 'backed' mode, for both h5ad and Seurat. Leverage the same Batch compute environment as the typical dataset upload processing job.

Inputs needed: dataset_version_id, metadata_key_to_update, new_value
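A minimal sketch of how the job might accept and validate the three inputs listed above, assuming a Lambda-style handler that receives them as a JSON event. The event shape, the MetadataUpdateRequest type, and the handler and parse_request names are hypothetical. The issue does not say how inputs are delivered, and the actual artifact update (loading the h5ad or Seurat file in backed mode and writing a new artifact) is left as a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical event shape: the issue names these three inputs but not their transport.
REQUIRED_KEYS = ("dataset_version_id", "metadata_key_to_update", "new_value")


@dataclass(frozen=True)
class MetadataUpdateRequest:
    dataset_version_id: str
    metadata_key_to_update: str
    new_value: Any


def parse_request(event: Mapping[str, Any]) -> MetadataUpdateRequest:
    """Validate the job inputs and return a typed request (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in event]
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")
    # The two identifiers must be non-empty strings; new_value may be any JSON value.
    for key in ("dataset_version_id", "metadata_key_to_update"):
        if not isinstance(event[key], str) or not event[key]:
            raise ValueError(f"{key} must be a non-empty string")
    return MetadataUpdateRequest(
        dataset_version_id=event["dataset_version_id"],
        metadata_key_to_update=event["metadata_key_to_update"],
        new_value=event["new_value"],
    )


def handler(event: Mapping[str, Any], context: Any = None) -> dict:
    """Lambda-style entry point (hypothetical); only validation is implemented."""
    request = parse_request(event)
    # Placeholder: open the artifact in backed mode, set the key, write a new artifact.
    return {
        "dataset_version_id": request.dataset_version_id,
        "updated_key": request.metadata_key_to_update,
        "status": "validated",
    }
```

Validating inputs before opening any artifact keeps bad requests from consuming the job's (deliberately smaller) memory allocation.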

nayib-jose-gloria commented 1 year ago

@brianraymor If the publication DOI is updated in a revision, but that is the only change affecting a particular dataset in that revision, should the update to the 'citation' field create a new dataset version (so that we retain the previous dataset version, with the prior citation, in the version history)? Or should it be an in-place update that keeps the same version ID?

brianraymor commented 1 year ago

The original policy was based on the fact that updating collection information had no impact on the dataset. If a DOI is added or updated in a revision, then dataset metadata is being updated. Therefore, I would agree that this is a new dataset version.

brianraymor commented 1 year ago

CC: @jahilton ^^^
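One way to encode the versioning rule agreed in this thread, as a hedged sketch: a revision yields a new dataset version only if one of its changes rewrites dataset metadata, while collection-only edits (which, under the original policy, had no impact on the dataset) keep the same version ID. The field names and the FIELDS_PROPAGATED_TO_DATASET set are hypothetical; the thread only establishes that a DOI change updates the dataset's 'citation' field.

```python
from typing import Iterable

# Hypothetical: collection-level fields whose changes are copied into dataset metadata.
# The thread only confirms that adding or updating the DOI changes the 'citation' field.
FIELDS_PROPAGATED_TO_DATASET = frozenset({"publication_doi"})


def requires_new_dataset_version(changed_collection_fields: Iterable[str]) -> bool:
    """Return True if any changed collection field also rewrites dataset metadata.

    Changes confined to collection information leave the dataset untouched, so the
    dataset keeps its version ID; anything that rewrites dataset metadata (e.g. the
    citation derived from the DOI) produces a new dataset version.
    """
    return any(field in FIELDS_PROPAGATED_TO_DATASET for field in changed_collection_fields)
```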