chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Create Deleted DatasetVersion "Clean up" Job #7286

Open nayib-jose-gloria opened 1 month ago

nayib-jose-gloria commented 1 month ago

Create a "clean-up" batch job that deletes DatasetArtifacts + DatasetVersions for an input List of DatasetVersionIds.

See the data model in orm.py and the business logic entity objects in entities.py.

Leverage existing functions for artifact/DB clean-up, such as:

- the business layer function that deletes dataset version artifacts from S3
- the persistence layer function that deletes a dataset version and its associated dataset artifact rows from the DB
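A rough sketch of how the job's core loop could compose those two functions. This is only an illustration: `ArtifactStore`, `VersionRepository`, `CleanupResult`, `cleanup_dataset_versions`, and the method names are hypothetical stand-ins, not the actual functions in the business or persistence layers. It assumes artifacts are deleted from S3 before the DB rows, so that a failed S3 delete leaves the row in place for a retry.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol


class ArtifactStore(Protocol):
    """Hypothetical stand-in for the business layer S3 clean-up function."""

    def delete_dataset_version_artifacts(self, dataset_version_id: str) -> None: ...


class VersionRepository(Protocol):
    """Hypothetical stand-in for the persistence layer delete function."""

    def delete_dataset_versions(self, dataset_version_ids: List[str]) -> None: ...


@dataclass
class CleanupResult:
    deleted: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def cleanup_dataset_versions(
    dataset_version_ids: Iterable[str],
    store: ArtifactStore,
    repo: VersionRepository,
) -> CleanupResult:
    """Delete artifacts, then DB rows, for each id; collect failures instead of aborting."""
    result = CleanupResult()
    # dict.fromkeys drops duplicate ids while preserving input order.
    for dv_id in dict.fromkeys(dataset_version_ids):
        try:
            # S3 first: if this raises, the DB row survives and the id can be retried.
            store.delete_dataset_version_artifacts(dv_id)
            repo.delete_dataset_versions([dv_id])
        except Exception:
            result.failed.append(dv_id)
        else:
            result.deleted.append(dv_id)
    return result


# In-memory fakes, for local testing only.
class InMemoryStore:
    def __init__(self, fail_on=()):
        self.fail_on = set(fail_on)
        self.deleted: List[str] = []

    def delete_dataset_version_artifacts(self, dataset_version_id: str) -> None:
        if dataset_version_id in self.fail_on:
            raise RuntimeError(f"simulated S3 failure for {dataset_version_id}")
        self.deleted.append(dataset_version_id)


class InMemoryRepo:
    def __init__(self):
        self.deleted: List[str] = []

    def delete_dataset_versions(self, dataset_version_ids: List[str]) -> None:
        self.deleted.extend(dataset_version_ids)
```

Collecting failed ids rather than raising on the first error lets one run make progress on a long list, and the failed ids can be fed back in as the next run's input.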

Batch jobs previously created in data-portal:

- Publish revisions: code and infra
- Dataset metadata update: code and infra

Memory / vCPU requirements should be tuned during testing in an rdev environment.

nayib-jose-gloria commented 1 month ago

Note to @lvreynoso: @Bento007 has suggested looking into using an AWS Lambda for this instead of a batch job; I think it's worth considering.
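For comparison, a minimal sketch of what a Lambda entrypoint might look like. The `dataset_version_ids` event key, `parse_dataset_version_ids`, and `make_handler` are assumptions made for illustration; only the `handler(event, context)` signature is the standard Python Lambda shape. The clean-up function is injected so the same logic could sit behind either a Lambda or a batch job. One thing to weigh: Lambda invocations are capped at 15 minutes, so very large id lists may still need Batch or chunking.

```python
from typing import Any, Callable, Dict, List


def parse_dataset_version_ids(event: Dict[str, Any]) -> List[str]:
    """Validate the (hypothetical) 'dataset_version_ids' key of the invocation event."""
    ids = event.get("dataset_version_ids")
    if not isinstance(ids, list) or not ids:
        raise ValueError("event must contain a non-empty 'dataset_version_ids' list")
    if not all(isinstance(i, str) and i for i in ids):
        raise ValueError("every dataset version id must be a non-empty string")
    return ids


def make_handler(cleanup: Callable[[List[str]], Any]):
    """Build a Lambda-style handler around an injected clean-up callable."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
        ids = parse_dataset_version_ids(event)
        cleanup(ids)
        # Report how many ids were requested; the callable owns success/failure details.
        return {"requested": len(ids)}

    return handler
```

Rejecting a malformed event before touching S3 or the DB keeps a bad invocation from partially deleting anything.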