chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Un-tombstone collection a238e9fa-2bdf-41df-8522-69046f99baff and its dataset(s) #1576

Closed: brianraymor closed this issue 3 years ago.

brianraymor commented 3 years ago

See eng for context.

The collection (Single-Cell Analysis of Human Pancreas Reveals Transcriptional Signatures of Aging and Somatic Mutation Patterns) - a238e9fa-2bdf-41df-8522-69046f99baff - and its dataset(s) must be un-tombstoned.

jahilton commented 3 years ago

The Collection contained a single Dataset (66d15835-5dc8-4e96-b0eb-f48971cb65e8). The .h5ad for that dataset is in Dropbox ready for upload, if needed.

brianraymor commented 3 years ago

Madison advised:

It looks like we should be able to untombstone the collection, but we will need to reupload the dataset. At that point, we can manually update the DB to use the previous UUID (if that is necessary).

I can coordinate with Stanford once the collection is restored.

metakuni commented 3 years ago

The collection a238e9fa-2bdf-41df-8522-69046f99baff has been untombstoned:

corpora_prod=> select id, name, tombstone from project where id='a238e9fa-2bdf-41df-8522-69046f99baff';
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------
id        | a238e9fa-2bdf-41df-8522-69046f99baff
name      | Single-Cell Analysis of Human Pancreas Reveals Transcriptional Signatures of Aging and Somatic Mutation Patterns
tombstone | f

Ready for reupload of the dataset.

I'll tweak the dataset UUID after the reupload.

jahilton commented 3 years ago

The dataset is now uploaded: https://cellxgene.cziscience.com/collections/a238e9fa-2bdf-41df-8522-69046f99baff/private

jahilton commented 3 years ago

Now published with the former collection and dataset IDs.

metakuni commented 3 years ago

For posterity and future reference, here are notes on manually untombstoning the dataset in this story:

Collection ID: a238e9fa-2bdf-41df-8522-69046f99baff
Original dataset ID: 66d15835-5dc8-4e96-b0eb-f48971cb65e8
New dataset ID: 0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8 (generated when Jason reuploaded the dataset)
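Because a mistyped ID in a where clause simply matches zero rows instead of raising an error, it may be worth putting the three IDs in shell variables and checking their format before running anything. A minimal POSIX sh sketch; the variable names and the is_uuid helper are hypothetical and not part of the portal's tooling:

```sh
#!/bin/sh
# Hypothetical helper: the three IDs from this issue plus a format check.
COLLECTION_ID='a238e9fa-2bdf-41df-8522-69046f99baff'
ORIGINAL_DATASET_ID='66d15835-5dc8-4e96-b0eb-f48971cb65e8'
NEW_DATASET_ID='0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8'

# Succeeds only if $1 is a lowercase 8-4-4-4-12 hex UUID.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Stop before touching anything if any ID is malformed.
for id in "$COLLECTION_ID" "$ORIGINAL_DATASET_ID" "$NEW_DATASET_ID"; do
  if is_uuid "$id"; then
    echo "ok: $id"
  else
    echo "not a UUID: $id" >&2
    exit 1
  fi
done
```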

-- 1. Resurrect (untombstone) the previously tombstoned collection:
update project 
set 
tombstone=false 
where 
id='a238e9fa-2bdf-41df-8522-69046f99baff';

-- 2. Jason reuploaded the dataset to the resurrected collection https://cellxgene.cziscience.com/collections/a238e9fa-2bdf-41df-8522-69046f99baff

-- 3. Resurrect the previously tombstoned dataset:
update dataset 
set 
tombstone=false 
where 
id='66d15835-5dc8-4e96-b0eb-f48971cb65e8';

-- 4. Repoint dataset_processing_status:
update dataset_processing_status 
set 
dataset_id='66d15835-5dc8-4e96-b0eb-f48971cb65e8' 
where 
dataset_id='0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8';

-- 5. Repoint dataset_artifact:
update dataset_artifact 
set 
dataset_id='66d15835-5dc8-4e96-b0eb-f48971cb65e8',
s3_uri=replace(s3_uri, '0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8', '66d15835-5dc8-4e96-b0eb-f48971cb65e8') 
where 
dataset_id='0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8';

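-- 6. Copy the S3 objects to keys under the original dataset ID (aws s3 cp copies rather than moves, so the objects under the new ID are left in place):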
AWS_PROFILE=single-cell-prod-poweruser aws s3 cp s3://hosted-cellxgene-prod/0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8.cxg s3://hosted-cellxgene-prod/66d15835-5dc8-4e96-b0eb-f48971cb65e8.cxg --recursive
AWS_PROFILE=single-cell-prod-poweruser aws s3 cp s3://corpora-data-prod/0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8 s3://corpora-data-prod/66d15835-5dc8-4e96-b0eb-f48971cb65e8 --recursive
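For a future untombstoning, the same steps could be generated from the three IDs rather than edited by hand. The sketch below is a hypothetical dry-run helper, not something used in this issue: it executes nothing and only prints the SQL for steps 1 and 3 to 5 and the aws s3 cp commands for step 6, so they can be reviewed before being pasted into psql and a shell. It assumes the table and column names shown in the steps above are unchanged, and the BEGIN/COMMIT wrapper is a suggestion rather than what was run here.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints the SQL for steps 1 and 3-5 and the
# aws commands for step 6. It executes nothing against the DB or S3.
set -eu

COLLECTION_ID='a238e9fa-2bdf-41df-8522-69046f99baff'
ORIGINAL_DATASET_ID='66d15835-5dc8-4e96-b0eb-f48971cb65e8'
NEW_DATASET_ID='0b8cede0-1a61-4bf9-8b0a-cfdd79757ac8'

# SQL for steps 1 and 3-5. BEGIN/COMMIT is an added suggestion: if any
# statement fails, PostgreSQL aborts the transaction and COMMIT rolls it back.
print_sql() {
  cat <<EOF
BEGIN;
-- 1. Untombstone the collection.
UPDATE project SET tombstone = false WHERE id = '$COLLECTION_ID';
-- 3. Untombstone the original dataset.
UPDATE dataset SET tombstone = false WHERE id = '$ORIGINAL_DATASET_ID';
-- 4. Repoint dataset_processing_status to the original dataset ID.
UPDATE dataset_processing_status
  SET dataset_id = '$ORIGINAL_DATASET_ID'
  WHERE dataset_id = '$NEW_DATASET_ID';
-- 5. Repoint dataset_artifact and rewrite the ID inside its S3 URIs.
UPDATE dataset_artifact
  SET dataset_id = '$ORIGINAL_DATASET_ID',
      s3_uri = replace(s3_uri, '$NEW_DATASET_ID', '$ORIGINAL_DATASET_ID')
  WHERE dataset_id = '$NEW_DATASET_ID';
COMMIT;
EOF
}

# Step 6: copy (not move) the S3 objects to keys under the original ID.
print_s3() {
  aws_cp='AWS_PROFILE=single-cell-prod-poweruser aws s3 cp'
  echo "$aws_cp s3://hosted-cellxgene-prod/${NEW_DATASET_ID}.cxg s3://hosted-cellxgene-prod/${ORIGINAL_DATASET_ID}.cxg --recursive"
  echo "$aws_cp s3://corpora-data-prod/${NEW_DATASET_ID} s3://corpora-data-prod/${ORIGINAL_DATASET_ID} --recursive"
}

print_sql
print_s3
```

With the IDs from this issue, this prints statements equivalent to steps 1 and 3 to 6; step 2 (the reupload) still has to happen first, since it is what creates the new dataset ID.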

Once the above steps are complete, test by loading the old dataset URL:

Note: Due to caching, this test failed at first because Explorer still treated the dataset as tombstoned. Once the browser cache was cleared, the dataset loaded correctly.