HumanCellAtlas / ingest-central

Ingest Central is the hub repository for the ingest service
Apache License 2.0

Remove spatio-temporal kidney #596

Open jahilton opened 4 years ago

jahilton commented 4 years ago

Acceptance criteria

A project was submitted in which part of the data violates GDPR. The plan is to wipe the project completely from all components and then re-ingest the permissible data. Removing the data files themselves is urgent (high priority, ASAP). Removing the metadata is important but not urgent. Production is the urgent environment because it is the one best known to and most accessible by users; the other environments are important but not urgent.

https://data.humancellatlas.org/explore/projects/abe1a013-af7a-45ed-8c26-f3793c24a1f4
Project UUID: abe1a013-af7a-45ed-8c26-f3793c24a1f4
Project label: KidneySingleCellAtlas

@HumanCellAtlas/data-ops

justincc commented 4 years ago

Does this also require removing the metadata and data from backups?

jlzamanian commented 4 years ago

The non-restricted data from the "Spatio-temporal immune zonation of the human kidney" project has been re-ingested using the same project UUID (abe1a013-af7a-45ed-8c26-f3793c24a1f4) and project label (KidneySingleCellAtlas).

The re-ingestion submission (UUID d5410c6e-612d-421a-a66f-2de5e04dd050) was created on 10/22/2019. This submission must not be removed, nor should any update submission made after 10/22/2019.

https://ui.ingest.data.humancellatlas.org/submissions/detail?uuid=d5410c6e-612d-421a-a66f-2de5e04dd050

MightyAx commented 4 years ago

Project that needs to be deleted:
UUID: 11e9f4f3-1e12-08bc-3e96-b16527682bb8
ID: 5d51692d1a249400085ac36c
Submission: 483eb0b1-3196-4a72-9f30-4a7aecdd25b4

justincc commented 4 years ago

This is now deleted. We still need to add the scripts to the repositories and write up the process for next time.

jlzamanian commented 4 years ago

We've found a number of incomplete submissions for this project in the prod ingest UI. These incomplete submissions also contain the restricted metadata and need to be removed:

- 08/12/19 submission UUID = 483eb0b1-3196-4a72-9f30-4a7aecdd25b4
- 08/12/19 submission UUID = 1e4fb14d-2380-41db-ae7e-37465de815e6
- 08/12/19 submission UUID = cfac3abc-c5cb-49f8-866e-379748d72001
- 08/09/19 submission UUID = e39f4b5a-c2ce-4ac1-89c8-747a20d08e8a
- 08/09/19 submission UUID = 40972c5f-295a-4e39-9ac8-a1aaca89798f
- 09/25/19 submission UUID = 2afc1a93-f35d-4dec-95b7-7bd54b6da834
- 08/14/19 submission UUID = 702313be-fdde-42ea-89a5-bd1b01531736
- 10/03/19 submission UUID = 9cfca427-6e22-447a-867e-4d81fdb7391c
- 10/03/19 submission UUID = 9e1d7bdc-e4a8-4dac-a131-6434aeb15bd0
- 09/25/19 submission UUID = 0dc428c1-b30a-4ceb-b64b-a5c08ffe2282

justincc commented 4 years ago

Blocked on a reply from data ops on whether we should proceed with eliminating the remaining production ingest submissions.

jahilton commented 4 years ago

I wouldn't consider this urgent or critical, because the data are not public. However, without knowing the legalities behind the data access, I think it is likely that we shouldn't have access. Given that, and my understanding that this should be a low-risk activity, I'd say proceed.