HumanCellAtlas / dcp

Data Coordination Platform manifest and integration tests.

Removal of living European data from project: Spatio-temporal immune zonation of the human kidney #536

Open morrisonnorman opened 4 years ago

morrisonnorman commented 4 years ago

Issue

It has been brought to our attention that the [Spatio-temporal immune zonation of the human kidney](https://data.humancellatlas.org/explore/projects/abe1a013-af7a-45ed-8c26-f3793c24a1f4) project contains data from living European donors. ALL instances of data associated with living European donors need to be removed/deleted from DCP components.

High Priority Task List

Remove all instances of data associated with living European donors from DCP components. Deleted from:

A more detailed list of tasks and nascent Project Deletion SOP is being developed in this google doc

11/15/19 More comprehensive Project Deletion SOP.

Requirements

The old project UUID should be maintained, as it is already being referenced by third parties. The project can be reingested once the living European data has been removed. The reingested data needs to be linked* to the old project UUID.

AUDR

AUDR capability does not currently handle partial deletion of data from projects. Project deletion involves removing the entire project dataset and associated bundles. Reingest and manually assign the existing project UUID to the newly ingested project.

Next steps

Once the high priority deletion task list has been completed, the project can be reingested with the living European donor metadata and data removed. Note that the reingested data needs to be linked* to the old project UUID.

*MVP for reingestion is a direct replacement of the project UUID with no versioning.

jahilton commented 4 years ago

Quick look at what donors are in what bundles...

bundle_uuid donor_organism.provenance.document_id donor_organism.is_living
9914ac5e-772f-4347-ae33-290bd67ddf11 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
1076c830-8986-4fee-9e7a-847fe0b227c7 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
6d9faabd-5da1-4643-8f02-bce67d9fc9bb 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
0b80acf7-2295-409f-9f33-3f05fc3f4158 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
227f2010-a6df-4c1b-9fb7-fc6cedeff8ce 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
2165e389-4760-4e89-98a8-49dbc0af00d0 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
0cc25d1d-6218-4dad-ba5f-91ac4144a24b 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
26dadf9e-4690-48df-b7f1-eac2b50c1462 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
186c2fc5-fcc4-4de4-b577-179cf73c0d94 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
7e330852-f693-4303-bc83-09e6d145d737 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
10ae73ef-d2e1-4a1d-a192-cf332d850c4f 2a7fbc94-aba3-431d-9156-a1c613e1e681 no
71a582dc-6a03-4bb4-a4f7-29bbf289d8b9 2a7fbc94-aba3-431d-9156-a1c613e1e681 no
67bd4c99-a5f8-4607-8571-9c97fed0832e 2a7fbc94-aba3-431d-9156-a1c613e1e681 no
b44a0842-b2c0-4392-91f7-57e312c0507e 2a7fbc94-aba3-431d-9156-a1c613e1e681 no
4717eb27-350f-41c1-adc2-3589e17048ee 3c7b6d6f-9502-4851-9668-52a433f4a3b2 no
b6037eec-60da-48da-85a4-b7438539a6ae 3c7b6d6f-9502-4851-9668-52a433f4a3b2 no
192ccf23-5847-458d-9527-c730677e7093 3c7b6d6f-9502-4851-9668-52a433f4a3b2 no
5c4bb0a6-705e-43a1-9714-fc7a60334bed 3c7b6d6f-9502-4851-9668-52a433f4a3b2 no
a44844d6-2728-4595-9fa7-bb6f2a34b629 5b56117b-5170-4d7f-8ce7-674147a58a72 no
0d6d1122-acf0-466d-ab0b-a131edfb663e 5b56117b-5170-4d7f-8ce7-674147a58a72 no
3013e0db-e471-4134-9e0c-b3bed2464a8e 5b56117b-5170-4d7f-8ce7-674147a58a72 no
368cd6c7-515b-4173-9703-8a95796bba62 5b56117b-5170-4d7f-8ce7-674147a58a72 no
f7081630-0d97-4550-8b16-77008ea868a3 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
2e373a6e-f531-449b-a989-eb163322ce83 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
a36a76dd-3036-42e6-beb2-dd41e1e312b6 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
b7e88f9e-45e4-4472-aa77-fc46886614a4 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
8cd7c4db-675b-4412-bb60-df0340245828 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
6a6bd6e3-af01-4e0c-81ad-97eb0bc15db5 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
c02ec92e-1aa0-49ae-a2fd-ceeb4b0cbdb3 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
08ca411d-c58a-487a-b7da-4d5e83732252 87316bb3-ad5e-4921-8f5f-29acad300ad2 no
5a37f51b-3f66-4b22-95c1-d82842265f19 98961f76-94cd-4848-8710-d2ce8edab34f no
4accc284-186c-4b81-9810-29314a4a6948 98961f76-94cd-4848-8710-d2ce8edab34f no
9eb1a782-4c15-42b7-8d4a-e6dfca346fe3 98961f76-94cd-4848-8710-d2ce8edab34f no
6529d0e9-22ff-4276-99ca-884e45a5ceb3 98961f76-94cd-4848-8710-d2ce8edab34f no
2e478e45-e8e5-402a-b41f-5a9c2f2edd3a 98961f76-94cd-4848-8710-d2ce8edab34f no
89a909ff-43e2-4ef9-832a-11ea6ab7689a 98961f76-94cd-4848-8710-d2ce8edab34f no
e22e7f03-4917-4ab9-ba0c-31784fa62de9 d209d8fc-fd2c-452c-82e3-a81652276884 no
5c4b214d-ffd7-426b-88a2-cf611e5803f7 d209d8fc-fd2c-452c-82e3-a81652276884 no
4420213f-8c68-4559-8d76-b583820a1ff3 d209d8fc-fd2c-452c-82e3-a81652276884 no
284679d1-6b69-4d52-8774-c13943610299 d209d8fc-fd2c-452c-82e3-a81652276884 no
ad88924f-248f-4c6e-bad9-d64495b03de1 d3387a14-d06d-4b2b-bc9b-e7ece3be964a no
3b17e3c9-5f1c-49c6-b4fc-e3d658e7b731 d3387a14-d06d-4b2b-bc9b-e7ece3be964a no
6ff1a6f0-f214-45bc-878a-f8919cb3e581 d3387a14-d06d-4b2b-bc9b-e7ece3be964a no
3bab143b-606a-4890-afa1-7dfa92370cf7 d3387a14-d06d-4b2b-bc9b-e7ece3be964a no
499b690e-feee-4675-8531-882a62b6020c 191eed41-c398-4911-b2d8-63f110a0e823 yes
fff16441-5212-4319-bf52-9e23d0eff7eb 191eed41-c398-4911-b2d8-63f110a0e823 yes
5b8fc043-b00e-4699-b75f-19be0c7d9dc0 191eed41-c398-4911-b2d8-63f110a0e823 yes
ce1f6515-6143-4e77-b162-fe6060cc10e6 191eed41-c398-4911-b2d8-63f110a0e823 yes
4882ec6f-fe81-43d8-9446-595fbf6319da 191eed41-c398-4911-b2d8-63f110a0e823 yes
27a43436-6e63-44ce-a2b3-80618f00b7f6 191eed41-c398-4911-b2d8-63f110a0e823 yes
0d7aee1d-a3d7-4532-903a-c531ac07998d 191eed41-c398-4911-b2d8-63f110a0e823 yes
9ca7449a-4b91-4cdd-b226-a7fbb7b1ffd0 191eed41-c398-4911-b2d8-63f110a0e823 yes
8b7635a4-88a4-4c76-9946-6e3bebd51bfa 1e851cf7-106f-45dc-b65f-222c805bc961 yes
1a26883e-d034-491c-8d0a-0e93e854eacb 1e851cf7-106f-45dc-b65f-222c805bc961 yes
daaf63cf-04ae-4335-868c-32e63af0c37a 1e851cf7-106f-45dc-b65f-222c805bc961 yes
6f01e3ae-9743-4424-983b-18b18f0ffeed 1e851cf7-106f-45dc-b65f-222c805bc961 yes
002fe390-ba3f-4aa7-ae43-aa7949b09bb1 1e851cf7-106f-45dc-b65f-222c805bc961 yes
a1e2d3d2-3b47-4d20-a331-548818f3604b 1e851cf7-106f-45dc-b65f-222c805bc961 yes
caf3a1b6-60d3-42fd-a56c-48fe0e6de571 1e851cf7-106f-45dc-b65f-222c805bc961 yes
79ec8f41-fccc-46ed-af43-e63892238d77 1e851cf7-106f-45dc-b65f-222c805bc961 yes
a0a78a4e-7891-4ef2-a22d-1bfab43ffe47 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
6086e0e1-c679-44ad-bc62-57875b216fc4 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
7b4fd692-c38d-43c6-bb5b-bf3dcbc837eb 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
f269045b-1b3b-40d6-86ae-398a5b699e1d 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
c0e82caf-6fed-4524-9267-d97ac0a4d2f2 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
f6d920a5-63ab-4d4b-8fb7-0fe2fdbf2fcd 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
67cdc6a8-9f47-4dd9-908d-9654e489361d 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
97735720-4c1f-4b69-9463-e6c9febc2764 52e1680e-4a73-46ed-ba7c-8e52de2aa04e yes
f838b9fe-3e03-41cf-8e1f-1f17db456975 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
3acc603d-b617-49ae-aa0d-353db0aa9e7e 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
880a3226-d432-4ab7-b81a-4c00421c6590 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
32b6880e-6c5c-41e4-b5b1-96b097cc52da 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
c99ea041-0daf-4b25-bee9-d9691af11a5e 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
5bf86ddb-a57b-49be-b734-8b09d7259e60 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
7ef3700c-d2d3-4e6e-b73e-c22f3380e7c3 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
70372f87-4cd4-40d7-8c09-1ebc641e1384 5e31ffe8-a7b2-4478-adc1-4f7368339e71 yes
d8f7f78d-1c77-49cc-b934-28d0da11c6d0 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
8388fed2-1af2-4102-a578-82d3ef5ace03 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
9d993f96-67d2-4640-8217-89b1ecc60028 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
6963e95a-f81d-4bfc-9639-46da2a958d7a 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
f840f06a-58b1-4699-a672-ef0b08fb3304 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
8f585ec2-54c2-4d7b-8cea-0a9ceada6c8f 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
88aed2c9-6537-40fe-a04c-cd58900e9abf 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
a182995c-83c9-4da9-8029-f73b1d742d1b 6ce8c2a2-d013-4bdd-8444-3bba7994125a yes
00005cce-dfbe-40a9-922a-e1da11a11ada a9efb8f5-fd92-4f49-9fc8-18310979752b yes
ea25386a-dc17-453d-9312-b61995f2c3f6 a9efb8f5-fd92-4f49-9fc8-18310979752b yes
328a2bf0-3f3c-4441-96a8-ddf41852e650 a9efb8f5-fd92-4f49-9fc8-18310979752b yes
363aeb9a-010d-49ec-b8e0-9062552b9062 a9efb8f5-fd92-4f49-9fc8-18310979752b yes
3c3061a7-1dc5-422e-bc14-4760b47f6ad2 b4997866-8e80-49af-823a-83d24aaa2d83 yes
81c5a0ed-50d3-4c60-b1b4-57df9862b636 b4997866-8e80-49af-823a-83d24aaa2d83 yes
62893d1d-91ea-4842-8d76-d6d4e67a795e b4997866-8e80-49af-823a-83d24aaa2d83 yes
0dce2a49-1d81-45a2-98c3-75c1e154e96e b4997866-8e80-49af-823a-83d24aaa2d83 yes
9e91edba-447e-47d3-8095-4406520e2fe9 b4997866-8e80-49af-823a-83d24aaa2d83 yes
9bd51e8f-abb2-40e6-8ee2-e0afa8b6caa6 b4997866-8e80-49af-823a-83d24aaa2d83 yes
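
As a rough illustration of how the table above can be reduced to a per-donor list of bundles to delete, here is a minimal Python sketch. It assumes the table is available as whitespace-separated text with the three columns shown; the `TABLE` sample holds only a handful of rows copied from above, and the function name `living_bundles_by_donor` is hypothetical, not part of any DCP tooling.

```python
from collections import defaultdict

# A few rows copied from the table above, for illustration only.
# In practice the full table would be read from a file.
TABLE = """\
bundle_uuid donor_organism.provenance.document_id donor_organism.is_living
9914ac5e-772f-4347-ae33-290bd67ddf11 26f8b5cf-79a2-441f-a71a-a272a06c4a78 no
10ae73ef-d2e1-4a1d-a192-cf332d850c4f 2a7fbc94-aba3-431d-9156-a1c613e1e681 no
499b690e-feee-4675-8531-882a62b6020c 191eed41-c398-4911-b2d8-63f110a0e823 yes
fff16441-5212-4319-bf52-9e23d0eff7eb 191eed41-c398-4911-b2d8-63f110a0e823 yes
8b7635a4-88a4-4c76-9946-6e3bebd51bfa 1e851cf7-106f-45dc-b65f-222c805bc961 yes
"""


def living_bundles_by_donor(table_text):
    """Group bundle UUIDs by donor for rows where is_living is 'yes'."""
    lines = table_text.strip().splitlines()
    grouped = defaultdict(list)
    for line in lines[1:]:  # skip the header row
        bundle_uuid, donor_id, is_living = line.split()
        if is_living == "yes":
            grouped[donor_id].append(bundle_uuid)
    return dict(grouped)


living = living_bundles_by_donor(TABLE)
# Flatten to a single list of bundle UUIDs associated with living donors.
bundles_to_delete = [b for bundles in living.values() for b in bundles]
print(len(living), "living donors,", len(bundles_to_delete), "bundles")
```

Grouping by donor first makes it easy to confirm the donor count against what the wranglers report before acting on the flattened bundle list.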

@morrisonnorman are all 7 living donors European?

morrisonnorman commented 4 years ago

@ESapenaVentura Can you confirm for @jahilton which of the 7 living donors are European?

Previous quote from Enrique: "It seems that donors txk2, txk3, pRCC, RCC1, RCC2, RCC3 and VHL are alive".

morrisonnorman commented 4 years ago

@jahilton All of the living donors are European.

jahilton commented 4 years ago

Here's our living donor bundle list

499b690e-feee-4675-8531-882a62b6020c
fff16441-5212-4319-bf52-9e23d0eff7eb
5b8fc043-b00e-4699-b75f-19be0c7d9dc0
ce1f6515-6143-4e77-b162-fe6060cc10e6
4882ec6f-fe81-43d8-9446-595fbf6319da
27a43436-6e63-44ce-a2b3-80618f00b7f6
0d7aee1d-a3d7-4532-903a-c531ac07998d
9ca7449a-4b91-4cdd-b226-a7fbb7b1ffd0
8b7635a4-88a4-4c76-9946-6e3bebd51bfa
1a26883e-d034-491c-8d0a-0e93e854eacb
daaf63cf-04ae-4335-868c-32e63af0c37a
6f01e3ae-9743-4424-983b-18b18f0ffeed
002fe390-ba3f-4aa7-ae43-aa7949b09bb1
a1e2d3d2-3b47-4d20-a331-548818f3604b
caf3a1b6-60d3-42fd-a56c-48fe0e6de571
79ec8f41-fccc-46ed-af43-e63892238d77
a0a78a4e-7891-4ef2-a22d-1bfab43ffe47
6086e0e1-c679-44ad-bc62-57875b216fc4
7b4fd692-c38d-43c6-bb5b-bf3dcbc837eb
f269045b-1b3b-40d6-86ae-398a5b699e1d
c0e82caf-6fed-4524-9267-d97ac0a4d2f2
f6d920a5-63ab-4d4b-8fb7-0fe2fdbf2fcd
67cdc6a8-9f47-4dd9-908d-9654e489361d
97735720-4c1f-4b69-9463-e6c9febc2764
f838b9fe-3e03-41cf-8e1f-1f17db456975
3acc603d-b617-49ae-aa0d-353db0aa9e7e
880a3226-d432-4ab7-b81a-4c00421c6590
32b6880e-6c5c-41e4-b5b1-96b097cc52da
c99ea041-0daf-4b25-bee9-d9691af11a5e
5bf86ddb-a57b-49be-b734-8b09d7259e60
7ef3700c-d2d3-4e6e-b73e-c22f3380e7c3
70372f87-4cd4-40d7-8c09-1ebc641e1384
d8f7f78d-1c77-49cc-b934-28d0da11c6d0
8388fed2-1af2-4102-a578-82d3ef5ace03
9d993f96-67d2-4640-8217-89b1ecc60028
6963e95a-f81d-4bfc-9639-46da2a958d7a
f840f06a-58b1-4699-a672-ef0b08fb3304
8f585ec2-54c2-4d7b-8cea-0a9ceada6c8f
88aed2c9-6537-40fe-a04c-cd58900e9abf
a182995c-83c9-4da9-8029-f73b1d742d1b
00005cce-dfbe-40a9-922a-e1da11a11ada
ea25386a-dc17-453d-9312-b61995f2c3f6
328a2bf0-3f3c-4441-96a8-ddf41852e650
363aeb9a-010d-49ec-b8e0-9062552b9062
3c3061a7-1dc5-422e-bc14-4760b47f6ad2
81c5a0ed-50d3-4c60-b1b4-57df9862b636
62893d1d-91ea-4842-8d76-d6d4e67a795e
0dce2a49-1d81-45a2-98c3-75c1e154e96e
9e91edba-447e-47d3-8095-4406520e2fe9
9bd51e8f-abb2-40e6-8ee2-e0afa8b6caa6

ESapenaVentura commented 4 years ago

We don't have explicit data about the donor ethnicity, although it's safe to assume that all of them are European. I can look at the paper for further confirmation if necessary.

jahilton commented 4 years ago

@ESapenaVentura if there's any uncertainty, can you contact Benjamin?

morrisonnorman commented 4 years ago

RE European donors: The samples were all collected from British hospitals, so the donors are all residents in Europe and subject to GDPR. All associated data needs to be removed.

@jahilton Although the priority is to delete the 'living bundles', we don't have a mechanism to delete specific bundles and update the metadata for the project.

I think it will have to be a 'delete all' from the project... and reingest.

lauraclarke commented 4 years ago

Worth noting that European in the GDPR context isn't about ethnicity but about residency. I think we have to assume that any biopsy collected by a European institution is covered by GDPR

diekhans commented 4 years ago

Instead of reingesting, it should be possible to write a one-off program to edit the bundles and create new versions, then delete the old versions.

morrisonnorman commented 4 years ago

@diekhans It would be great if there's a better way to address the issue in a timely manner that doesn't involve reingestion.

ESapenaVentura commented 4 years ago

The context in this case is to delete bundles that contain the aforementioned donors, but the rest of the bundles (in case we can AUDR and not re-ingest) will be unaffected by this. Even if they remain unaffected, would you need new versions of them? @diekhans

lauraclarke commented 4 years ago

@ESapenaVentura do the project records specifically reference the cancer biopsies or link to the donors which will be deleted in a manner that isn't fixable with simple updates?

ESapenaVentura commented 4 years ago

I don't think I understand the question. Do you mean if the project metadata references the cancer biopsies?

lauraclarke commented 4 years ago

Yes, will the project-level metadata be made inaccurate, or any links in the deceased donor bundles be made invalid, when the cancer biopsy bundles are deleted?

Those would be potential things which might require re-ingestion rather than deleting the bundles which reference the living donors.

If no deceased donor bundles reference living donors and any project-level metadata that references the cancer biopsy donors can be updated via simple updates then we might not need re-ingestion
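
The first condition above (no deceased donor bundle also references a living donor) can be checked mechanically. The following is a minimal sketch, assuming the bundle-to-donor links are available as `(bundle_uuid, donor_id, is_living)` tuples; the function name `split_bundles` and the tuple layout are hypothetical, and the check says nothing about whether project-level metadata would remain accurate.

```python
def split_bundles(rows):
    """Partition bundle UUIDs by the donors they reference.

    rows: iterable of (bundle_uuid, donor_id, is_living) tuples,
    where is_living is a bool.
    Returns (delete, keep, mixed) as sets of bundle UUIDs:
      delete - bundles referencing only living donors
      keep   - bundles referencing only deceased donors
      mixed  - bundles referencing both, which block a clean targeted deletion
    """
    living, deceased = set(), set()
    for bundle_uuid, _donor_id, is_living in rows:
        (living if is_living else deceased).add(bundle_uuid)
    mixed = living & deceased
    return living - mixed, deceased - mixed, mixed


# Two rows taken from the donor table earlier in this thread.
rows = [
    ("9914ac5e-772f-4347-ae33-290bd67ddf11",
     "26f8b5cf-79a2-441f-a71a-a272a06c4a78", False),
    ("499b690e-feee-4675-8531-882a62b6020c",
     "191eed41-c398-4911-b2d8-63f110a0e823", True),
]

delete, keep, mixed = split_bundles(rows)
if mixed:
    print("Targeted deletion is not clean; mixed bundles:", sorted(mixed))
else:
    print(f"{len(delete)} bundle(s) to delete, {len(keep)} unaffected")
```

An empty `mixed` set would only satisfy the first half of the condition; the project-level metadata would still need to be reviewed by hand.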

lauraclarke commented 4 years ago

Okay, based on discussions in Slack and in person with Tony and Alegria, there are other reasons why deleting only the affected bundles might lead to challenges later, so I suspect this isn't a suitable solution.

jahilton commented 4 years ago

Tickets have been created and linked to this Epic for Ingest, DataStore, Browser, MatrixService, and SecondaryAnalysis.

@parthshahva does not believe Upload maintains any record so action may not be required. He will comment here once confirmed.

parthshahva commented 4 years ago

Per discussion in #data-wrangling @ESapenaVentura will be cleaning up the corresponding staging area in the wrangler workspace. I've confirmed that the production staging area was cleaned out after the submission/exporting completed and data was copied over to DSS.

jlzamanian commented 4 years ago

Wrangler issue ticket https://github.com/HumanCellAtlas/hca-data-wrangling/issues/344

jlzamanian commented 4 years ago

More comprehensive Project Deletion SOP.