ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

linking update for projects with DCP matrices #1226

Open idazucchi opened 7 months ago

idazucchi commented 7 months ago

Context

Updating the linking of a project generally requires the project to be soft deleted from Terra. However, it's unclear what effect the soft deletion would have on:

Description of the task:

We need to understand and then record:

Example project - MVP contributor matrices + DCP generated matrices

short name: HumanAdultKidneyLiaoMo
uuid: 2ef3655a-973d-4d69-9b41-21fa4041eed7
ingest aim: update linking, keep existing DCP matrices and remove MVP matrices because they have been added to the project as normal analysis files

Example project - MVP contributor matrices + DCP generated matrices

short name: pbmcCov19Flu
uuid: 95f07e6e-6a73-4e1b-a880-c83996b3aa5c
ingest aim: remove MVP matrices because they have been added to the project as normal analysis files, keep existing DCP matrices

Acceptance criteria for the task

idazucchi commented 7 months ago

next step: Nate will prepare a spreadsheet for each project that contains the link uuids and their content. This should allow us to identify the link files for the DCP generated matrices and the MVP matrices, and then delete or keep them according to the needs of the project.
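The sorting step described here could be sketched roughly as follows. This is only an illustration: the `partition_links` helper, the dictionary layout and the placeholder ids are assumptions, not the actual spreadsheet format or any ingest tooling.

```python
from typing import Dict, Iterable, List, Set, Tuple


def partition_links(
    link_contents: Dict[str, Set[str]],
    files_to_delete: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Split link ids into (keep, delete) based on the files each link references."""
    doomed = set(files_to_delete)
    keep: List[str] = []
    delete: List[str] = []
    for link_id, file_ids in sorted(link_contents.items()):
        # A link is marked for deletion if it references any file being removed.
        if file_ids & doomed:
            delete.append(link_id)
        else:
            keep.append(link_id)
    return keep, delete


# Hypothetical rows, as they might be read from a per-project spreadsheet
# (link id -> ids of the files that link contains).
links = {
    "link-dcp-matrix": {"file-dcp-matrix", "file-bam"},
    "link-mvp-matrix": {"file-mvp-matrix"},
    "link-pipeline-1": {"file-bam", "file-fastq-1"},
}
keep, delete = partition_links(links, ["file-mvp-matrix"])
print("keep:", keep)
print("delete:", delete)
```

A link that mixes files to keep and files to delete ends up in the delete list here, so those cases would still need a manual look.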

idazucchi commented 7 months ago

we should also discuss whether we want to keep DCP matrices. @gabsie to raise this with Tony

idazucchi commented 7 months ago

Example project - MVP contributor matrices + DCP generated matrices

short name: HumanAdultKidneyLiaoMo
uuid: 2ef3655a-973d-4d69-9b41-21fa4041eed7

keep
DCP matrix id (analysis file): d7ae23dc-ed81-5ef9-8ec7-d201a92701c9
DCP matrix links id: 51b7ad62-9f03-5ee1-99a3-e6eb4d54c713
keep all analysis files - they are DCP/2 pipelines products
links for the other pipeline analysis files:
  8911c4d5-ddbd-5d9b-8543-c773356386f3
  ee80d2e9-76e9-53eb-9d5d-d76b527ddc47
  2d9430d0-9d18-55d6-9ff0-089e2d981f0c

delete - highlighted in yellow in the sheet
MVP matrix id (supplementary): bd2f5a03-cc0d-5fe1-a492-fb9045e57f92
MVP matrix links id: 4f2fc365-9f97-51ca-bbfe-fe30cefc333d
links for sequence files:
  58693ea7-bb6c-4d4b-bdf2-169183701581
  724f1ba8-5a1b-4756-b0d1-8c2fdc92cb6d
  c8bb5ce5-7b98-4a7b-aa52-287922f8a9e0

Example project - MVP contributor matrices + DCP generated matrices

short name: pbmcCov19Flu
uuid: 95f07e6e-6a73-4e1b-a880-c83996b3aa5c

keep
DCP matrix id (analysis file): 038565cd-35e3-5a31-afec-733f86b3317d
DCP matrix links id: a15d888c-a2ef-5ba5-b0fb-a2ecb6c764b1
new contributor matrix id (analysis file): 253b723a-8c4e-43bd-8b77-35f91c16c103
new contributor matrix links id: 3570c9e1-4746-440d-95a6-1229e6f90c2c

delete - highlighted in yellow in the sheet
MVP matrix id (supplementary): e0fb66b9-5c49-5cac-b704-a1fed0eae16d
MVP matrix links id: 21e70ffa-b57c-5087-93d3-c49b9a4fc6d6

Next step

idazucchi commented 6 months ago

waiting for R35 to be up for review to check the projects look ok in the browser

idazucchi commented 5 months ago

pbmcCov19Flu --> fixed
HumanAdultKidneyLiaoMo --> was dropped from R35 because it couldn't be indexed. This happened because the inputs for the DCP matrices were deleted and the graph couldn't be reconstructed. We have two options here:

  1. remove DCP matrices and intermediate products
  2. connect the DCP matrices to the new fastq files. This may or may not be correct: the content of the files is unchanged (they were simply merged), but the pipelines could have treated them as separate runs
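A check along these lines could flag the problem before a release. It is a rough sketch, not how the indexer actually works: the `dangling_inputs` helper, the `inputs`/`outputs` layout and the placeholder ids are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def dangling_inputs(
    kept_links: Dict[str, Dict[str, Set[str]]],
    available_files: Set[str],
) -> Dict[str, List[str]]:
    """Map each kept link id to the inputs that are no longer available."""
    # Anything produced by a kept link still counts as available.
    produced: Set[str] = set()
    for link in kept_links.values():
        produced |= link["outputs"]
    reachable = produced | available_files

    problems: Dict[str, List[str]] = {}
    for link_id, link in kept_links.items():
        missing = sorted(link["inputs"] - reachable)
        if missing:
            problems[link_id] = missing
    return problems


# Hypothetical situation: the DCP matrix link still points at the old fastq
# files, but only the merged fastq file remains in the project.
kept = {
    "link-dcp-matrix": {
        "inputs": {"old-fastq-1", "old-fastq-2"},
        "outputs": {"dcp-matrix"},
    },
}
print(dangling_inputs(kept, available_files={"merged-fastq"}))
```

An empty result would mean every kept link can still find its inputs. With option 2, the check would pass once the link inputs are repointed to the new fastq files, although it says nothing about whether that relinking is scientifically correct.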