ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

PRJNA694128 - Co-evolution of tumor and immune cells during progression of multiple myeloma #492

Closed idazucchi closed 2 years ago

idazucchi commented 2 years ago

Project short name:

MultipleMyelomaCoevolution

Primary Wrangler: @idazucchi

Secondary Wrangler:

@ipediez

Associated files

Published study links

Key Events

Ingest: https://contribute.data.humancellatlas.org/projects/detail?uuid=2ad191cd-bd7a-409b-9bd1-e72b5e4cce81

idazucchi commented 2 years ago

I uploaded the metadata spreadsheet in ingest.

Blockers
  1. Graph validation is currently blocked by the missing fastq files (#516). I am currently concatenating the fastq files.
  2. The sequencing platform used for each sample is missing. I contacted the authors again, since 2 weeks have passed since their last reply. If they reply after release 12, I can update the metadata with the new information.
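
For reference, a minimal sketch of how gzip-compressed fastq parts could be concatenated. It relies on the fact that gzip files appended back to back still form a valid gzip stream. The function name and the file names in the usage note are hypothetical, and this is not necessarily the procedure used for this dataset.

```python
import shutil
from pathlib import Path


def concatenate_fastq_gz(parts, destination):
    """Append gzip-compressed FASTQ parts, in the given order, into one file.

    Bytes are copied as-is: a sequence of gzip members is itself a valid
    gzip file, so nothing needs to be decompressed or recompressed.
    """
    destination = Path(destination)
    with destination.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                shutil.copyfileobj(src, out)
    return destination
```

Usage would look like `concatenate_fastq_gz(["lane1_R1.fastq.gz", "lane2_R1.fastq.gz"], "sample_R1.fastq.gz")`. The order of the parts must be the same for R1 and R2 so that read pairs stay aligned.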
idazucchi commented 2 years ago

Newly concatenated fastq files uploaded to ingest: 109/121 validated

Problem: Fastq files truncated

The fastq files for 3 SRR run accessions are invalid. The errors are again about files being truncated or sequence and quality lengths not matching.

I will investigate the issues
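
A rough sketch of a local check for the two reported error types (truncation, and sequence/quality length mismatch). It assumes plain four-line fastq records with no wrapped sequences. `check_fastq` is a hypothetical helper, not the validator that ingest runs.

```python
import gzip


def check_fastq(path):
    """Return a list of (record_number, problem) tuples for one FASTQ file."""
    problems = []
    record = 0
    opener = gzip.open if str(path).endswith(".gz") else open
    try:
        with opener(path, "rt") as handle:
            while True:
                header = handle.readline()
                if not header:
                    break  # clean end of file
                if not header.strip():
                    continue  # tolerate blank lines between records
                record += 1
                seq = handle.readline()
                plus = handle.readline()
                qual = handle.readline()
                if not seq or not plus or not qual:
                    problems.append((record, "truncated record"))
                    break
                if not header.startswith("@") or not plus.startswith("+"):
                    problems.append((record, "malformed header or separator"))
                elif len(seq.rstrip("\r\n")) != len(qual.rstrip("\r\n")):
                    problems.append((record, "sequence and quality lengths differ"))
    except EOFError:
        # Raised by the gzip module when the compressed stream is cut short.
        problems.append((record, "compressed stream ended early"))
    return problems
```

An empty list means no problem of these two kinds was found. It does not guarantee that the file passes every check in ingest.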

Graph validation blocked

idazucchi commented 2 years ago

Graph validation unblocked

The graph validation returns a no_orphans.adoc error. I think this is due to sequencing_protocol_1, which isn't linked to anything while I wait for the authors' reply. Deleting the protocol is not possible from the ingest UI, so @ESapenaVentura deleted it through the API. However, the error is still present, so I have to look into it.

idazucchi commented 2 years ago

no_orphans error

The error is due to the presence of an unlinked supplementary file. I discussed it with Wei and @ESapenaVentura, and we agreed that supplementary files should be allowed to be unlinked and that the test needs to be modified. I made a ticket in the graph validator repo.
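
To illustrate the agreed change, here is a minimal sketch of an orphan check that exempts supplementary files. The data layout (a dict of entity IDs to types plus a list of links) and the name `find_orphans` are assumptions made for this example. The real test in the graph validator may be written quite differently.

```python
def find_orphans(nodes, links, allowed_unlinked=("supplementary_file",)):
    """Return IDs of entities with no links, skipping exempt entity types.

    nodes: dict mapping entity ID -> entity type (hypothetical layout)
    links: iterable of (source_id, target_id) pairs
    """
    linked = set()
    for source, target in links:
        linked.add(source)
        linked.add(target)
    return sorted(
        node_id
        for node_id, node_type in nodes.items()
        if node_id not in linked and node_type not in allowed_unlinked
    )
```

With this rule an unlinked protocol would still be reported, while an unlinked supplementary file would not.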

idazucchi commented 2 years ago

For some reason each fastq file was assigned to a different process ID, so the graph validator raised an error. I modified the spreadsheet to manually assign process IDs so that the right fastqs are grouped together.
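
A small sketch of how the manual assignment could be scripted. It assumes that file names start with the SRR run accession and that all files from one run belong to the same process. Both are assumptions for illustration, as are the function name and the process ID prefix.

```python
import re
from collections import defaultdict

# Assumed naming convention: the file name starts with the run accession.
RUN_PATTERN = re.compile(r"^(SRR\d+)")


def assign_process_ids(file_names, prefix="sequencing_process_"):
    """Map each fastq file name to a process ID shared by files of the same run."""
    groups = defaultdict(list)
    for name in file_names:
        match = RUN_PATTERN.match(name)
        if match is None:
            raise ValueError(f"no run accession found in file name: {name}")
        groups[match.group(1)].append(name)

    process_ids = {}
    # Sort the runs so the numbering is reproducible between executions.
    for index, run in enumerate(sorted(groups), start=1):
        for name in groups[run]:
            process_ids[name] = f"{prefix}{index}"
    return process_ids
```

The resulting mapping can then be copied into the process ID column of the spreadsheet.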

As soon as the graph validator changes are in production, the no_orphans error should disappear and I will move the dataset to secondary review.

idazucchi commented 2 years ago

The authors confirmed that all the samples were processed with Illumina NovaSeq 6000

ipediez commented 2 years ago

A very complete submission, good job! Here are some suggestions:

Project

Project - Contributors

Project - Publications

Donor organism

Sequencing protocol

Supplementary file:

idazucchi commented 2 years ago

I've applied the changes and the project has been exported

hlmfw commented 1 year ago

The work is pretty good. I am a PhD candidate and hope to analyze some of your data. However, the [PRJNA694128] files referenced in the article are not available. Could you please point me to a site where I can download all the single-cell data from this work? Thank you very much!