pnejad opened 3 years ago
I'm not able to download the spreadsheet for 7adede6a-0ab7-45e6-9b67-ffe7466bec1f either.
@ESapenaVentura can you confirm that the 3 EBI datasets were mixed bulk RNA-seq and hence were intentionally set to have specimens linking into sequencing input?
@Wkt8 I can confirm the 3 EBI datasets were bulk + single-cell RNA-seq!
> intentionally set to have specimens linking into sequencing input
The specimens are the sequencing input.
@hannes-ucsc cell suspensions are not limited to single cells. So even for bulk experiments, the specimen is processed into suspensions of cells before the bulk RNA-seq is carried out.
What I meant was that in the current metadata graphs for these projects, the `specimen_from_organism` entities are the sequencing input instead of being linked to the sequencing input. Whether the current metadata correctly describes the experiment in reality is another question, one that you, the wranglers, all need to agree on. We can't have these types of ~projects~ experiments modeled one way by one team and another way by another team.
These are the only four projects that have a sequencing input that is not a cell suspension. If 1) there is consensus that even in these projects a cell suspension was actually used as the sequencing input, 2) the metadata for these projects is updated to reflect that, and 3) there is consensus that the sequencing input has to be a cell suspension for all types of experiments, then we can remove the concept of sequencing input. I initially introduced it based on these statements by Mallory and Tony (@tburdett):
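To make the distinction concrete, here is a minimal sketch that models a project's sequencing inputs as simple links from a biomaterial to a sequence file and reports which entity types feed directly into sequencing. The `Entity` and `SequencingLink` classes, the entity IDs, and the helper functions are hypothetical illustrations, not the HCA metadata schema or the metadata-api implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A biomaterial node in a simplified, hypothetical metadata graph."""
    entity_id: str
    entity_type: str  # e.g. "specimen_from_organism" or "cell_suspension"


@dataclass(frozen=True)
class SequencingLink:
    """Hypothetical edge: a biomaterial used directly as input to sequencing."""
    input_entity: Entity
    sequence_file: str


def sequencing_input_types(links: list[SequencingLink]) -> set[str]:
    """Return the distinct entity types used directly as sequencing input."""
    return {link.input_entity.entity_type for link in links}


def is_cell_suspension_only(links: list[SequencingLink]) -> bool:
    """True if every sequencing input in the project is a cell suspension."""
    return sequencing_input_types(links) <= {"cell_suspension"}


# Hypothetical project mixing the two patterns discussed in this thread.
specimen = Entity("specimen_1", "specimen_from_organism")
suspension = Entity("suspension_1", "cell_suspension")
links = [
    SequencingLink(suspension, "sc_R1.fastq.gz"),
    SequencingLink(specimen, "bulk_R1.fastq.gz"),  # specimen is the sequencing input
]
print(sorted(sequencing_input_types(links)))  # ['cell_suspension', 'specimen_from_organism']
print(is_cell_suspension_only(links))  # False
```

If consensus settles on "sequencing input is always a cell suspension", a check like `is_cell_suspension_only` would hold for every project, and the separate concept of sequencing input would carry no extra information.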
https://github.com/HumanCellAtlas/metadata-api/issues/13#issuecomment-415337446
https://github.com/HumanCellAtlas/metadata-api/issues/13#issuecomment-415564276
@Wkt8 @ESapenaVentura Why would you set the specimen as the sequencing input for bulk data? A cell suspension is just a pool of cells that would still need to be created for library prep, even in bulk. The only difference is that they didn't separate the cells into single cells, but that's how we handle 10x data.
I do not agree with Mallory's blood example. Blood that is collected from a donor still needs to be enriched for PBMCs and put into cell suspensions before the library prep protocol is carried out.
I do agree with her statement that "...there will most likely never be cell suspensions prior to an imaging assay". But then again, I don't think an imaged specimen would be the sequencing_input. Wranglers: I'm not an expert when it comes to imaging assays, so please correct me if I'm wrong here.
The same data being modelled differently by different submitters (wranglers) has been on my mind lately. Right now we have submitters from the EBI, UCSC, and Lattice teams. I would not be surprised if there are more datasets in the DCP with inconsistent metadata, and I expect this to increase as the number of submitters to the DCP grows over time.
It would be really helpful if there was a way to flag these inconsistencies (maybe via ingest during submission? QA process/team?) and to revisit our wrangling guides frequently to make sure all teams are aligned. Thoughts or suggestions @gabsie @tburdett?
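As one illustration of flagging such inconsistencies during submission, here is a minimal sketch of a check that ingest could, in principle, run on a parsed spreadsheet. It assumes the sequence file rows have already been read into plain dictionaries; the keys `file_name` and `input_id`, the ID values, and the function `find_non_suspension_inputs` are hypothetical and are not the actual ingest column names or API.

```python
def find_non_suspension_inputs(
    sequence_file_rows: list[dict[str, str]],
    cell_suspension_ids: set[str],
) -> list[tuple[str, str]]:
    """Return (file_name, input_id) pairs whose input is not a known cell suspension.

    Assumes each row is a dict with the hypothetical keys "file_name" and "input_id".
    """
    flagged = []
    for row in sequence_file_rows:
        input_id = row["input_id"]
        # Any input ID not registered as a cell suspension (e.g. a specimen ID)
        # is reported for a wrangler to review.
        if input_id not in cell_suspension_ids:
            flagged.append((row["file_name"], input_id))
    return flagged


# Hypothetical example: one row points at a specimen ID instead of a cell suspension ID.
suspensions = {"cell_suspension_1", "cell_suspension_2"}
rows = [
    {"file_name": "sample1_R1.fastq.gz", "input_id": "cell_suspension_1"},
    {"file_name": "sample2_R1.fastq.gz", "input_id": "specimen_2"},
]
print(find_non_suspension_inputs(rows, suspensions))
# [('sample2_R1.fastq.gz', 'specimen_2')]
```

Whether a flagged row is a data-entry mistake or an intentional bulk design is exactly the modelling question in this thread, so a check like this would probably be better as a warning for review than as a hard rejection.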
Off topic, but one way to achieve consistency is to review submissions across teams, just like peer reviews of PRs on Github.
@ami-day
@ESapenaVentura is going to test this
@ami-day what? we didn't discuss this on stand-up
@ESapenaVentura is this still required? Do you know?
This is still needed
Description of the task:
@hannes-ucsc pointed out that these 4 datasets have at least one specimen that was used as the sequencing input.
- https://data.humancellatlas.org/explore/projects/6c040a93-8cf8-4fd5-98de-2297eb07e9f6 (EBI)
- https://data.humancellatlas.org/explore/projects/71eb5f6d-cee0-4297-b503-b1125909b8c7 (EBI)
- https://data.humancellatlas.org/explore/projects/c4077b3c-5c98-4d26-a614-246d12c2e5d7 (EBI)
- https://data.humancellatlas.org/explore/projects/7adede6a-0ab7-45e6-9b67-ffe7466bec1f (UCSC)
7adede6a-0ab7-45e6-9b67-ffe7466bec1f - The specimen ID was used accidentally instead of the cell suspension ID in the sequence file tab. @rachadele will update this and resubmit the dataset.
For the other 3 datasets, I was not able to download the submitted spreadsheets from ingest (Error - Service Unavailable) to troubleshoot. I did find other things that need to be fixed based on the info shown on the data portal project pages. We can discuss this further during our next wrangler call.