Closed ofanobilbao closed 1 year ago
Request new cell type ontology "immune cell": https://github.com/HumanCellAtlas/ontology/issues/120
Challenging dataset: there are far too many cell barcodes to link to either new cell suspensions (like an experiment accession) or an analysis file within a single spreadsheet cell in Excel or Google Sheets. We need to find a programmatic way to do the linking in ingest.
Note about cell type ontologies: bvarner-ebi and addiehl suggest that, rather than 'immune cell', I could use 'leukocyte', and that "immune cell" should be added as a related synonym of that ontology term.
Found a workaround to link all cell suspensions
Emailed the authors about mouse single cell seq data and bulk data.
Info from Kyle: "Ah I think I understand where the confusion is coming from. The accession in the paper only includes FASTQs for 608 SmartSeq2 cells specific to the study. The remainder are from the Tabula Muris Senis are available on that data portal. Of the 608, there are 1 set of FASTQs for the “Tbx4-Cre > ZsGreen1” cells (tagged with KJT) and 2 (across 2 sequencing lanes) sets for the “Axin2-CreER > mTmG” cells (tagged with ANN). 224 KJT + 2 * (384 ANN) = 992 sets of FASTQs. The cell IDs have the form “P-16-CGCTCAGT-TTATGCGA_ANNS256” (after concatenating the lanes). This represents [The 384 well it was sorted into][The i5 and i7 indices][the sorter, Ahmad versus myself][the bcl2fastq index during de-multiplexing]."
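A minimal sketch of how the cell ID layout Kyle describes could be split into its parts, together with a check of the FASTQ arithmetic. The regular expression, the field names, and the assumption that the sorter tag is the run of letters directly before the `S<number>` bcl2fastq index are inferred from the single example ID above and are not confirmed by the authors.

```python
import re

# Assumed layout, inferred from the one example ID in Kyle's message:
# <well row>-<well column>-<index>-<index>_<sorter tag><bcl2fastq index>
# Which of the two index sequences is i5 and which is i7 is not stated,
# so they are named index_1 and index_2 here.
CELL_ID_PATTERN = re.compile(
    r"^(?P<well_row>[A-P])-(?P<well_column>\d{1,2})-"
    r"(?P<index_1>[ACGT]+)-(?P<index_2>[ACGT]+)_"
    r"(?P<sorter>[A-Z]+?)(?P<bcl2fastq_index>S\d+)$"
)


def parse_cell_id(cell_id: str) -> dict:
    """Split a cell ID into its assumed components."""
    match = CELL_ID_PATTERN.match(cell_id)
    if match is None:
        raise ValueError(f"Cell ID does not follow the assumed layout: {cell_id}")
    return match.groupdict()


def expected_fastq_sets(kjt_cells: int = 224, ann_cells: int = 384, ann_lanes: int = 2) -> int:
    """KJT cells have one set of FASTQs each; ANN cells have one set per lane."""
    return kjt_cells + ann_lanes * ann_cells


parts = parse_cell_id("P-16-CGCTCAGT-TTATGCGA_ANNS256")
print(parts["well_row"], parts["well_column"], parts["sorter"], parts["bcl2fastq_index"])
print(expected_fastq_sets())
```

With the numbers from the message, 224 + 384 gives the 608 cells and 224 + 2 * 384 gives the 992 sets of FASTQs.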
Linking data files to samples in spreadsheet: on-going
Requested NCBI data delivery (fastq)
Also waiting on test data: cell suspension as input to cell_suspension
504bec8b-733e-41d1-a5c3-5b19289036cd
Dataset exceeds maximum cell suspension count - error in ingest. Creating a ticket ebi-ait/dcp-ingest-central#869 for this.
@ESapenaVentura will be working on the technical side of this with @amnonkhen
What I know so far of this dataset:

| Organism | Technique | Specimens | Cell suspensions | Selection method | Sequence file | Analysis file |
|---|---|---|---|---|---|---|
| Human | 10x | 7 | 17 | MACS | 77 | |
| Human | SS2 | 8 | 9409 | MACS + FACS | 1 | |
| Mouse | SS2 | 2 | 608 (only 525 passed QC, but fastqs are available for all cells) | | | |
| Human | | 9 | 9426 | | | |
| Mouse | | 18 | 671 at most | | | |
Mouse 10x data actually comes from Tabula Muris Senis
@idazucchi and @ESapenaVentura will discuss this today
I updated the metadata for human 10x and uploaded the analysis files to the hca-util area.
Working on human SS2 data: there are too many cell suspensions to link to one analysis file, and the list hits the character limit for an Excel cell. The workaround is pooling the cell suspensions based on the plate (so waiting on #927).
For the protocol I plan on applying the live cell selection, but I welcome other suggestions.
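A rough sketch of the pooling workaround, assuming a mapping from each cell suspension ID to its plate is available. The IDs, the plate names, the 384-cells-per-plate grouping in the example, and the `||` separator are hypothetical placeholders for illustration; only the Excel limit of 32,767 characters per cell is a documented figure.

```python
from collections import defaultdict

# Documented maximum number of characters a single Excel cell can hold.
EXCEL_CELL_CHAR_LIMIT = 32_767


def pool_by_plate(cell_to_plate: dict) -> dict:
    """Group cell suspension IDs under the plate they were sorted into."""
    pools = defaultdict(list)
    for cell_id, plate_id in cell_to_plate.items():
        pools[plate_id].append(cell_id)
    return dict(pools)


def fits_in_one_cell(ids: list, separator: str = "||") -> bool:
    """Check whether the joined ID list stays within the Excel cell limit."""
    return len(separator.join(ids)) <= EXCEL_CELL_CHAR_LIMIT


# Hypothetical example: 9409 made-up IDs spread over made-up plates of 384 cells.
cells = {f"cell_{i:05d}": f"plate_{i // 384:02d}" for i in range(9409)}
pools = pool_by_plate(cells)

print(fits_in_one_cell(list(cells)))   # linking every cell suspension individually
print(fits_in_one_cell(sorted(pools)))  # linking one pooled suspension per plate
```

Linking pooled suspensions keeps the ID list short enough for one spreadsheet cell, at the cost of losing the per-cell link in the spreadsheet itself.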
Issues
I am sending an email to sort out this information, hopefully I will get a reply soon
@idazucchi Contacting the collaborators again
This is not on the high priority list, so while we wait it could be deprioritised
@idazucchi to chase one last time and close if no response
I emailed the authors again; if I don't get a reply in one month I'll close this ticket
we got a reply! I'll work on this dataset from next week, I'm trying to close the ones I have open at the moment
Open questions for review
- mouse_droplet_TMS_UMIs.csv is actually just a control dataset; it was not generated for this study
- gating NA and CD45+ Epcam-: unclear what type of cell was selected. For NA I think no FACS was done
- mouse ss2: the cell ids from the metadata csv have been truncated; to match them up to the sequence file you need to discard the decimal digit for the plate well row
Starting secondary review.
Very well done Ida, on such an extensive dataset!
- donor_organism.medical_history.smoking_history: the exact number of cigarettes per day is not available. Maybe we could just provide that patient 1 has "remote history of smoking" while patients 2 and 3 are "non-smokers". The schema allows it, but I am not sure about the consistency of our data (given that all other datasets have the guideline format).
- Genome version for barcodes, and possibly feature tables, could be "Not applicable".
- All file and biomaterial mappings are verified. Awesome work!
exported and filled the import form!
verified in the browser
Project short name: HumanLungCellAtlas2020
Primary Wrangler: Ami --> Ida
Secondary Wrangler: Arsenios
Associated files
Ami's note: Response from authors about bulk data: "The immune cells for bulk mRNA sequencing were from an entirely different source (the blood was purchased from https://allcells.com/)." I have decided not to include this bulk data, as it is not from the same sample types as the single cells.
I have also decided not to include the mouse data from Tabula Muris Senis (the authors also let us know that almost all the mouse data is from that dataset). I have created a separate project for Tabula Muris Senis in ingest, as its main publication is separate.
Link to Ingest:
Published study links
Paper: A molecular cell atlas of the human lung from single-cell RNA sequencing
Accessioned data: EGAS00001004344
Key Events