czbiohub-sf / tabula-muris

Code and annotations for the Tabula Muris single-cell transcriptomic dataset.
https://www.nature.com/articles/s41586-018-0590-4
BSD 3-Clause "New" or "Revised" License

Question regarding Figshare data and AWS data #213

Closed: hojaeklee closed this issue 2 years ago

hojaeklee commented 5 years ago

Hello, could someone explain the differences between the Figshare data (which I downloaded via data_download.sh) and the AWS data?

Comparing annotations_facs.csv and TM_facs_metadata.csv, the total number of cells and the cell_ontology_class categories seem to differ.

Perhaps someone could kindly point me to how these were annotated? Thank you very much. :)

aopisco commented 5 years ago

@olgabot can you help?

olgabot commented 5 years ago

Hi @hojaeklee, the raw AWS data includes ALL cells, including those that were later filtered out.

  $ wc -l TM_facs_metadata.csv
     53761 TM_facs_metadata.csv
  $ wc -l annotations_facs.csv
     44950 annotations_facs.csv

Many cells in TM_facs_metadata.csv have NA in the cell_ontology_class column because they did not pass the filters. annotations_facs.csv only includes cells with at least 500 genes and 50k reads; this filter was applied per tissue, and the annotations for all tissues were then combined. Hope that helps!
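The filter described above can be sketched in shell. This is a minimal illustration, not the project's actual pipeline (which applied the thresholds per tissue in its own analysis code); it assumes a hypothetical per-cell table with columns cell, n_genes and n_reads, and uses made-up toy values. It also shows why wc -l overstates the number of cells by one: the header line is counted too.

```shell
#!/usr/bin/env bash
# Sketch of the ">= 500 genes and >= 50k reads" cell filter described above.
# ASSUMPTION: the input is a hypothetical per-cell CSV with the header
# "cell,n_genes,n_reads" and no quoted fields; it is not a file from the release.
set -eu

# Number of cells = number of lines minus the header line
# (plain wc -l counts the header too).
count_cells() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}

# Print the IDs of cells meeting both thresholds (defaults: 500 genes, 50,000 reads).
# "+ 0" forces numeric rather than string comparison.
pass_qc() {
  local file=$1 min_genes=${2:-500} min_reads=${3:-50000}
  awk -F, -v g="$min_genes" -v r="$min_reads" \
    'NR > 1 && $2 + 0 >= g + 0 && $3 + 0 >= r + 0 { print $1 }' "$file"
}

# Made-up toy data for illustration only.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
qc="$tmp/qc.csv"
cat > "$qc" <<'EOF'
cell,n_genes,n_reads
A1,1200,80000
A2,450,90000
A3,900,30000
A4,500,50000
EOF

echo "all cells:     $(count_cells "$qc")"
echo "cells passing: $(pass_qc "$qc" | wc -l | tr -d ' ')"
```

With the toy data this reports 4 cells in total and 2 passing (A1 and A4, both at or above the thresholds); A2 fails on genes and A3 on reads.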

Thank you for your patience and sorry for the delay!

ayshwaryas commented 4 years ago

Hi @olgabot

We would like to access the plate data fastqs for the cells that did not pass filters. Where should we look? SRA?

Here are the locations we explored for the 3-month data on AWS:

1) FACS metadata folder: taking kidney as an example, there are 865 cells, but no fastq locations are given.

library(data.table)
library(dplyr)
data <- fread("./Downloads/tabula-muris-senis-facs-official-raw-obj__cell-metadata.csv")
data %>% filter(tissue == "Kidney", age == "3m") %>% summarize(n())
  n()
865

This number also matches the Kidney-counts.csv file in the FACS.zip folder on figshare.

2) Plate_seq folder: the fastq annotation file (in the fastq folder) lists only 519 kidney cells, and fastqs exist for these.

data <- fread("./Downloads/fastqs_annotated.csv")
data %>% filter(tissue == "Kidney") %>% summarize(n())
  n()
519

3) Data objects folder: the tabula-muris-senis-facs-official-raw-obj.h5ad file has 502 kidney cells. These cells passed a filter of at least 500 genes.

We would like the fastq files for the remaining 865 - 519 = 346 cells, and likewise for all tissues and time points. Many thanks!
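To find which cells lack fastqs, one option is a set difference on cell IDs. The sketch below assumes you have already extracted one cell ID per line from the cell metadata and from fastqs_annotated.csv into two text files; the file names and IDs are hypothetical toy values, and comm then prints the IDs found only in the first list.

```shell
#!/usr/bin/env bash
# Sketch: list cell IDs present in the cell metadata but absent from the fastq
# annotation (the "865 - 519" set above).
# ASSUMPTION: each input is a hypothetical text file with one cell ID per line,
# already extracted from the corresponding CSV's cell ID column.
set -eu

missing_cells() {
  # comm needs both inputs sorted in the same collation order;
  # -23 keeps only lines unique to the first input.
  local sorted_first
  sorted_first=$(mktemp)
  LC_ALL=C sort -u "$1" > "$sorted_first"
  LC_ALL=C sort -u "$2" | LC_ALL=C comm -23 "$sorted_first" -
  rm -f "$sorted_first"
}

# Made-up toy IDs for illustration only.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' cell_01 cell_02 cell_03 cell_04 > "$tmp/metadata_cells.txt"
printf '%s\n' cell_03 cell_01 > "$tmp/fastq_cells.txt"

missing_cells "$tmp/metadata_cells.txt" "$tmp/fastq_cells.txt"
```

For the toy lists it prints cell_02 and cell_04. Sorting both inputs with LC_ALL=C keeps their order consistent with what comm expects, since a locale-dependent collation mismatch can make comm give wrong results.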

aopisco commented 4 years ago

@ayshwaryas all the fastqs are available from Tabula Muris Senis S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/.

The 3m (Tabula Muris) files are also available from the Tabula Muris S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris/
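The links above point to the S3 web console, which requires signing in to AWS in the browser. For command-line access the same location is normally written as an s3:// URI. The sketch below converts the console URLs used in this thread into that form; it only handles this URL layout, and the commented aws s3 ls line assumes the AWS CLI is installed and that the bucket permits anonymous reads via --no-sign-request.

```shell
#!/usr/bin/env bash
# Sketch: turn an S3 console URL (as linked in this thread) into an s3:// URI
# that the AWS CLI accepts. Only the console URL layout seen here is handled.
set -eu

console_to_s3() {
  local url
  url=${1%%\?*}   # drop the query string (?region=...&tab=...)
  url=${url#https://s3.console.aws.amazon.com/s3/buckets/}
  printf 's3://%s\n' "$url"
}

console_to_s3 "https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris/"
console_to_s3 "https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/Plate_seq/3_month/?region=us-west-2&tab=overview"

# Listing a prefix (needs the AWS CLI and network access; --no-sign-request
# assumes the bucket allows anonymous reads):
# aws s3 ls --no-sign-request s3://czb-tabula-muris-senis/Plate_seq/3_month/
```

The two calls print s3://czb-tabula-muris/ and s3://czb-tabula-muris-senis/Plate_seq/3_month/.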

ayshwaryas commented 4 years ago

Thanks @aopisco

I did look there and my note is based on the fastqs_annotated.csv in the Tabula Muris Senis S3 bucket (https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/Plate_seq/3_month/?region=us-west-2&tab=overview)

Based on the numbers I posted, it seems the annotation file is either not up to date or only lists cells that passed filtering. Could you please help clarify? Is there another annotation file? Thanks!

donshiva88 commented 3 years ago

Do you provide a metadata file like the one requested by ayshwaryas? I need a droplet metadata file for tabula-muris-senis-bbknn-processed-official-annotations.h5ad, i.e. metadata for all 356,213 cells. Can someone provide it?

aopisco commented 3 years ago

@donshiva88 that object includes the metadata

aopisco commented 3 years ago

@ayshwaryas we only have an annotation file for the good-quality cells.