AllonKleinLab / paper-data


Downloading gene expressions from each cell #14

Closed francescabasini closed 10 months ago

francescabasini commented 1 year ago

Apologies if this seems a silly question, but I have been having trouble finding normalised gene expression values for each single cell, for the genes annotated as variable in the in vitro dataset. Could you please point me to a way to download them?

Many thanks

calebweinreb commented 1 year ago

Hi,

Can you explain a little more what you are looking for? I think the gene expression files here are already total-counts normalized.

francescabasini commented 1 year ago

Thanks a lot for your reply! Essentially, once I downloaded "stateFate_inVitro_normed_counts" and opened it, I could see a txt file with 3 columns, and I am just trying to make sense of them.

Is it correct to assume that the first column indicates a gene, the second the cell it belongs to and the third is indeed the normalized count for that gene expression?

If that is the case, how can I associate each cell with the timepoint it was profiled at? Is it correct to assume that the file "stateFate_inVitro_metadata.txt.gz", which has 130888 rows, summarises all the info for the normalised counts data?


Many thanks again

calebweinreb commented 1 year ago

Hi,

Yes, that's right! (Except that the first column is cells and the second column is genes.)
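For anyone reading along, here is a minimal sketch of how the three columns could be read and paired with the metadata. It rests on several assumptions that are not confirmed in this thread: that the counts file is a gzipped MatrixMarket coordinate file (as the `.mtx.gz` name suggests) with 1-based indices, that the metadata file is tab-separated with a header row, and that metadata row i describes cell i of the matrix. The function names and the column name passed to `read_metadata_column` are hypothetical.

```python
import csv
import gzip


def read_triplets(path):
    """Yield (cell_index, gene_index, value) from a gzipped coordinate file.

    Assumes MatrixMarket coordinate layout: '%' comment lines, then one
    size line, then 'row col value' entries with 1-based indices.
    Per the reply above, rows are cells and columns are genes.
    """
    with gzip.open(path, "rt") as handle:
        size_line_seen = False
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # skip header/comment and blank lines
            if not size_line_seen:
                size_line_seen = True  # this line holds n_rows n_cols n_entries
                continue
            cell, gene, value = line.split()
            # Convert to 0-based indices for use with Python lists
            yield int(cell) - 1, int(gene) - 1, float(value)


def read_metadata_column(path, column):
    """Return one metadata column as a list of strings, in file order.

    Assumes a tab-separated file with a header row (an assumption,
    not something stated in this thread).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row[column] for row in reader]
```

Under those assumptions, `timepoints[cell_index]` would give the timepoint for each triplet, where `timepoints = read_metadata_column("stateFate_inVitro_metadata.txt.gz", <timepoint column>)` and the column name is whatever the metadata header actually uses.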

suyanxun commented 10 months ago

Hi, there are 25289 columns in "stateFate_inVitro_normed_counts.mtx.gz" but only 25288 genes in "stateFate_inVitro_gene_names.txt.gz". Is that correct?

Also, it seems that "stateFate_inVitro_gene_names.txt.gz", "stateFate_inVivo_gene_names.txt.gz" and "stateFate_cytokinePerturbation_gene_names.txt.gz" are the same file. Is that correct?

calebweinreb commented 10 months ago

There are 25289 gene names in stateFate_inVitro_gene_names.txt.gz. Sometimes gene lists end with a newline character, whereas this one doesn't, so make sure you read the final line.
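To illustrate the point about the final line: tools that count newline characters (such as `wc -l`) report one fewer line when the last line has no trailing newline, which would explain a count of 25288. A small sketch of a reader that is not affected, assuming the file is gzipped plain text with one gene name per line (`read_gene_names` is a hypothetical helper name):

```python
import gzip


def read_gene_names(path):
    """Return every gene name in the file, including an unterminated last line."""
    with gzip.open(path, "rt") as handle:
        # splitlines() keeps the last line even when no newline follows it,
        # and adds no empty trailing entry when one does.
        return handle.read().splitlines()
```

Comparing `len(read_gene_names(...))` with the number of matrix columns is a quick sanity check before indexing genes by column.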

And yes it's correct that all the datasets share the same set of gene names.