1. data.feather + anno.feather
As you already know, the data.feather (expression values) and anno.feather (per-cell annotations, including the cell type labels used in the paper) files were used to produce the .mat file included in the repository. These files roughly correspond to the count data and metadata files available here. I included the notebooks only for reference; they are not intended to be run by users 😁
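For readers who want to see what "producing the .mat file" from these two tables roughly involves, here is a minimal sketch: join per-cell expression values with per-cell annotations on a shared cell ID and pack the result into a dict that scipy.io.savemat can write. The column names (sample_id, cluster_label) and the .mat field names (sample_id, gene_id, expression, cell_type) are hypothetical placeholders, not the layout of the repository's files.

```python
import io

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat


def build_mat_dict(expr: pd.DataFrame, anno: pd.DataFrame,
                   id_col: str = "sample_id",
                   label_col: str = "cluster_label") -> dict:
    """Join per-cell expression values with per-cell annotations.

    expr: one row per cell, an ID column plus one column per gene.
    anno: one row per cell, an ID column plus a cell-type label column.
    Column names and output field names are hypothetical placeholders.
    """
    # Inner join keeps only cells present in both tables, in expr's row order.
    merged = expr.merge(anno[[id_col, label_col]], on=id_col, how="inner")
    gene_cols = [c for c in expr.columns if c != id_col]
    return {
        "sample_id": merged[id_col].to_numpy().astype(str),
        "gene_id": np.array(gene_cols, dtype=str),
        "expression": merged[gene_cols].to_numpy(dtype=np.float64),
        "cell_type": merged[label_col].to_numpy().astype(str),
    }


def roundtrip_mat(mat_dict: dict) -> dict:
    """Write mat_dict to an in-memory .mat file and read it back (sanity check)."""
    buf = io.BytesIO()
    savemat(buf, mat_dict)
    buf.seek(0)
    return loadmat(buf)
```

With the real files this would look like expr = pd.read_feather("data.feather") and anno = pd.read_feather("anno.feather") (pandas needs pyarrow for this), followed by savemat("data.mat", build_mat_dict(expr, anno)); the actual orientation of data.feather and the field names inside the repository's .mat file may differ.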
2. beta scores + gene selection
The .mat file contains log-CPM normalized expression values for a pre-selected gene set. The genes were selected using the beta score (a description of this score can be found in the associated paper). I have now included the beta score assigned to each gene in genes_beta_score.csv, in case that is of interest. I expect any reasonable strategy for reducing the gene set (e.g. highly variable genes or differentially expressed genes) to give roughly similar results; the coupled autoencoder methodology is agnostic to this choice.
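If you want to derive a reduced gene set yourself, here is a minimal sketch of two options: taking the top-scoring genes from genes_beta_score.csv, or ranking genes by their variance across cells as a simple stand-in for highly variable gene selection. The CSV column names (gene, beta_score) and the number of genes to keep are assumptions, and the variance ranking is a swapped-in heuristic, not the beta score; scanpy's sc.pp.highly_variable_genes(adata, n_top_genes=...) is the more standard route for the latter.

```python
import pandas as pd


def genes_by_score(scores: pd.DataFrame, n_genes: int,
                   gene_col: str = "gene", score_col: str = "beta_score") -> list:
    """Return the n_genes genes with the highest score.

    The column names are hypothetical; adjust them to the actual CSV header.
    """
    top = scores.sort_values(score_col, ascending=False, kind="stable").head(n_genes)
    return top[gene_col].tolist()


def genes_by_variance(logcpm: pd.DataFrame, n_genes: int) -> list:
    """Simple highly-variable-gene proxy: top n_genes by variance across cells.

    logcpm: cells x genes matrix of log-CPM values (genes as columns).
    """
    var = logcpm.var(axis=0)  # per-gene sample variance across cells
    return var.sort_values(ascending=False, kind="stable").head(n_genes).index.tolist()
```

For example, genes_by_score(pd.read_csv("genes_beta_score.csv"), 1000) would keep the 1000 highest-scoring genes, assuming the column names above match the file.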
3. other links
Check this closed issue and Gouwens et al. 2020 for less processed versions of the data. This Allen Institute page may also be of interest to you.
Can you provide working scRNA-seq data preprocessing code, from count matrices to .mat files?
Hi, please use any standard pipeline (e.g. scanpy) to process the raw data in the data and metadata .csv files linked above; you can sum the exon and intron counts to get a single count matrix and then perform log-CPM normalization.
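The exon + intron summation and log-CPM normalization described above can be sketched with pandas and NumPy as follows. This assumes the exon and intron count tables are genes x cells DataFrames indexed by gene ID with sample IDs as columns (check the orientation of the actual CSVs), and it uses log2(CPM + 1) by default; the log base used for the repository's .mat file is not stated here, so treat it as an assumption. In scanpy, with cells as rows of an AnnData object, the analogous steps are sc.pp.normalize_total(adata, target_sum=1e6) followed by sc.pp.log1p(adata), which uses the natural log by default.

```python
import numpy as np
import pandas as pd


def combine_exon_intron(exon: pd.DataFrame, intron: pd.DataFrame) -> pd.DataFrame:
    """Sum exon and intron counts into a single genes x cells count matrix.

    Both inputs are assumed to be genes x cells, indexed by gene ID with
    sample IDs as columns; only genes and cells present in both are kept.
    """
    genes = exon.index.intersection(intron.index)
    cells = exon.columns.intersection(intron.columns)
    return exon.loc[genes, cells] + intron.loc[genes, cells]


def log_cpm(counts: pd.DataFrame, log_base: float = 2.0) -> pd.DataFrame:
    """Return log(CPM + 1) for a genes x cells count matrix.

    Cells with zero total counts should be removed beforehand, otherwise
    their CPM values are undefined (division by zero).
    """
    lib_size = counts.sum(axis=0)              # total counts per cell (column)
    cpm = counts.div(lib_size, axis=1) * 1e6   # counts per million, per cell
    return np.log1p(cpm) / np.log(log_base)    # change of base: log_b(1 + cpm)
```

Usage would be along the lines of counts = combine_exon_intron(pd.read_csv(exon_csv, index_col=0), pd.read_csv(intron_csv, index_col=0)) followed by log_cpm(counts), where exon_csv and intron_csv are hypothetical placeholders for the paths of the downloaded files.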
Hi, while reading the data processing code you provided, I noticed that it uses several preprocessed files, such as data.feather, anno.feather, good_genes_beta_score.csv, specimen_ids.txt, and color_ref.csv. However, I could not find these files at the scRNA-seq download address you gave; only gene count matrices, BAM files, and FASTQ files are available there.
I would like to know how to obtain these files by analyzing the raw scRNA-seq data listed above. Could you provide more detailed code to help me with this analysis?