MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

An open source and flexible pipeline to analyze high-throughput DNBelab C Series single-cell RNA datasets
MIT License

Output files #28

Closed m-fayer closed 9 months ago

m-fayer commented 9 months ago

Hi,

Our customer is asking whether the following results are available in the output. If so, could you tell me which files contain them? Thank you.

  1. Genes (transcripts) and their expression values across all cells. If this is not possible, then at least the median (or average) expression value of each gene in each cluster of cells.

  2. The list of DE genes for each cluster vs every other cluster. The DE information should include p-values (or FDR/adjusted p-values) and fold change or log2 fold change.

  3. The number of captured cells and the number of genes per cell.

m-fayer commented 9 months ago

Thanks to @lishuangshuang0616 for the help! The answers are below.

  1. The folder "output/filter_matrix" contains three files: "barcodes.tsv.gz" is the list of cell IDs, "features.tsv.gz" is the list of gene IDs, and "matrix.mtx.gz" holds the expression values of genes in cells (3 columns: gene, cell, expression).

The files can be read with Seurat:

```r
# https://satijalab.org/seurat/articles/pbmc3k_tutorial
library(Seurat)
data <- Read10X(data.dir = "output/filter_matrix", gene.column = 1)
```

Another file, 'output/filter_feature.h5ad', can be read with Scanpy:

```python
# https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html
import scanpy as sc
data = sc.read_h5ad('output/filter_feature.h5ad')
```
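
For readers who want to inspect the matrix without Seurat or Scanpy, the three files in "output/filter_matrix" can also be parsed with the Python standard library alone. The following is a minimal sketch, not part of the pipeline: the function names `read_lines` and `read_filter_matrix` are hypothetical, and it assumes the usual Matrix Market coordinate layout (comment lines starting with `%`, one size line, then 1-based gene/cell/value triplets) with the gene ID in the first column of "features.tsv.gz".

```python
import gzip
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def read_filter_matrix(matrix_dir):
    """Load barcodes, gene IDs and (gene, cell, value) triplets from a matrix folder."""
    matrix_dir = Path(matrix_dir)
    barcodes = read_lines(matrix_dir / "barcodes.tsv.gz")
    # Assumption: the gene ID is the first tab-separated column of each row.
    features = [row.split("\t")[0] for row in read_lines(matrix_dir / "features.tsv.gz")]

    triplets = []
    size_line_seen = False
    for line in read_lines(matrix_dir / "matrix.mtx.gz"):
        if line.startswith("%"):
            continue  # Matrix Market header and comment lines
        if not size_line_seen:
            size_line_seen = True  # first data line holds the matrix dimensions
            continue
        gene_idx, cell_idx, value = line.split()
        # Matrix Market indices are 1-based, Python lists are 0-based.
        triplets.append(
            (features[int(gene_idx) - 1], barcodes[int(cell_idx) - 1], float(value))
        )
    return barcodes, features, triplets
```

Calling `read_filter_matrix("output/filter_matrix")` would then return the cell IDs, the gene IDs, and one `(gene, cell, value)` tuple per non-zero entry. For full-size datasets, Seurat or Scanpy will be far more memory-efficient than a list of tuples.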

  2. The file "03.analysis/marker.csv" contains this information.

This table lists the differentially expressed genes for each cell cluster. Each gene was tested for differential expression between each cluster and the rest of the cells.
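
To narrow the marker table down to significant genes, the rows can be filtered on adjusted p-value and log fold change. This is a minimal sketch under stated assumptions: the helper `significant_markers` is hypothetical, the thresholds are only illustrative, and the column names must be passed in by the caller because the actual header of "marker.csv" is not listed here.

```python
import csv


def significant_markers(csv_path, padj_col, lfc_col, max_padj=0.05, min_abs_lfc=1.0):
    """Return the rows whose adjusted p-value and |log fold change| pass the thresholds."""
    kept = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                padj = float(row[padj_col])
                lfc = float(row[lfc_col])
            except ValueError:
                continue  # skip rows with non-numeric entries such as "NA"
            if padj <= max_padj and abs(lfc) >= min_abs_lfc:
                kept.append(row)
    return kept


# Hypothetical usage; replace the column names with those in your marker.csv:
# hits = significant_markers("03.analysis/marker.csv", "pvals_adj", "logfoldchanges")
```

A wrong column name raises a `KeyError` rather than silently returning an empty list, which makes a header mismatch easy to spot.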

For the columns of the table:

  3. The estimated number of cells can be found in the file "output/metrics_summary.xls".

The file "03.analysis/raw_qc.xls" contains the number of genes (column "n_genes_by_counts") and the number of transcript UMIs (column "total_counts") for each cell. This information can also be derived from the files described in answer 1.
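
As a rough illustration of how the per-cell numbers follow from the matrix in answer 1, the sketch below counts detected genes and sums the values for each cell column of "matrix.mtx.gz". The function name `per_cell_totals` is hypothetical, and the file is assumed to be in standard Matrix Market coordinate format with genes as rows and cells as columns. Whether the results match "raw_qc.xls" exactly depends on how the pipeline computes that file, which is not stated here.

```python
import gzip
from collections import defaultdict


def per_cell_totals(mtx_path):
    """Return {cell_index: (genes_detected, total_counts)} for a gzipped .mtx file."""
    genes_detected = defaultdict(int)
    total_counts = defaultdict(float)
    size_line_seen = False
    with gzip.open(mtx_path, "rt") as handle:
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # header, comment or blank line
            if not size_line_seen:
                size_line_seen = True  # first data line holds the matrix dimensions
                continue
            _gene_idx, cell_idx, value = line.split()
            cell = int(cell_idx)
            count = float(value)
            total_counts[cell] += count
            if count > 0:
                genes_detected[cell] += 1
    return {cell: (genes_detected[cell], total_counts[cell]) for cell in total_counts}
```

The keys are 1-based column indices, so key `i` corresponds to line `i` of "barcodes.tsv.gz". Cells with no entries in the matrix do not appear in the result.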

The final report "output/xxx_scRNA_report.html" also includes most of the above information, along with multiple plots.