Output files - Githubissues

Thanks to @lishuangshuang0616 for the help! The answers are below.

The folder "output/filter_matrix" contains 3 files. "barcodes.tsv.gz" is the list of cell IDs. "features.tsv.gz" is the list of gene IDs. "matrix.mtx.gz" holds the expression values for genes in cells (3 columns: gene index, cell index, expression value).

The files can be read by Seurat:

# https://satijalab.org/seurat/articles/pbmc3k_tutorial
library(Seurat)
# gene.column = 1 takes the gene names from the first column of features.tsv.gz
data <- Read10X(data.dir = "output/filter_matrix", gene.column = 1)
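
To make the three-file layout concrete, here is a minimal Python sketch that reads the folder with the standard library only. It assumes that "matrix.mtx.gz" is in Matrix Market coordinate format with 1-based indices and that the first column of "features.tsv.gz" holds the gene ID (mirroring gene.column = 1 above). The function names read_lines and read_filter_matrix are made up for illustration and are not part of the pipeline.

```python
import gzip
import os


def read_lines(path):
    """Return the non-empty lines of a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def read_filter_matrix(data_dir):
    """Load cell IDs, gene IDs and (gene, cell, value) triplets from data_dir."""
    barcodes = read_lines(os.path.join(data_dir, "barcodes.tsv.gz"))
    # Assumption: the first tab-separated column is the gene ID.
    features = [
        line.split("\t")[0]
        for line in read_lines(os.path.join(data_dir, "features.tsv.gz"))
    ]

    triplets = []
    with gzip.open(os.path.join(data_dir, "matrix.mtx.gz"), "rt") as handle:
        # Skip the "%%MatrixMarket" banner and any other "%" comment lines.
        body = (ln for ln in handle if ln.strip() and not ln.startswith("%"))
        # The first data line gives the dimensions and the number of entries.
        n_genes, n_cells, _n_entries = (int(x) for x in next(body).split())
        if n_genes != len(features) or n_cells != len(barcodes):
            raise ValueError("matrix dimensions do not match the ID lists")
        for ln in body:
            gene_idx, cell_idx, value = ln.split()
            # Matrix Market indices are 1-based; shift them to 0-based.
            triplets.append(
                (features[int(gene_idx) - 1], barcodes[int(cell_idx) - 1], float(value))
            )
    return barcodes, features, triplets
```

Calling read_filter_matrix("output/filter_matrix") would return the two ID lists and one (gene, cell, value) tuple per non-zero entry. For real analyses the Seurat or Scanpy readers are the more convenient choice.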

Another file 'output/filter_feature.h5ad' can be read by Scanpy:

# https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html
import scanpy as sc
data = sc.read_h5ad('output/filter_feature.h5ad')

The file "03.analysis/marker.csv" can be used for this purpose.

This table lists the differentially expressed genes of each cell cluster. Each gene was tested for differential expression between a given cluster and all remaining cells.

For the columns of the table:

The p-value (P-val) measures the statistical significance of the expression difference; the smaller it is, the stronger the evidence that the gene is differentially expressed in the cluster.
p_val_adj is the adjusted p-value, based on the Bonferroni correction using all genes in the dataset.
avg_log2FC is the log2 of the ratio of a gene's average expression in the cluster to its average expression in all other cells; a positive value means higher expression in the cluster.
pct.1 is the proportion of cells in the current cluster in which the gene is detected.
pct.2 is the proportion of cells in all other clusters in which the gene is detected.
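
As an illustration of how these columns are typically used together, the following sketch loads the table and keeps rows that pass cutoffs on p_val_adj, avg_log2FC and pct.1. It assumes the CSV header uses exactly these column names. The cutoff values and the function names (bonferroni, load_marker_table, select_markers) are hypothetical choices for the example, not settings taken from the pipeline.

```python
import csv


def bonferroni(p_val, n_tests):
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    return min(p_val * n_tests, 1.0)


def load_marker_table(path):
    """Read the marker table into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def select_markers(rows, max_p_adj=0.05, min_log2fc=1.0, min_pct1=0.25):
    """Keep rows passing all three cutoffs (illustrative defaults only)."""
    kept = []
    for row in rows:
        if (
            float(row["p_val_adj"]) <= max_p_adj      # significant after correction
            and float(row["avg_log2FC"]) >= min_log2fc  # at least 2-fold higher in cluster
            and float(row["pct.1"]) >= min_pct1         # detected in enough cluster cells
        ):
            kept.append(row)
    return kept
```

For example, select_markers(load_marker_table("03.analysis/marker.csv")) would return the rows that are significant after correction, at least 2-fold higher in the cluster (log2FC of 1), and detected in at least a quarter of the cluster's cells. The bonferroni helper only shows what the correction does to a single p-value.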

The estimated number of cells can be found in the file "output/metrics_summary.xls".

The file "03.analysis/raw_qc.xls" contains the number of genes (column "n_genes_by_counts") and the number of transcript UMIs (column "total_counts") in each cell. See also answer 1 for this information.

The final report "output/xxx_scRNA_report.html" also includes most of the above information, along with multiple plots.
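
To show what the two columns mean, here is a small sketch that derives them from (gene, cell, count) triplets such as those stored in the matrix file. It assumes each gene and cell pair appears at most once, as in a sparse coordinate matrix. The function name per_cell_qc is made up for illustration. Whether the results match "raw_qc.xls" exactly depends on which matrix that table was computed from, which is not stated here.

```python
from collections import defaultdict


def per_cell_qc(triplets):
    """Summarise (gene, cell, count) triplets into per-cell QC values."""
    n_genes = defaultdict(int)  # genes with a non-zero count in each cell
    total = defaultdict(int)    # sum of UMI counts in each cell
    for _gene, cell, count in triplets:
        if count > 0:
            n_genes[cell] += 1
            total[cell] += count
    return {
        cell: {"n_genes_by_counts": n_genes[cell], "total_counts": total[cell]}
        for cell in n_genes
    }
```

Each cell ID maps to a small dict with the same two column names as the table, so the values can be compared side by side.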

MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

Output files #28