m-fayer closed this issue 9 months ago
Thanks to @lishuangshuang0616 for the help! The answers are below.
In "output/filter_matrix", there are 3 files. "barcodes.tsv.gz" is the cell ID list. "features.tsv.gz" is the gene ID list. "matrix.mtx.gz" holds the expression values for genes in cells (3 columns: gene, cell, expression).
The files can be read by Seurat:
# https://satijalab.org/seurat/articles/pbmc3k_tutorial
library(Seurat)
# gene.column = 1 takes gene names from the first column of features.tsv.gz
data <- Read10X(data.dir = "output/filter_matrix", gene.column = 1)
Another file, 'output/filter_feature.h5ad', can be read by Scanpy:
# https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html
import scanpy as sc
data = sc.read_h5ad('output/filter_feature.h5ad')
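To inspect the three files in "output/filter_matrix" without Seurat or Scanpy, the layout described above can be parsed with the Python standard library alone. This is only a minimal sketch. It assumes "matrix.mtx.gz" is a standard Matrix Market coordinate file (comment lines starting with "%", one size line, then 1-based gene/cell/value triplets) and that the first tab-separated column of "features.tsv.gz" holds the gene ID. The function name load_filter_matrix is hypothetical.

```python
import gzip
from pathlib import Path


def load_filter_matrix(data_dir):
    """Return (genes, cells, counts) from a 10x-style matrix directory.

    counts maps (gene_id, cell_id) -> expression value; zero entries are absent.
    """
    data_dir = Path(data_dir)

    def read_first_column(name):
        # Keep only the first tab-separated field of each non-empty line.
        with gzip.open(data_dir / name, "rt") as handle:
            return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]

    cells = read_first_column("barcodes.tsv.gz")
    genes = read_first_column("features.tsv.gz")

    counts = {}
    with gzip.open(data_dir / "matrix.mtx.gz", "rt") as handle:
        # Drop the Matrix Market banner/comment lines ("%...") and blank lines.
        rows = (line for line in handle if line.strip() and not line.startswith("%"))
        next(rows)  # size line: n_genes n_cells n_nonzero
        for line in rows:
            gene_idx, cell_idx, value = line.split()
            # Indices in the file are 1-based (assumed standard Matrix Market).
            counts[(genes[int(gene_idx) - 1], cells[int(cell_idx) - 1])] = float(value)
    return genes, cells, counts
```

A dictionary of triplets is fine for spot checks on a few genes; for a full analysis, the sparse matrices returned by Seurat or Scanpy are the better choice.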
"03.analysis/marker.csv" can be consulted for this purpose. This table lists the differentially expressed genes for each cell cluster. Each gene was tested for differential expression between each cluster and all remaining cells.
For the columns of the table:
P-val is a measure of the statistical significance of the expression difference; the smaller the value, the stronger the evidence that the gene is differentially expressed.
p_val_adj is the adjusted p-value, based on Bonferroni correction using all genes in the dataset.
avg_log2FC is the log2 of the ratio of a gene's expression in the cluster to its average expression in all other cells.
pct.1 is the proportion of cells in the current cluster in which the gene is detected.
pct.2 is the proportion of cells in the other clusters in which the gene is detected.
See "output/metrics_summary.xls". The file "03.analysis/raw_qc.xls" contains the number of genes (column "n_genes_by_counts") and the number of transcript UMIs (column "total_counts") in each cell. One can also refer to the first answer above for this information.
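The per-cell table can also be summarized directly to get a cell count and typical genes per cell. This is a minimal sketch, assuming "03.analysis/raw_qc.xls" is really tab-delimited text with a header row and one row per cell (common for pipeline ".xls" outputs, but not confirmed in this thread). The function name summarize_raw_qc is hypothetical; only the two column names come from the answer above.

```python
import csv
import statistics


def summarize_raw_qc(path, delimiter="\t"):
    """Summarize per-cell QC columns from a delimited text table."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter=delimiter))
    # Column names as given in the answer; the file layout itself is an assumption.
    genes = [float(row["n_genes_by_counts"]) for row in rows]
    umis = [float(row["total_counts"]) for row in rows]
    return {
        "n_cells": len(rows),
        "median_genes_per_cell": statistics.median(genes),
        "median_umis_per_cell": statistics.median(umis),
    }
```

Because this is the raw QC table, its row count may differ from the number of cells that remain after filtering.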
The final report "output/xxx_scRNA_report.html" also includes most of the above information, with multiple plots.
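For the DE table in particular, the p_val_adj and avg_log2FC columns of "03.analysis/marker.csv" lend themselves to a simple filter. This is a minimal sketch, assuming the file is comma-separated with a header row. The "cluster" and "gene" column names are assumptions, since they are not listed in this thread. The thresholds and the function name top_markers are illustrative only.

```python
import csv
from collections import defaultdict


def top_markers(path, max_p_adj=0.05, min_log2fc=1.0,
                cluster_col="cluster", gene_col="gene"):
    """Group significant, up-regulated genes by cluster, largest fold change first."""
    by_cluster = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            p_adj = float(row["p_val_adj"])
            log2fc = float(row["avg_log2FC"])
            # Keep genes that pass both the adjusted p-value and fold-change cutoffs.
            if p_adj <= max_p_adj and log2fc >= min_log2fc:
                by_cluster[row[cluster_col]].append((row[gene_col], log2fc, p_adj))
    for hits in by_cluster.values():
        hits.sort(key=lambda hit: hit[1], reverse=True)
    return dict(by_cluster)
```

With avg_log2FC defined as a log2 ratio, a cutoff of 1.0 corresponds to at least a two-fold ratio.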
Hi,
Our customer is asking whether the following results are available in the output. If so, can you tell me in which files I can find them? Thank you.
Genes (transcripts) and their expression values across all cells; if this is not possible, then at least the median (or average) expression value of the genes in each cluster of cells.
The list of DE genes for each cluster vs every other cluster. The DE info should include p-values (or FDR/adjusted p-values) and fold change or log2 fold change.
The number of captured cells and genes-per-cell info.