Open melop opened 1 week ago
Hi, we're sorry that there is no such parameter at present.. Because highly variable genes(3000 by default) will be selected before differential analysis, if you want to obtain information about all genes, you could read the h5ad file in the directory after diffExp analysis. For example, uns['DE_1_marker_genes']['mean_count'] in the h5ad stores the average expression of all genes in different regions.
3000 appears to be somewhat arbitrary? If I have used the lasso tool to delineate 5 regions, how to obtain the gene counts for each region. In particular, is there a documentation on the data structure of the h5ad file? Thanks!
Hi, we're sorry that there is no such parameter at present.. Because highly variable genes(3000 by default) will be selected before differential analysis, if you want to obtain information about all genes, you could read the h5ad file in the directory after diffExp analysis. For example, uns['DE_1_marker_genes']['mean_count'] in the h5ad stores the average expression of all genes in different regions.
Hi, Here is the code to get the gene count for each region
import scanpy as sc
data=sc.read_h5ad('area_diffexp/SN.bin100_1.0.h5ad')
data.uns['DE_1_marker_genes']['mean_count']
It will output:
Hi, 3000 is just the default value in SAW. According to some spatial omics analysis software such as scanpy and actual test data, 3000 highly variable genes could be used for downstream feature analysis and reduce computing pressure. At the same time, if the user want to adjust the analysis parameters, we encourage user to use Stereopy for personalized analysis. https://stereopy.readthedocs.io/en/latest/index.html
If you want to obtain the gene expression levels in different regions, you could export lasso.geojson in StereoMap, and then use saw reanalyze lasso
to obtain the expression matrix in each regions.
h5ad is the Anndata format, which records the gene expression, spatial coordinates, clustering results and other information of all spots. You could refer to the following URL https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.html
Ok thanks!
A closer inspection reveals a more severe issue with the reanalyze diffExp function. Seurat analysis revealed a highly significant differential expression of a gene (FDR = 2.636392e-150) in the target tissue region. Exporting the mean counts from the h5ad file also clearly showed an upregulation of this gene:
AKA021889 0.659302325581397 3.7833836858005907 0.2049941927990706 0.4573110893032378 0.18079096045197815 0.5580693815987918 0.5719725818735707
Inspecting this gene in Stereomap also clearly showed a clear upregulation in the target region.
However, this gene is completely missing from the *.find_marker_genes.csv file.
This suggests that the SAW reanalyze diffExp pipeline may contain a serious bug, which would result in omitting significant results. I suggest submitting this bug for fixing.
@melop Would you mind providing the tissue.gef and geojson files for us to troubleshoot?
I sent these files to your email, please have a look, thanks!
Hi, the SAW pipeline defaults to differential analysis based on 3,000 highly variable genes. These genes are not highly variable genes and therefore are not included in the differential expression gene analysis. This is a shortcoming of the SAW pipeline. We will discuss and correct it as soon as possible. Users could use Stereopy to adjust the parameters for DEG analysis. Thank you for your suggestion and sorry for the inconvenience caused to you.
Hello, after running saw reanalyze diffExp, I only saw 3000 genes in the output file *.find_marker_genes.csv. Is there a way to make it output all genes even if no differential expression is found? This would be important for some downstream analyses like GO enrichment etc. Thanks!