LieberInstitute / Visium_SPG_AD

Visium SPG AD project (n = 10) using Visium Spatial Proteogenomics (Visium-SPG) on dissections from the inferior temporal cortex (ITC) from Alzheimer's disease cases and controls.
https://research.libd.org/Visium_SPG_AD
3 stars 0 forks source link

[GM only] MAGMA analysis #38

Closed lcolladotor closed 1 year ago

lcolladotor commented 2 years ago

MAGMA:

Related scripts:

We might run into JHPCE permission issues, so we'll likely have to move some data to /dcs04 as part of doing this. It'll be good for me to know which paths you can't access. I know that I need to move /dcl02/lieber/ajaffe/SpatialTranscriptomics/HumanPilot which you'll need for https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/3dba9fe61891a142907360955752580ad1b2de00/MAGMA/magma-gsa_step3-geneSet-analysis_hg19-lifted_MNT.sh#L15.

We might need Nick's @Nick-Eagles help with making a JHPCE MAGMA module as well. I see that Matt used version 1.0.8 but there might be a new version.

lcolladotor commented 2 years ago

Sowmya will meet with Matt @mattntran and Louise @lahuuki about MAGMA soon.

lcolladotor commented 2 years ago

For MAGMA, we can use the enrichment p-values to find the top 100 per pathology group and see how it works https://github.com/LieberInstitute/spatialLIBD/blob/35ccde7e2cfa6bfcee4db0a6f54dde2a92bb5ca5/R/layer_stat_cor.R#L73-L83. For us, we should use p-values, not t-stats since the directionality doesn’t matter for us (vs the layer analysis we did before). So we want the genes with the 100 smallest p-values.

lcolladotor commented 2 years ago

sig_genes comes from https://github.com/LieberInstitute/Visium_IF_AD/blob/5e3518a9d379e90f593f5826cc24ec958f81f4aa/code/05_deploy_app_wholegenome/app.R#L39

Another option is to use p < 0.05 but then we would likely need to drop next_pT+ and pT+

image

> x <- subset(sig_genes, model_type == "enrichment")
master > tapply(x$pval, x$test, function(y) { sum(y < 0.05)})
      Ab+      both  next_Ab+ next_both  next_pT+      none       pT+ 
     1768        97      2927        56         3       108        32 
master > tapply(x$pval, x$test, function(y) { sum(y < 0.01)})
      Ab+      both  next_Ab+ next_both  next_pT+      none       pT+ 
      776        18      1585        10         0        17         6

By doing so, the number of genes will be closer to what Matt used before (the above numbers are higher than 100 in general):

> x <- read.delim("/dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/MAGMA/amyMarkerSets_fdr1e-6.txt")

> sort(table(x$Set))

Astro_B   Tcell Inhib_G    Endo   Mural Inhib_H Excit_B   Oligo   Micro Excit_C
     57     243     305     361     371     507     695     854     889    1189
Inhib_E Astro_A     OPC Inhib_F Inhib_C Inhib_D Inhib_B Inhib_A Excit_A
   1246    1332    1377    1563    1714    2280    2391    3196    3435