[GM only] MAGMA analysis

lcolladotor commented 2 years ago

MAGMA:

paper https://github.com/LieberInstitute/10xPilot_snRNAseq-human/search?q=magma
I believe that there was a major bug in version 1.0.7 that got fixed in 1.0.8 but I can't recall the relevant links T_T :P

Related scripts:

by Matt: https://github.com/LieberInstitute/10xPilot_snRNAseq-human/search?q=magma
by Andrew: https://github.com/LieberInstitute/HumanPilot/search?q=magma
[ ] Find out from Matt what is the right script we should try to adapt
[ ] Run MAGMA for the AD GWAS across the pathology labels
[ ] Adapt the plotting code from https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/bb42b5554d8b8b699617a52c7874cc032ac4539f/MAGMA_v1_08/10x-pilot_MAGMA_plot-gsa-stats_v1.08-UPDATE_MNT.R (we'll end up with a single column)
[ ] (optional) Run MAGMA with more GWASes

We might run into JHPCE permission issues, so we'll likely have to move some data to /dcs04 as part of doing this. It'll be good for me to know which paths you can't access. I know that I need to move /dcl02/lieber/ajaffe/SpatialTranscriptomics/HumanPilot which you'll need for https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/3dba9fe61891a142907360955752580ad1b2de00/MAGMA/magma-gsa_step3-geneSet-analysis_hg19-lifted_MNT.sh#L15.

We might need Nick's @Nick-Eagles help with making a JHPCE MAGMA module as well. I see that Matt used version 1.0.8 but there might be a new version.

lcolladotor commented 2 years ago

Sowmya will meet with Matt @mattntran and Louise @lahuuki about MAGMA soon.

lcolladotor commented 2 years ago

For MAGMA, we can use the enrichment p-values to find the top 100 per pathology group and see how it works https://github.com/LieberInstitute/spatialLIBD/blob/35ccde7e2cfa6bfcee4db0a6f54dde2a92bb5ca5/R/layer_stat_cor.R#L73-L83. For us, we should use p-values, not t-stats since the directionality doesn’t matter for us (vs the layer analysis we did before). So we want the genes with the 100 smallest p-values.

lcolladotor commented 2 years ago

sig_genes comes from https://github.com/LieberInstitute/Visium_IF_AD/blob/5e3518a9d379e90f593f5826cc24ec958f81f4aa/code/05_deploy_app_wholegenome/app.R#L39

Another option is to use p < 0.05 but then we would likely need to drop next_pT+ and pT+

> x <- subset(sig_genes, model_type == "enrichment")
master > tapply(x$pval, x$test, function(y) { sum(y < 0.05)})
      Ab+      both  next_Ab+ next_both  next_pT+      none       pT+ 
     1768        97      2927        56         3       108        32 
master > tapply(x$pval, x$test, function(y) { sum(y < 0.01)})
      Ab+      both  next_Ab+ next_both  next_pT+      none       pT+ 
      776        18      1585        10         0        17         6

By doing so, the number of genes will be closer to what Matt used before (the above numbers are higher than 100 in general):

> x <- read.delim("/dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/MAGMA/amyMarkerSets_fdr1e-6.txt")

> sort(table(x$Set))

Astro_B   Tcell Inhib_G    Endo   Mural Inhib_H Excit_B   Oligo   Micro Excit_C
     57     243     305     361     371     507     695     854     889    1189
Inhib_E Astro_A     OPC Inhib_F Inhib_C Inhib_D Inhib_B Inhib_A Excit_A
   1246    1332    1377    1563    1714    2280    2391    3196    3435

LieberInstitute / Visium_SPG_AD

[GM only] MAGMA analysis #38