Closed aseyedia closed 3 years ago
Leo - we discussed the possibility of Arta running the following pathway enrichment tests:
- GO enrichment tests of TWAS significant genes (FDR<5%) for sACC
- GO enrichment tests of TWAS significant genes (FDR<5%) for AMYG
- Go enrichment tests of TWAS significant genes (FDR<5%) for union of two brain regions
Arta, for each of the 3 sets of genes that Peter mentioned you need to:
rse_gene
object in the next step (well, rowData(rse_gene)$EntrezID
).rse_gene
objects like at https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/wgcna/analyze_results.R#L65-L70clusterProfiler
and save the Rdata file similar to https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/wgcna/analyze_results.R#L72-L75as.data.frame()
+ write.csv()
like in https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/wgcna/analyze_results.R#L81. Actually, order them by q-value (FDR-adjusted p-value) and do the trick for Excel as in https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/65a0ef6ee48462e1417155fae11dffae9bede13d/case_control/run_gene_set.R#L96-L99. If we want to make a plot, we could also make a heatplot()
as in http://research.libd.org/SPEAQeasy-example/bootcamp_intro#53_GO_enrichment_analyses though the 3 table files (or a single Excel file with 3 sheets) is what Peter is looking for now.
As of right now, I've completed the first three steps and I'm running compareCluster
as a job submission as it appears to be a computationally intense step and therefore not feasible on the command line. I'll check back in a little bit later, maybe on the weekend even, to see if it ran correctly. If so, hopefully the remaining steps will be easy and we'll have something ready by next Tuesday. You can take a look at enrichment_test.R
if you'd like, but I found it to be pretty simple.
**** Job starts ****
Fri Dec 4 14:35:16 EST 2020
...
Error in compareCluster(twas_z_both_fdr$fdr, univ = twas_z_both$fdr, OrgDb = "org.Hs.eg.db", :
No enrichment found in any of gene cluster, please check your input...
Execution halted
**** Job ends ****
Fri Dec 4 16:31:39 EST 2020
Above is the error generated last Friday by the enrichment test script. Not only does compareCluster
take two hours to run, it also apparently does not work with the data I've supplied it.
Upon further inspection, this is due to the fact that I input a column that does not exist in any of the tables for all three compareCluster
commands. In trying to run on the FDR-adjusted P-values, I called dt$fdr
, when I should have called dt$fdr.p
.
twas_z_both$fdr.p <- p.adjust(twas_z_both$TWAS.P, 'fdr')
twas_z_amyg$fdr.p <- p.adjust(twas_z_amyg$TWAS.P, 'fdr')
twas_z_sacc$fdr.p <- p.adjust(twas_z_sacc$TWAS.P, 'fdr')
twas_z_both_fdr <- twas_z_both[fdr.p < 0.05,]
twas_z_amyg_fdr <- twas_z_amyg[fdr.p < 0.05,]
twas_z_sacc_fdr <- twas_z_sacc[fdr.p < 0.05,]
...
go_both <- compareCluster(twas_z_both_fdr$fdr.p, univ = twas_z_both$fdr.p,
OrgDb = "org.Hs.eg.db", ont = "ALL",
readable = TRUE, pvalueCutoff = 1, qvalueCutoff = 1)
go_amyg <- compareCluster(twas_z_amyg_fdr$fdr.p, univ = twas_z_amyg$fdr.p,
OrgDb = "org.Hs.eg.db", ont = "ALL",
readable = TRUE, pvalueCutoff = 1, qvalueCutoff = 1)
go_sacc <- compareCluster(twas_z_sacc_fdr$fdr.p, univ = twas_z_sacc$fdr.p,
OrgDb = "org.Hs.eg.db", ont = "ALL",
readable = TRUE, pvalueCutoff = 1, qvalueCutoff = 1)
It's a simple mistake. I'm going to rerun it now and see if it produces anything differently.
Since my last post, I have also discovered that I am supposed to be inputting Entrez ID values into compareCluster
and not p-values.
go_both <- compareCluster(twas_z_both_fdr$EntrezID, univ = twas_z_both$EntrezID,
OrgDb = "org.Hs.eg.db", ont = "ALL",
readable = TRUE, pvalueCutoff = 1, qvalueCutoff = 1)
go_amyg <- compareCluster(twas_z_amyg_fdr$EntrezID, univ = twas_z_amyg$EntrezID,
OrgDb = "org.Hs.eg.db", ont = "ALL",
readable = TRUE, pvalueCutoff = 1, qvalueCutoff = 1)
go_sacc <- compareCluster(twas_z_sacc_fdr$EntrezID, univ = twas_z_sacc$EntrezID,
OrgDb = "org.Hs.eg.db", ont = "ALL",
readable = TRUE, pvalueCutoff = 1, qvalueCutoff = 1)
Unfortunately, I am still receiving an error, but now it's a different type of error:
The following `from` values were not present in `x`: .id
Error in `$<-.data.frame`(`*tmp*`, "Cluster", value = integer(0)) :
replacement has 0 rows, data has 6128
Calls: compareCluster -> $<- -> $<-.data.frame
Execution halted
**** Job ends ****
Mon Dec 7 16:20:33 EST 2020
It's obvious that I am doing something wrong, so I'm going to focus on going through the documentation carefully and setting up the package that way. There are probably steps that I am neglecting to do.
It just occurred to me that you were perhaps giving me an example in your original post about using clusterProfiler
and not suggesting that I adapt specifically that function, compareCluster
. I'm going to continue to read the documentation more thoroughly but now with consideration to other functions that would better suit our particular circumstance.
I'm right now working with the enrichGO
function and this Bioconductor support post is informative re: the generation of both "FDR-adjusted p-values" and "q-values" by this package.
It appears as though using enrichGO
instead of compareCluster
will work, according to the test subset that I used:
> test_twas_z_both_fdr <- twas_z_both_fdr[1:100]
> test_twas_z_both <- twas_z_both[1:100]
> go_test_both <- enrichGO(gene = test_twas_z_both_fdr$EntrezID, OrgDb = "org.Hs.eg.db",
+ keyType = "ENTREZID", ont = "ALL", pvalueCutoff = 1,
+ pAdjustMethod = "fdr", universe = test_twas_z_both$EntrezID,
+ qvalueCutoff = 1)
> go_test_both
#
# over-representation test
#
#...@organism Homo sapiens
#...@ontology GOALL
#...@keytype ENTREZID
#...@gene chr [1:100] "7812" "2011" "9671" "23259" "55741" "22803" "51291" "4069" ...
#...pvalues adjusted by 'fdr' with cutoff <1
#...1921 enriched terms found
'data.frame': 1921 obs. of 10 variables:
$ ONTOLOGY : chr "BP" "BP" "BP" "BP" ...
$ ID : chr "GO:0030203" "GO:1904526" "GO:0006022" "GO:0006024" ...
$ Description: chr "glycosaminoglycan metabolic process" "regulation of microtubule binding" "aminoglycan metabolic process" "glycosaminoglycan biosynthetic process" ...
$ GeneRatio : chr "5/80" "2/80" "5/80" "4/80" ...
$ BgRatio : chr "158/18866" "10/18866" "172/18866" "108/18866" ...
$ pvalue : num 0.000561 0.000782 0.000823 0.00115 0.001344 ...
$ p.adjust : num 0.365 0.365 0.365 0.365 0.365 ...
$ qvalue : num 0.358 0.358 0.358 0.358 0.358 ...
$ geneID : chr "SLC10A7/LYG1/SDC2/B4GAT1/CHST1" "MARK2/GAS8" "SLC10A7/LYG1/SDC2/B4GAT1/CHST1" "SLC10A7/SDC2/B4GAT1/CHST1" ...
$ Count : int 5 2 5 4 2 4 3 3 2 2 ...
#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology
2012, 16(5):284-287
This is good news. I've saved my changes and am now running this script on the cluster. Will report back with results.
Running enrichGO
was a success.
Attached is the final enrichment test that I successfully ran yesterday. With it comes 3 sheets, one for each enrichment test. I also converted the geneID
to their corresponding gene symbols. I will email this to Peter to see what he thinks. Since he didn't ask us for a heat map or anything beyond just producing the table(s), I will close this issue for now, but will re-open it if there's more work to do.
Cross-posting @lcolladotor 's email from email thread (omitting the part about union vs intersection in twas_z_both
):
Hi Arta,
Can you calculate the TWAS FDR for each brain region using all the genes that have a non-NA TWAS p-value? That is do https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/dev_twas/enrichment_test.R#L69-L73 before dropping the genes without Entrez IDs https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/dev_twas/enrichment_test.R#L61-L64.
Once you do that, you can do:
with(twas_z_sacc, addmargins(table(is.na(EntrezID), fdr.p < 0.05)))
with(twas_z_amyg, addmargins(table(is.na(EntrezID), fdr.p < 0.05)))
Thanks
# with(twas_z_sacc, addmargins(table(is.na(EntrezID), fdr.p < 0.05, deparse.level = 2)))
fdr.p < 0.05
is.na(EntrezID) FALSE TRUE Sum
FALSE 7122 124 7246
TRUE 2722 32 2754
Sum 9844 156 10000
# with(twas_z_amyg, addmargins(table(is.na(EntrezID), fdr.p < 0.05, deparse.level = 2)))
fdr.p < 0.05
is.na(EntrezID) FALSE TRUE Sum
FALSE 6343 94 6437
TRUE 2507 31 2538
Sum 8850 125 8975
Running the script on JHPSCE now.
Hm... I don't think that this matters, but well, in https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/dev_twas/enrichment_test.R#L91-L94 we want to use unique(twas_z_both_fdr$EntrezID)
instead of twas_z_both_fdr$EntrezID
for the genes of interest and unique(twas_z_both$EntrezID)
for the universe. I don't think that it matters, because maybe enrichGO does the unique()
step inside also. Can you check this? Thanks! Like, I wouldn't want the GO results to be weight weirdly genes that show up twice in the set of genes of interest and/or the universe background.
BD_EnrichmentTest.xlsx The script was successful, although {xlsx} once again gave me this bizarre warning:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.poi.util.SAXHelper (file:/jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0.x/R/4.0.x/lib64/R/site-library/xlsxjars/java/poi-ooxml-3.10.1-20140818.jar) to method com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int)
WARNING: Please consider reporting this to the maintainers of org.apache.poi.util.SAXHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Does not bode well for reproducibility. Regardless, the table was generated and I will be sending it to Peter.
From Peter:
Thanks, Arta! Unfortunately, the results didn't change all that much and the few pathways that come up as q<0.05 are not ones we would think are relevant.... hmm... Anyway, appreciate you running them again, pz
Peter mentioned that he wanted us to do enrichment tests on the FDR significant genes for each subregion and for both. He also mentioned that Leo has code that I could adapt to this end. I'm opening this issue as a place holder until I meet with Leo and/or Peter reaches out via email and and gives more information (also re: TWAS up/downregulation).