Closed lcolladotor closed 3 years ago
uploading the image for posterity
I intend on using this hyperlinked guide to making a Manhattan plot for these data, before I figure out how to modify the y-axis to include negative values. In order to do that, I need to compute the "the cumulative position of SNP."
First of all, we need to compute the cumulative position of SNP.
don <- gwasResults %>%
group_by(CHR) %>% summarise(chr_len=max(BP)) %>%
mutate(tot=cumsum(chr_len)-chr_len) %>% select(-chr_len) %>%
left_join(gwasResults, ., by=c("CHR"="CHR")) %>%
arrange(CHR, BP) %>% mutate( BPcum=BP+tot)
However, base-pair positions are not included with the output of `read_twas.R`, `twas_exp.Rdata`. So, I need to modify `read_twas.R` in order to retrieve and include the positions of each gene.
`rowRanges()` gives me a GRanges object with the base-pair location of each gene in the `rse` object:
rowRanges(rse) GRanges object with 25136 ranges and 10 metadata columns: seqnames ranges strand | Length gencodeID
| ENSG00000227232.5 chr1 14404-29570 - | 1351 ENSG00000227232.5 ENSG00000278267.1 chr1 17369-17436 - | 68 ENSG00000278267.1 ENSG00000269981.1 chr1 137682-137965 - | 284 ENSG00000269981.1 ENSG00000279457.3 chr1 184923-200322 - | 1982 ENSG00000279457.3 ENSG00000228463.9 chr1 257864-297502 - | 4039 ENSG00000228463.9 ... ... ... ... . ... ... ENSG00000198695.2 chrM 14149-14673 - | 525 ENSG00000198695.2 ENSG00000210194.1 chrM 14674-14742 - | 69 ENSG00000210194.1 ENSG00000198727.2 chrM 14747-15887 + | 1141 ENSG00000198727.2 ENSG00000210195.2 chrM 15888-15953 + | 66 ENSG00000210195.2 ENSG00000210196.2 chrM 15956-16023 - | 68 ENSG00000210196.2 ```
My plan is to extract IRanges
for each gene, calculate the average to find the rough "midpoint" for each gene present in the TWAS data, and then include that with the output.
Of course, I could be overthinking this all and "doing too much," but it does seem clear to me that I need the base-pair position of each gene expressed along the transcriptome as in the figure you pointed me to in the first post.
I was able to find Dr. Gusev's code as an alternative for creating a "mirrored" Manhattan plot here, but after cloning it and examining the code, it seemed even more difficult to approach than what I already had.
I also explored other options, such as this package for making mirrored Manhattan plots, or this one (with an example here), but I figured I'd touch base with you. Let me know what you think.
Update: I was successfully able to generate the plot. I'm now working on the plotly html
that allows the user to pinpoint and view each gene, but I am having trouble with this.
I have successfully generated an interactive Manhattan plot for our TWAS results using plotly
. We can change it here and there and make modifications but the plot itself is finished. Interactive plot is accessible here: /dcl01/lieber/ajaffe/lab/zandiHyde_bipolar_rnaseq/dev_twas/BIP_TWAS_Plotly.html
.
Plots.zip
From Leo:
Can you separate it into 2, one per region. Also, can you add a horizontal line (it’ll be 2, one positive and one negative) to show the Bonf p-value < 0.05? Thanks!
Can you also add the Gene symbol and a search box? For the gene symbol, that just means adding it as a column to the ggplot2 data.frame you are using. For the search box, that will involve some of the options from https://gist.github.com/lcolladotor/350ac8843e153d135f64880a1d029b14#file-2020-09-11_plotly-r-L88-L95
The TWAS Z score Manhattan plot PDF as well as the interactive (with gene symbol search box) plot is now available in dev_twas/
. I am not going to close this issue yet, because there still remains the problem of adding the Bonferroni P-value threshold https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/issues/3#issuecomment-718159960. I'm having trouble understanding how to apply the Bonferroni-corrected P value threshold on a Manhattan plot for TWAS Z scores, but maybe after a meeting we could clear it up. For now, I'm going to move onto generating the other plots.
So in the past we used p.adjust()
to compute the Bonferroni-adjusted p-values for each region / feature at a time (see https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/explore_twas_psycm.R#L149). In this case, we have only gene-level results and 2 brain regions, so we need to compute 2 Bonferroni thresholds.
Now, for the plot, since it's showing Z-scores, we need to go from a Bonferroni adjusted p-value back to Z values. Our test is two sided, so we divide our initial alpha (0.05) by 2. Then we can use the qnorm()
function to find the corresponding Z value that we need to set a horizontal line at (both + Z and -Z).
> qnorm(1 - (0.05 / 2))
[1] 1.959964
Now that we know that, we need to find the number of tests. So I initially loaded your data following https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/dev_twas/generate_twas_plots.R#L1-L12. I see tht we have about 15 to 16k TWAS results by brain region.
> table(twas_exp_fin$region)
amygdala sacc
15278 16331
So the corresponding Z value we need for each can be calculated from.
> qnorm(1 - 0.025 / table(twas_exp_fin$region))
amygdala sacc
4.652920 4.666639
However, I then noticed the NAs.
> summary(twas_exp_fin$TWAS.Z)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-5.860 -0.846 -0.011 -0.015 0.830 6.221 12634
> summary(twas_exp_fin$TWAS.P)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 0.149 0.401 0.429 0.692 1.000 12634
which I see that you removed yourself with https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/dev_twas/generate_twas_plots.R#L12 and we had done something similar in BrainSeq Phase II at https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/explore_twas_psycm.R#L17-L18.
>
> summary(twas_z$TWAS.P)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.1490 0.4010 0.4292 0.6920 1.0000
> summary(twas_z$TWAS.Z)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-5.86030 -0.84645 -0.01097 -0.01538 0.83035 6.22090
So, after removing the NA TWAS results, we have about 9 to 10k results per brain region. So then, the Z values you need for your plots are:
> table(twas_z$region)
amygdala sacc
8975 10000
> qnorm(1 - 0.025 / table(twas_z$region))
amygdala sacc
4.542048 4.564788
Done. I've added the Bonferroni-corrected Z-score threshold and generated the resulting plots. I will post all plots to the Slack channel but they are available in the dev_twas/analysis/
path directory.
Hi Arta,
Once you are done with #2, using the gene-level TWAS results (all of them, not just the significant ones), can you make a plot like Figure 4 from https://science.sciencemag.org/content/362/6420/eaat8127? We don't have code for this, but we'll likely need to use
ggplot2
andggrepel
to make the labels. We need two plots here. One for the gene-level TWAS results for sACC and a second one for Amygdala.Make this into a single script, and potentially into a function that we can pass to it the gene-level sACC data or the gene-level Amygdala data. By having that function, we could potentially re-use it for the exon, exon-exon junction or transcript-level results if we need to later.
Thanks, Leo