Closed lcolladotor closed 4 years ago
Attaching the figures for future reference
We need to use the same colors for the regions as the ones Emily used for the rest of the BPD project.
Where can I find examples of these colors? So I know for the Manhattan plots as well
Hm....
Those are colors Peter used in one figure. Though it might be best to double check with him. Emily had used other colors at https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/blob/master/case_control/venn_qSVAjoint_gene.png
Having a bit of trouble understanding the ggplot code here, specifically lines 328-329 and lines 337-338:
ggplot(subset(region_twas_z, feature == 'gene'),
aes(x = DLPFC, y = HIPPO, color = FDR.5perc, shape = in_both))
region_twas_z
was generated using a custom function called get_variable_by_region
on line 262, defined here , wherein the argument TWAS.Z
was input .
I'm having trouble understanding what this function does with this column without having the data and going through the function myself. I see that only two lines actually use the var
input on lines 43 & 44, but it seems a bit beyond me what that means.
Of course, I could also try to recreate this plot on my own using the data I have available, which might be the best course of action, but seeing that we have a meeting together tomorrow, I will convene with you and see what you think. Regardless, I don't think figuring out how to adapt this code would be difficult, but it's not as simple as change the names of the arguments in ggplot.
I find myself to be a bit lost. Since I'm not familiar with the source code, I'm not sure what the input arguments to the ggplot are supposed to be. Here's what I'm working with:
fdr.z <- fdrtool(twas_z$TWAS.Z)
ggplot(twas_z,
aes(
x = twas_z[region == "amygdala",]$TWAS.Z,
y = twas_z[region == "sacc"]$TWAS.Z,
color = fdr.z$pval,
shape = in_both
)) +
geom_point() +
# facet_grid(BEST.GWAS.status ~ feature) +
coord_fixed() +
theme_bw(base_size = 30) +
ggtitle('TWAS Z by brain region') +
scale_color_manual(values = c('grey80', 'dark orange', 'skyblue3', 'purple'))
I'm using fdrtool
because I'm not totally familiar with how FDR.5perc
was generated in the source code, so I went with this package that seems to provide FDR-corrected values for Z-scores.
I'm also not sure about the nature of the aes()
arguments. As it stands right now, I am getting the following error whenever I try to run this ggplot
:
Error: Aesthetics must be either length 1 or the same as the data (18975): x and y
Which, to me, means that both arguments need to be 18975 values in length.
When I tried to test this ggplot
using 18975 random values for each x
and y
argument:
x_test <- runif(18975, min = -3, max = 3)
y_test <- runif(18975, min = -3, max = 3)
I get this:
Error: Continuous value supplied to discrete scale
Which I think has something to do with the color
argument in aes()
:
aes(
x = twas_z[region == "amygdala",]$TWAS.Z,
y = twas_z[region == "sacc"]$TWAS.Z,
color = fdr.z$pval,
shape = in_both
)
One way I could figure out how to solve this issue is to get the source data that was used in the script from which I am adapting this ggplot, but that does not seem like the most efficient way to approach this problem.
FDR.5perc
is a factor with 4 levels that gets calculated at https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/twas_functions.R#L55-L58 and https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/twas_functions.R#L75 which says whether both brain regions (DLPFC and HIPPO in the BrainSeq Phase II case, sACC and Amygdala here) are have TWAS FDR-adjusted p-values less than 0.05 (both, DLPFC only, HIPPO only, or none of them).
The above function is used at https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/explore_twas_psycm.R#L262 for the psycm GWAS. I made it into a function because I used it also for exploring the results from a second GWAS. It's likely that we'll need to tweak that function for the sACC and Amygdala data you are working on.
The FDR-adjusted p-values are calculated using p.adjust()
as in https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/explore_twas_psycm.R#L148. I wouldn't use a different package (fdrtool
) for that, to keep it consistent with what we've done. I'm not familiar with fdrtool
, so we could look into it later in an R club session, but let's avoid using it for now.
With @lahuuki's help, I was able to generate the following:
Doesn't look exactly like what you've linked but it's a good start. I'm going to push the changes I've made to JHPCE and we can meet next Monday.
The plot above is colored according to quadrant and not according to significance, and is not plotting fdr-corrected values.
After our meeting this morning, we came up with the following actions for the above plot:
Here's an update. Turns out I was supposed to be using FDR-corrected P-values and not Z-scores. Now I have to figure out how to get in_both
back.
Here we go.
Nice, it looks better! Though the colors in the cross need some tweaks.
Good job Arta ^^
Fixed it after it became apparent during the Moods meeting that the FDR.5perc
column was inaccurately categorizing based on raw P values and not on FDR-corrected P values.
Nice, thanks Arta! This looks good =)
Thanks Leo.
Posting relevant code for the second plot here:
tt_sigonly <- tt[tt$TWAS.FDR < 0.05, ]
with(tt_sigonly, addmargins(table(BEST.GWAS.status, EQTL.status, useNA = 'ifany')))
# EQTL.status
# BEST.GWAS.status Index Other Proxy Sum
# Index 2 660 113 775
# Other 0 6912 0 6912
# Proxy 0 1787 367 2154
# Sum 2 9359 480 9841
tt_sigonly_bonf <- tt[tt$TWAS.Bonf < 0.05, ]
with(tt_sigonly_bonf, addmargins(table(BEST.GWAS.status, EQTL.status, useNA = 'ifany')))
# EQTL.status
# BEST.GWAS.status Index Other Proxy Sum
# Index 2 252 60 314
# Other 0 230 0 230
# Proxy 0 517 285 802
# Sum 2 999 345 1346
create_gwas_or_eqtl(tt_sigonly, 'pdf/twas_fdr5perc_vs_gwas_or_eqtl.pdf', 'FDR')
@lcolladotor, I am able to generate a plot using the code you supplied to me for the second scatterplot that is supposed to resemble Figure S47, but I am a bit lost as to what exactly you want me to plot, because the figure you showed me as an example and the code you sent me to be adapted are a bit different. Here's what I have:
twas_z_scatter <-
select(twas_z,
geneid,
genesymbol,
TWAS.Z,
TWAS.P,
EQTL.Z,
BEST.GWAS.Z,
region) %>%
as.data.table()
twas_z_scatter$region <-
factor(twas_z_scatter$region,
levels = c('amygdala', 'sacc'))
print(
ggplot(twas_z_scatter,
aes(
x = TWAS.Z,
y = EQTL.Z,
color = pnorm(BEST.GWAS.Z) < 5e-08
)) + geom_point()
+ labs("") + facet_grid( ~ region) +
theme_bw(base_size = 30) +
scale_color_discrete(name = "Best\nGWAS\nP-value") +
labs(caption = 'Risk Loci by EQTL')
)
Does not quite look similar to the plots you showed me so I'll wait until lab meeting to ask you what exactly you're looking for.
Use the following code to obtain the t-statistics for the case-control differences at the gene level in both amygadala and the sACC.
library("SummarizedExperiment")
load("/dcl01/lieber/ajaffe/lab/zandiHyde_bipolar_rnaseq/case_control/bipolarControl_deStats_byRegion_qSVAjoint_withAnnotation.rda", verbose=TRUE)
statOutGene <- statOut[statOut$Type == "Gene",]
You'll want to use the t_Amyg
and the t_sACC
columns.
> head( statOutGene)
DataFrame with 6 rows and 17 columns
logFC_Amyg AveExpr_Amyg t_Amyg P.Value_Amyg adj.P.Val_Amyg
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000227232.5 -0.0330728 1.340582 -0.414687 0.67877922 0.909528
ENSG00000278267.1 0.1612828 -1.412959 1.236884 0.21746318 0.658652
ENSG00000269981.1 0.0810748 -2.146991 0.470339 0.63858384 0.896763
ENSG00000279457.3 0.2116777 2.184323 3.114915 0.00208757 0.117127
ENSG00000228463.9 0.0745462 1.862994 0.635151 0.52599712 0.852000
ENSG00000236679.2 0.1817031 -0.492242 2.053826 0.04118791 0.365443
B_Amyg logFC_sACC AveExpr_sACC t_sACC P.Value_sACC
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000227232.5 -5.84638 0.00132064 1.35091 0.0162341 0.9870610
ENSG00000278267.1 -4.75310 0.05092425 -1.30810 0.4402495 0.6601466
ENSG00000269981.1 -5.16546 0.33776932 -2.44735 2.0371945 0.0427113
ENSG00000279457.3 -1.57665 0.11896450 2.20583 1.8870357 0.0603442
ENSG00000228463.9 -5.84781 0.01144101 1.78457 0.1266352 0.8993336
ENSG00000236679.2 -3.75354 0.08722632 -0.55163 1.0176247 0.3098656
adj.P.Val_sACC B_sACC gencodeID Symbol
<numeric> <numeric> <character> <character>
ENSG00000227232.5 0.996119 -5.73480 ENSG00000227232.5 WASH7P
ENSG00000278267.1 0.886615 -5.12574 ENSG00000278267.1 MIR6859-1
ENSG00000269981.1 0.290631 -3.84247 ENSG00000269981.1
ENSG00000279457.3 0.340016 -4.28670 ENSG00000279457.3
ENSG00000228463.9 0.971597 -5.82341 ENSG00000228463.9
ENSG00000236679.2 0.683774 -4.91603 ENSG00000236679.2 RPL23AP24
Type Class EntrezID
<character> <character> <integer>
ENSG00000227232.5 Gene InGen NA
ENSG00000278267.1 Gene InGen 102466751
ENSG00000269981.1 Gene InGen NA
ENSG00000279457.3 Gene InGen 102723897
ENSG00000228463.9 Gene InGen NA
ENSG00000236679.2 Gene InGen NA
> dim(statOutGene)
[1] 25136 17
That's the information we want on the Y-axis. On the X-axis we want the TWAS Z score. I wouldn't worry about faceting by "risk locus" or coloring by "best GWAS p-value < e5-08".
Also, for the TWAS Amygadala vs TWAS sACC plot, can you use the colors for the regions you listed earlier? The orange & blue colors you used are those that we had used in the past for DLPFC and HIPPO in the BSP2 paper. Thanks.
I'm on it.
Also, for the TWAS Amygadala vs TWAS sACC plot, can you use the colors for the regions you listed earlier? The orange & blue colors you used are those that we had used in the past for DLPFC and HIPPO in the BSP2 paper. Thanks.
I tried earlier and the colors were illegible/did not look very good, especially the faded pastel yellow against a white background. I can generate it again so you can see what I mean.
Here's a draft of the second scatterplot. It looks pretty bare; let me know if you want me to add anything to it or embellish it at all. I would have facet_wrap
ped by region
but the subregion data are in separate columns so it was just easier to make them as separate plots.
Above is the other plot with the colors you requested. I used #FFFF7F
("Laser Lemon" according to coolors.co) for amygdala, #B2CCFF
("Baby Blue Eyes") for sACC, #B2CCBF
(Opal) for Both, and a slightly darker shade of gray (#BFBFBF
or "Gray X 11 Gray") for neither. I would strongly recommend using a different color palette even though it would be inconsistent with the one used throughout the rest of the paper. If it's too much trouble, this could obviously be used, and of course, it's ultimately not my decision, but as the person who generated the plot, I will say that it's a) difficult to even look at the amygdala points and b) difficult to distinguish "both" points from "sACC."
Here are the TWAS Z vs t-stat scatterplots:
re: the "TWAS Z by Brain Region" scatterplot, we decided we are going to wait until next Tuesday to see what their idea is on changing the color palette. Once we get an answer from them, I will update the colors and then (unless there is anything else to cover) close this issue.
The new plots look good. Is the red line a regression line? It's basically at y = 0.
Can you make the X and Y axis legends bigger? This can be done using theme_bw(base_size = 20)
for example (though the exact number will depend on the size of your PDF file). Basically, imagine that someone is showing this plot in a presentation. Right now, the axis labels and axis ticks are too small to read.
The new plots look good. Is the red line a regression line? It's basically at y = 0.
Yeah, I thought you wanted me to add one?
Can you make the X and Y axis legends bigger?
The labels were originally at theme_bw(base_size = 30)
but it was too hard to look at so I set them to default. But sure, I'll tweak them to make it look good.
Err no, I meant the correlation coefficients, like at https://github.com/LieberInstitute/brainseq_phase2/tree/master/twas#figure-1.
And ok, tweaking the base_size
sounds good.
Unfortunately, I wasn't able to figure out how to force the rho
value to display in scientific notation with sigfigs, despite formatting it that way more than once.
amyg_rho <- format(cor(merged_t$TWAS.Z_amygdala, merged_t$t_Amyg), digits = 3, scientific = TRUE)
sacc_rho <- format(cor(merged_t$TWAS.Z_sacc, merged_t$t_sACC), digits = 3, scientific = TRUE)
ggplot(merged_t,
aes(x = TWAS.Z_amygdala,
y = t_Amyg)) + geom_point() + labs(title = "TWAS vs BD differential expression in Amygdala", x = "TWAS Z score", y = "BD vs control t-statistic") + annotate("text", x = -5.5, y = 6, label = paste0("rho == ", formatC(amyg_rho, format = "e")), parse = TRUE) + scale_y_continuous(breaks = c(-6, -3, 0, 3, 6)) + xlim(-6, 6) +
theme_bw(base_size = 20)
ggplot(merged_t,
aes(x = TWAS.Z_sacc,
y = t_sACC)) + geom_point() + labs(title = "TWAS vs BD differential expression in sACC", x = "TWAS Z score", y = "BD vs control t-statistic")+ annotate("text", x = -5.5, y = 6, label = paste0("rho == ", formatC(sacc_rho, format = "e")), parse = TRUE) + scale_y_continuous(breaks = c(-6, -3, 0, 3, 6)) +
scale_x_continuous(breaks = waiver()) +
theme_bw(base_size = 20)
I also updated the legend titles for the scatterplot and base_size
:
It still doesn't really look good if I turn down the brightness and turn up the saturation. I think yellow and green are just bad colors for a white background.
generate_twas_plots.R
is basically done. I haven't run it on the cluster yet; I've only run parts of it and then uploaded the results onto the cluster, but it should work. I'm keeping this issue open for when we settle on a color scheme for the above scatter plot, but that should only take a day or so.
After meeting with Peter, we agreed that the original color scheme with the purple and orange is the best for the plot (https://github.com/LieberInstitute/zandiHyde_bipolar_rnaseq/issues/4#issuecomment-729117914), and there's no real need to use the same color scheme throughout the paper as long as the plot is clearly labeled. I will develop an SGE script that will generate these plots on the cluster and eventually run it to regenerate the plots, but for the time being, I will close this issue.
Hi Arta,
Similar to #3, once #2 is done, can you make plots like Figure S45 and S47 from https://www.cell.com/cms/10.1016/j.neuron.2019.05.013/attachment/0f56af5c-ec8d-42eb-b6d8-33bf7495e635/mmc1.pdf? Thanks!
Figure S45
Figure S45 involves this code https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/explore_twas_psycm.R#L262-L325 though we could also make a gene-level only plot like https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/explore_twas_psycm.R#L327-L345. The code is the same one that was then adapted for https://github.com/LieberInstitute/twas/blob/master/bsp2/read_twas.R#L528-L575.
We need to use the same colors for the regions as the ones Emily used for the rest of the BPD project.
Figure S47
For Figure S47, we need to adapt code from https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/explore_twas_psycm.R#L662-L680 if I'm remembering correctly. This involves reading in the BPD case-control differential expression results, so adapting this code might involve multiple steps.
It might be easier to have a single R script for each plot.
Best, Leo