Closed kalexiou closed 5 years ago
That's a good question - yes in general the methods should yield close results. It's hard to guess why there is a big difference with out seeing the plots. One thing to note (and I've seen in other's data) is that Gprime is sensitive to zigzaging patterns of deltaSNP, because it is an absolute value. ie if you have a large dip in deltaSNP and then a large peak that whole region from the start of the dip to the end of of the peak is detected by Gprime as deviating from the expected 0.5 allele frequency. This can affect the center of the peak. You might want to change your smoothing window. Hope this helps. I would recommend going with the method your community is more comfortable using if you are uncomfortable with these discrepancies. Let me know if you have anymore questions. I'll leave this issue open for a while. Ben
Hi Ben (sorry for the Steve in my first message!)
Thanks for your quick response. You are right. When I increased the size of the smoothing window from 4.0e6 to 4.7e6, deltaSNP and GprimeSNP got closer (7678525 and 8086713, respectively). The thing is that, from previous studies, we know that the gene responsible is at ~7,2Mbp, with deltaSNP being closer to the gene than Gprime statistics.
I attach below some graphs in order to have your opinion on the quality of the data as well and/or any additional comments that you may have.
Thanks a lot Kostas
Graphs after data filtering:
QTL graphs
1Mbp window - deltaSNP
1Mbp window - Gprime
4.7Mbp window - deltaSNP
4.7Mbp window - Gprime
Hi, Your data look pretty good to me. Keep in mind that both methods rely on recombination in your population and your population size and how strict your selection is. In my eyes having both of those peaks within 1Mb of your casual gene is an amazing result for these methods. The methods are slightly different and so the peak values are slightly different. Again, you need to decide which method you want to work with based on the literature and the norms in your community. Good luck with your work, Ben
Hi Steve,
First of all let me congratulate you for this nice tool, bringing together both statistical methods developed by Takagi et al. and Magwene et al.
After running successfully the QTL analysis, I extracted the QTLs using the getQTLTable() function and, for my major QTL in chr5, I see a considerable difference between the peak positions of deltaSNP and Gprime statistics. This holds true for the minor QTLs as well. I think that, in general, these two values should be quite close, no?
Below I provide the workflow of the R script used and the resulting QTL table:
library("QTLseqr")
file20 <- "SR_filt_perc20_2.table"
hb20 <- "BWA_output/R_all_20perc.bam"
lb20 <- "BWA_output/S_all_20perc.bam"
Chroms <- paste0(rep("chr", 7), 1:7)
df20 <- importFromGATK(file = file20, highBulk = hb20, lowBulk = lb20, chromList = Chroms)
df_filt20 <- filterSNPs(SNPset = df20, minTotalDepth = 10, refAlleleFreq = 0.2, maxTotalDepth = 150, minSampleDepth = 15, minGQ = 30)
`df_filt20 <- runGprimeAnalysis(SNPset = df_filt20, windowSize = 4e6, outlierFilter = "deltaSNP", filterThreshold = 0.2)
plotQTLStats(SNPset = df_filt20, var = "nSNPs") + ggtitle('nSNPs - perc20_samtools')
plotQTLStats(SNPset = df_filt20, var = "deltaSNP", plotIntervals = TRUE) + ggtitle('deltaSNP - perc20_samtools') + ylim(-0.8,0.8)
plotQTLStats(SNPset = df_filt20, var = "Gprime", plotThreshold = TRUE, q = 0.001) + ggtitle('Gprime - perc20_samtools')
results_20 <- getQTLTable(SNPset = df_filt20, method = "Gprime",alpha = 0.001, export = FALSE)
results_20
Thanks Kostas