bmansfeld / QTLseqr

QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

Narrow down QTL region #2

Closed mstetter closed 6 years ago

mstetter commented 6 years ago

Is there a way to further narrow down a QTL region? I have whole-genome sequencing data, and after a standard run of QTLseqr my QTL is over 10 Mb and includes 43,126 SNPs.

bmansfeld commented 6 years ago

First of all, I'm happy that you found a significant region in your data! In my experience, many published QTL-seq/NGS-BSA studies identify QTL of similar size. This is one limitation I find with this method. I think it largely depends on your trait, the population structure and, most importantly, the population size. But even in the example data set (F3 pop of 10k individuals, and n=~400/bulk) large regions are identified. Does the peak look very different from the rest of the data? How broad is your peak? If your peak has a narrow summit, you could be more stringent with your FDR. If you have hundreds of thousands of SNPs, then an FDR of 5% still allows a lot of false positives. Do you have any thoughts or suggestions on narrowing down the regions statistically? One thing I am working on is integrating Takagi et al.'s statistical simulation into QTLseqr; this might yield different thresholds.
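To illustrate why a stricter FDR can shrink the reported region when the peak has a narrow summit, here is a minimal Python sketch. It is not QTLseqr code: it assumes the per-SNP p-values are adjusted with the Benjamini-Hochberg procedure, and the helper names `bh_adjust` and `significant_span`, the positions and the p-values are all made up for the example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant_span(positions, qvalues, alpha):
    """Return (start, end) of the positions with q < alpha, or None."""
    hits = [pos for pos, q in zip(positions, qvalues) if q < alpha]
    return (min(hits), max(hits)) if hits else None


# Toy data (hypothetical): positions in Mb and p-values forming one peak.
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9]
pvalues = [0.5, 0.04, 0.01, 0.001, 0.0001, 0.001, 0.02, 0.03, 0.6]
qvalues = bh_adjust(pvalues)

if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(alpha, significant_span(positions, qvalues, alpha))
```

On this toy peak the span at the stricter threshold is narrower than at 5%. With a broad, flat peak the two spans would barely differ, which is why the shape of the summit matters.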

mstetter commented 6 years ago

Thanks, I was hoping for a statistical approach, but I am not aware of any. (This is not really an issue, so feel free to close. I thought it might be something people would want to discuss.)

bmansfeld commented 6 years ago

I've been working on incorporating the Takagi et al. (https://www.ncbi.nlm.nih.gov/pubmed/23289725) method for assessing the significance of Δ(SNP-index). They use a simulation method that bootstraps 10000 allele frequencies at each depth of sequencing. I wrote some functions to perform the analysis but haven't finished the user-facing functions for plotting the confidence intervals they suggest. In any case, the method appears to be less stringent, and with 0.995 confidence intervals the regions identified as significant are just as wide as those identified with the G' method. See the figure from my test data.
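For readers unfamiliar with the simulation idea, here is a rough Python sketch of a null confidence interval for Δ(SNP-index) at a single read depth. It is not the QTLseqr implementation and not necessarily the exact procedure in the paper. It assumes an F2-like population in which each sampled allele is the alternate allele with probability 0.5, and the function name `simulate_delta_snp_index_ci` and its parameters are hypothetical.

```python
import random


def simulate_delta_snp_index_ci(depth, bulk_size, reps=10000, conf=0.995, seed=0):
    """Simulated null interval for delta SNP-index at one read depth."""
    rng = random.Random(seed)

    def bulk_alt_freq():
        # Assumption: no QTL, so each of the 2 * bulk_size alleles in a
        # bulk is the alternate allele with probability 0.5.
        alt = sum(rng.random() < 0.5 for _ in range(2 * bulk_size))
        return alt / (2 * bulk_size)

    def snp_index(freq):
        # Reads sample the bulk's allele frequency at the given depth.
        alt_reads = sum(rng.random() < freq for _ in range(depth))
        return alt_reads / depth

    # delta SNP-index = SNP-index(bulk A) - SNP-index(bulk B), under the null.
    deltas = sorted(
        snp_index(bulk_alt_freq()) - snp_index(bulk_alt_freq())
        for _ in range(reps)
    )
    tail = (1 - conf) / 2
    lower = deltas[int(tail * (reps - 1))]
    upper = deltas[int((1 - tail) * (reps - 1))]
    return lower, upper


if __name__ == "__main__":
    for depth in (20, 50, 100):
        print(depth, simulate_delta_snp_index_ci(depth, bulk_size=20, reps=2000))
```

Because the interval is simulated separately for each depth, low-coverage SNPs get a wider envelope than high-coverage ones. An observed Δ(SNP-index) would then be called significant only where it falls outside the envelope for its depth.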

[Figure: rplot]