bmansfeld / QTLseqr

QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

QTLseqR on GBS data #52

Closed. MatteoMartina closed this issue 1 year ago.

MatteoMartina commented 2 years ago

Hi Ben, in a previous experiment we ran your pipeline with really nice results, using WGS of the bulks. That isn't economically feasible for species with very large genomes (7–9 Gb).

What I would like to do now is use QTLseqr with GBS data, using a Stacks catalog as the reference genome. Do you think this is doable with your pipeline?

I can try to analyze the data again, but the last time I tried I ran into several errors.

Thanks! Matteo

bmansfeld commented 2 years ago

Hey Matteo, thanks for using QTLseqr in your work! Happy to hear you were able to get some results. Yes, in theory you can use a reduced-representation sequencing approach with QTLseqr, but you might need to be a little creative.

So my first question is: do you have GBS data for each bulk as a bulk (i.e., one pooled DNA sample per bulk, then sequenced with GBS), or do you have GBS data for every single individual in the bulk (i.e., each of, say, n = 15 individuals per bulk was GBSed separately)?

If you have one sample per bulk, you should be able to call SNPs as if you were working with regular WGS data. I've personally never done this (and I'm not sure about the biases and read-depth issues GBS introduces with this approach), but it should run like the regular pipeline.
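For the one-pooled-sample-per-bulk case, the core quantity is the same as in the regular pipeline: the SNP-index of each bulk (alternate-allele reads divided by total reads at the SNP) and the delta SNP-index (high bulk minus low bulk). Below is a minimal Python sketch of that calculation, assuming allele depths have already been extracted from the SNP calls. The `Locus` layout, the function names, and the `min_sample_depth` cutoff are hypothetical, not part of QTLseqr; the cutoff is only there because GBS coverage tends to be uneven across loci. With a Stacks catalog as the reference, `chrom` would hold a catalog locus ID rather than a chromosome name.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """One SNP with ref/alt read depths in each pooled bulk (hypothetical layout)."""
    chrom: str  # chromosome, or a Stacks catalog locus ID when the catalog is the reference
    pos: int
    ref_high: int
    alt_high: int
    ref_low: int
    alt_low: int


def snp_index(ref_depth: int, alt_depth: int) -> float:
    """SNP-index = alternate-allele reads / total reads at the locus."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("cannot compute a SNP-index with zero read depth")
    return alt_depth / total


def delta_snp_index(loci, min_sample_depth=10):
    """Return (chrom, pos, delta) for loci where both bulks meet the depth cutoff.

    delta = SNP-index(high bulk) - SNP-index(low bulk).
    Loci below the per-bulk depth cutoff (hypothetical threshold) are skipped.
    """
    results = []
    for locus in loci:
        depth_high = locus.ref_high + locus.alt_high
        depth_low = locus.ref_low + locus.alt_low
        if depth_high < min_sample_depth or depth_low < min_sample_depth:
            continue
        delta = snp_index(locus.ref_high, locus.alt_high) - snp_index(
            locus.ref_low, locus.alt_low
        )
        results.append((locus.chrom, locus.pos, delta))
    return results


if __name__ == "__main__":
    loci = [
        Locus("tag_1", 15, ref_high=5, alt_high=15, ref_low=18, alt_low=2),
        Locus("tag_2", 40, ref_high=2, alt_high=3, ref_low=4, alt_low=4),
    ]
    # tag_2 has only 5 reads in the high bulk, so it is filtered out.
    print(delta_snp_index(loci))
```

In this made-up example, `tag_1` has SNP-indices of 0.75 (high) and 0.1 (low), giving a delta of about 0.65, while `tag_2` is dropped for low depth.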

If you have the second case, where you are essentially doing in-silico pooling of individuals that were each GBSed separately, this can be done, but there isn't really a precedent for how to do it statistically. I'm thinking about the best way to develop this, perhaps in the future.

That being said, I have recently done this by manually averaging the SNP-indices of all the individuals in each bulk and then setting up the data frames so the QTLseqr scripts could compare the bulks. This worked in my case (a single dominant gene), and we were able to map the gene. To wrangle the data, I had to use the QTLseqr source code a bit differently rather than calling the package's functions directly. Maybe this can help you.

Preprint: https://www.biorxiv.org/content/10.1101/2022.04.13.487913v1.abstract
Scripts: https://github.com/bmansfeld/CMD2_project/blob/main/CMD2_mapping_and_phenotype_scripts.R
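The averaging approach described above can be sketched roughly as follows: compute a SNP-index for each individual at a locus, average those within each bulk, then take the difference between bulks. This is a minimal Python illustration under stated assumptions, not the code in the linked CMD2 script; the input layout (a list of `(ref_depth, alt_depth)` tuples per individual) and all function names are hypothetical. Individuals with no reads at a locus, which is common with GBS, are skipped rather than counted as zero.

```python
from statistics import mean


def individual_snp_index(ref_depth, alt_depth):
    """SNP-index for one individual, or None if it has no reads at this locus."""
    total = ref_depth + alt_depth
    return None if total == 0 else alt_depth / total


def bulk_mean_snp_index(individuals, min_called=1):
    """Average per-individual SNP-indices for one bulk at one locus.

    individuals: list of (ref_depth, alt_depth) tuples, one per individual
    (hypothetical layout). Returns None if fewer than `min_called`
    individuals have reads at the locus.
    """
    observed = [
        idx
        for idx in (individual_snp_index(r, a) for r, a in individuals)
        if idx is not None
    ]
    return mean(observed) if len(observed) >= min_called else None


def in_silico_delta(high_individuals, low_individuals, min_called=1):
    """Delta SNP-index = mean SNP-index(high bulk) - mean SNP-index(low bulk)."""
    high = bulk_mean_snp_index(high_individuals, min_called)
    low = bulk_mean_snp_index(low_individuals, min_called)
    if high is None or low is None:
        return None
    return high - low


if __name__ == "__main__":
    high_bulk = [(0, 10), (1, 9), (0, 0)]  # third individual has no reads here
    low_bulk = [(10, 0), (8, 2)]
    print(in_silico_delta(high_bulk, low_bulk))
```

One design consequence worth noting: averaging per-individual SNP-indices gives every individual equal weight regardless of its read depth, whereas sequencing a physical pool weights individuals by how many reads each contributes.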

This might help you get where you want to go until I have time to develop this in more depth.

Hope this helps. Let me know if you have questions; hopefully I can help! Ben