bmansfeld / QTLseqr

QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

Replication #3

Closed stephenrdoyle closed 6 years ago

stephenrdoyle commented 6 years ago

Just having a play with QTLseqr and it seems to work quite nicely. Great job.

Just wondering whether you could consider replicates in the analysis. So far it looks like one uses a single high bulk and a single low bulk as input.

Cheers, Steve

bmansfeld commented 6 years ago

Hi Steve, Thanks for the great feedback, I really appreciate it! Right now QTLseqr is designed to accept only one sample per bulk. The reason is that genotyping multiple samples together by collecting read depths per SNP may be more complicated than just summing these numbers. I think the best way would be to let GATK decide this. I've recommended to others to go back and use Picard (AddOrReplaceReadGroups) to assign read groups to all the samples in the high bulk and in the low bulk, such that GATK knows they are all part of the "same sample". That way you can treat them as if they were all reads from the same sample, perhaps run on different lanes (or something similar). Then rerun HaplotypeCaller in GVCF mode and use GenotypeGVCFs to create one VCF file that will have only two samples. Then export using VariantsToTable.

I realize this may be a mildly annoying answer, but currently I am not planning on adding support for more than two bulk samples. If you have some suggestions on merging allele depths from the different samples, I am open to them. Thanks again, and good luck with your research, Ben

stephenrdoyle commented 6 years ago

Hi Ben,

Thanks for the info. I guess merging BAMs is the most obvious fix, but it removes the power of true (i.e. biological) replication, in which additional recombination could be exploited to narrow a QTL.

I guess my thinking was something along the lines of a CMH test, in which you could use replicates of matched pairs for example.
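To make the CMH idea concrete, here is a minimal sketch of a Cochran-Mantel-Haenszel test at a single SNP, with one 2x2 table per replicate pair (rows: high bulk, low bulk; columns: reference reads, alternate reads). The function name `cmh_test` and the read counts are hypothetical and are not part of QTLseqr. The sketch also assumes that read counts can be treated as independent observations, which is likely to overstate significance for pooled sequencing data.

```python
import math


def cmh_test(tables, correct=True):
    """Cochran-Mantel-Haenszel chi-square test (1 df) over K 2x2 tables.

    Each table is ((a, b), (c, d)):
      row 1 = high bulk (ref reads, alt reads)
      row 2 = low bulk  (ref reads, alt reads)
    Returns (statistic, p_value).
    """
    delta = 0.0     # sum over strata of (observed a - expected a)
    variance = 0.0  # sum over strata of the hypergeometric variance of a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two reads carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        delta += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("no informative strata: variance is zero")
    # Continuity correction, capped so it never exceeds |delta|
    yates = min(0.5, abs(delta)) if correct else 0.0
    statistic = (abs(delta) - yates) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical allele depths at one SNP in two replicate bulk pairs
    replicate_tables = [((10, 5), (5, 10)), ((10, 5), (5, 10))]
    stat, p = cmh_test(replicate_tables)
    print(f"CMH chi-square = {stat:.3f}, p = {p:.4f}")
```

Because each replicate pair is its own stratum, a consistent allele-frequency shift across pairs adds up, while a shift seen in only one pair contributes less.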

I need to think about it some more. In the meantime, keep up the good work!

Steve


Dr Stephen R Doyle, Postdoctoral Fellow, Parasite Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. E: sd21@sanger.ac.uk; stephen.doyle@sanger.ac.uk. W: https://www.researchgate.net/profile/Stephen_Doyle; http://orcid.org/0000-0001-9167-7532


bmansfeld commented 6 years ago

Hi Steve, After talking about your question with a colleague, I realized that I misunderstood you. I thought you were referring to replicates within bulks (i.e. individuals being sequenced); now I understand you may be referring to experimental replicates (i.e. different bulks). This is a good idea, and for now I would suggest overlapping the significant regions from each replicate to narrow down the candidate regions. I've looked into the CMH test you suggested, and at first glance it looks appropriate. I would have to see how it would integrate with the G' statistic, but in general both seem to be based on similar chi^2-like analyses. This could potentially be a new standard for confirming QTL in BSA-type analyses. Thanks for pointing me in the right direction, but incorporating this into QTLseqr is not on the immediate horizon. Ben
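The interim suggestion of overlapping significant regions across replicates could be sketched as follows. This assumes each replicate's significant regions have already been exported as (chromosome, start, end) tuples with inclusive coordinates. The function names and coordinates are hypothetical, and nothing here is part of QTLseqr.

```python
from functools import reduce


def intersect_two(regions_a, regions_b):
    """Return the pairwise intersections of two lists of (chrom, start, end)."""
    shared = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start <= end:  # the two intervals overlap by at least one base
                shared.append((chrom_a, start, end))
    return sorted(shared)


def shared_regions(replicates):
    """Keep only the intervals that are significant in every replicate."""
    if not replicates:
        return []
    return reduce(intersect_two, replicates)


if __name__ == "__main__":
    # Hypothetical significant regions from three replicate analyses
    rep1 = [("Chr1", 100, 500), ("Chr2", 50, 80)]
    rep2 = [("Chr1", 300, 900)]
    rep3 = [("Chr1", 250, 400), ("Chr2", 60, 70)]
    print(shared_regions([rep1, rep2, rep3]))
```

Requiring a region to be significant in every replicate is conservative: a QTL missed in one replicate is dropped entirely, which is one motivation for a combined test such as CMH.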