MKLau closed this issue 7 years ago
Via Caroline Cusack at Broad:
"You're on the right track with the GATK website. Also try the forums (http://gatkforums.broadinstitute.org/gatk/), there are people that maintain them and they can be very helpful. It will be a process to figure out how to run GATK for your particular organism. I would suggest that you call SNPs genome-wide first, and then look to see where they fall to dive into individual genes. Also, it's important that you pick your best reference genome, and align data to that for analysis purposes. By best, I typically recommend the genome that looks most complete (maybe the largest?). Also, I should point out, GATK is mostly supported for human, so whenever one wants to utilize it for an alternative genome it's always a process to figure out the exact optimal parameters. The GATK people should be able to help out. Alternatively, I know you mentioned this but I forget the answer, if your data is haploid, then you could try using Pilon to call SNPs: https://github.com/broadinstitute/pilon/wiki."
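As a rough illustration of the "call SNPs genome-wide first, then look to see where they fall" suggestion, the sketch below takes VCF-formatted lines (as a caller such as GATK would produce) and assigns each SNP to a gene by coordinate. The scaffold names, gene names, and coordinates are made up for the example, and the gene list is assumed to be already extracted from an annotation file as 1-based inclusive intervals; this is not part of GATK itself.

```python
from collections import defaultdict

BASES = {"A", "C", "G", "T"}


def parse_vcf_snps(lines):
    """Yield (chrom, pos) for records where REF and every ALT are single bases."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alts = alt.split(",")
        # Keep simple substitutions only; indels and symbolic alleles are dropped.
        if ref.upper() in BASES and all(a.upper() in BASES for a in alts):
            yield chrom, pos


def snps_per_gene(snps, genes):
    """Map gene name -> list of SNP positions falling inside that gene.

    genes: iterable of (chrom, start, end, name), 1-based inclusive.
    A linear scan per chromosome is used for clarity, not speed.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        by_chrom[chrom].append((start, end, name))

    hits = defaultdict(list)
    for chrom, pos in snps:
        for start, end, name in by_chrom.get(chrom, []):
            if start <= pos <= end:
                hits[name].append(pos)
    return dict(hits)


# Hypothetical genome-wide calls (tab-separated VCF body).
vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
scaffold1\t120\t.\tA\tG\t50\tPASS\t.
scaffold1\t450\t.\tC\tCT\t60\tPASS\t.
scaffold1\t980\t.\tT\tC\t45\tPASS\t.
scaffold2\t75\t.\tG\tA,T\t70\tPASS\t.
"""

# Hypothetical gene intervals of interest.
genes = [
    ("scaffold1", 100, 500, "geneA"),
    ("scaffold2", 1, 200, "geneB"),
]

result = snps_per_gene(parse_vcf_snps(vcf_text.splitlines()), genes)
print(result)
```

For a real genome with many genes, a sorted interval index or an existing tool for intersecting intervals would likely be a better fit than the linear scan used here.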
Via Sarah Young at Broad:
"Not sure about wider insects but I don't think we've found anything significantly better for mosquitoes than using GATK. For the Ag1000G study we used UnifiedGenotyper with some manual filtering based on a validated set of F1 crosses. Methods are all up on bioRxiv (in the supplemental): http://biorxiv.org/content/early/2016/12/22/096289
UnifiedGenotyper was largely used because HaplotypeCaller can take a very long time with highly heterogeneous samples (the reassembly graph gets very large), but personally I'd lean on HaplotypeCaller if you have the time, particularly for INDELs and particularly if you don't have a validation set of markers to compare against."
If you have any more direct questions, I'm happy to put you directly in touch with him.
https://software.broadinstitute.org/gatk/documentation/pipelines