GATB / DiscoSnp

DiscoSnp is designed for discovering all kinds of SNPs (not only isolated ones), as well as insertions and deletions, from raw set(s) of reads.
https://gatb.inria.fr/software/discosnp/
GNU Affero General Public License v3.0

Distinguishing inherited variants from de novo variants #12

Closed: standage closed this issue 3 years ago

standage commented 5 years ago

Greetings!

I had no issues installing and running discoSnp++ on my data set. This was a pleasant surprise since bioinformatics software can often be problematic. Thanks!

I have a question about interpreting and parsing the results. I analyzed a trio (mother, father, and child) and I'm interested in finding the de novo variants in the child. I called variants on all 3 individuals simultaneously using the "fof of fofs" configuration strategy. The samples are labeled G1, G2, and G3 in the *coherent.vcf file. I'm pretty confident I've figured out which labels apply to which individuals, and now I'm looking for the de novo variants. My plan is to pull out records where filter=PASS and GT=0/0,0/1,0/0 for dad, kid, and mom, respectively. It looks like the variants are already sorted by rank.

Is this the correct strategy? Anything I should keep in mind?

Thanks!

pierrepeterlongo commented 5 years ago

Hello Daniel,

Thanks for your comments!

> I'm pretty confident I've figured out which labels apply to which individuals

You can double-check this in the discoRes_read_files_correspondance.txt file, which gives the correspondence between each read set and its id (here C_1, ...; the order is the same for Q1, ... and G1, ..., which hold the qualities and genotypes respectively).

If you want to focus only on precision, you can indeed consider only highly ranked variants with the PASS flag. Otherwise, you can relax these constraints.

In your specific case, you could also check for the 1/1,0/1,1/1 variants (both parents homozygous for the alternate allele, child heterozygous) in addition to the 0/0,0/1,0/0 ones.
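The filter described above can be sketched in a few lines of Python. This is only an illustration, not part of discoSnp++: the function names are made up, and it assumes a tab-separated VCF in which GT is the first FORMAT sub-field and the three sample columns appear in the order father, child, mother (adjust the offsets after checking the correspondence file).

```python
# Trio genotype patterns (father, child, mother) treated as de novo candidates:
# both parents homozygous for the same allele, child heterozygous.
DE_NOVO_PATTERNS = {("0/0", "0/1", "0/0"), ("1/1", "0/1", "1/1")}


def normalize_gt(sample_field):
    """Extract GT from a sample column and normalize it (1|0 and 1/0 -> 0/1)."""
    gt = sample_field.split(":")[0].replace("|", "/")
    return "/".join(sorted(gt.split("/")))


def de_novo_candidates(vcf_lines, father=0, child=1, mother=2, require_pass=True):
    """Yield VCF records whose trio genotypes match a de novo pattern.

    father/child/mother are 0-based offsets into the sample columns
    (G1 -> 0, G2 -> 1, G3 -> 2). This ordering is an assumption.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        samples = cols[9:]  # columns after FORMAT
        if len(samples) <= max(father, child, mother):
            continue  # not enough sample columns for a trio
        if require_pass and cols[6] != "PASS":
            continue  # FILTER column
        trio = tuple(normalize_gt(samples[i]) for i in (father, child, mother))
        if trio in DE_NOVO_PATTERNS:
            yield "\t".join(cols)
```

Typical use would be `with open(path) as fh: hits = list(de_novo_candidates(fh))`. Passing `require_pass=False` relaxes the PASS constraint when recall matters more than precision.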

About variant order in the VCF file: if you did not use a reference genome to localize the variant positions a posteriori, variants are sorted by rank. Otherwise, they are sorted by mapping position.
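If the records are sorted by mapping position, the rank order can be recovered by re-sorting on the rank value. The sketch below assumes the rank is stored in the INFO column under a key named `Rk` and that a higher value means a better rank; both are assumptions to verify against the header of your own VCF, and the helper names are illustrative.

```python
def info_value(record, key):
    """Return the value of `key` in the INFO column of a VCF record, or None."""
    info = record.split("\t")[7]
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def sort_by_rank(records, rank_key="Rk"):
    """Sort VCF records by decreasing rank; records without a usable rank go last.

    rank_key="Rk" is an assumed INFO key name, not confirmed by this thread.
    """
    def rank(record):
        try:
            return float(info_value(record, rank_key))
        except (TypeError, ValueError):
            return float("-inf")  # missing or non-numeric rank
    return sorted(records, key=rank, reverse=True)
```

Header lines (those starting with `#`) should be filtered out before calling `sort_by_rank`.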

Best, Pierre