How to prepare vcf file for Duplication & Tandem Duplication?

Illumina / paragraph

Graph realignment tools for structural variants

How to prepare vcf file for Duplication & Tandem Duplication? #41

Closed Zhiliang-Zhang closed 4 years ago

Zhiliang-Zhang commented 4 years ago

Hi, I'm trying to use Paragraph to genotype DUP and TDUP variants with the following command:

python3 ~/miniconda3/pkgs/paragraph-2.3-h8908b6f_0/bin/multigrmpy.py -i TDUP.vcf -m samples.txt -r ~/reference/genome.fa -o TDUP

Here are some of the contents of the samples.txt and TDUP.vcf files:

However, 75% of the genotypes are missing when genotyping DUP and TDUP variants.

Could you give me some advice? Thanks!

Zhiliang

traxexx commented 4 years ago

Hi Zhiliang,

In practice, Paragraph treats duplications as insertions. With this treatment, the sequence similarity between the duplicated copies can significantly lower genotyping performance. For large DUPs (such as the one at 1:2388494), the best genotyping method would be read depth counting, but that is not in the current version of Paragraph yet. Let me check whether there are any other good solutions for this case.
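To make the "duplication as insertion" idea concrete, here is a minimal Python sketch that rewrites a tandem duplication as an equivalent sequence-resolved insertion. It assumes the usual VCF convention for symbolic alleles (POS is the base before the event, END is the last duplicated base) and that the reference sequence is available as a plain string. The function name and the returned fields are hypothetical and are not part of Paragraph; whether Paragraph genotypes the resulting record well is a separate question, for the reasons given above.

```python
def tandem_dup_to_insertion(chrom, pos, end, ref_seq):
    """Express a tandem duplication as an insertion (illustrative only).

    Assumptions (not taken from Paragraph itself):
      - pos and end are 1-based VCF coordinates of a <DUP> record,
        with pos being the base immediately before the duplicated segment
      - ref_seq is the full sequence of `chrom` as a Python string
    """
    if not (1 <= pos < end <= len(ref_seq)):
        raise ValueError("expected 1 <= pos < end <= len(ref_seq)")

    anchor = ref_seq[pos - 1]      # base at 1-based position pos
    segment = ref_seq[pos:end]     # bases at 1-based positions pos+1..end

    # Inserting a second copy of the segment right after the anchor base
    # yields the same haplotype as duplicating the segment in tandem.
    return {
        "CHROM": chrom,
        "POS": pos,
        "REF": anchor,
        "ALT": anchor + segment,
    }


def apply_insertion(ref_seq, record):
    """Build the alternate haplotype, to check the conversion by hand."""
    pos = record["POS"]
    return ref_seq[: pos - 1] + record["ALT"] + ref_seq[pos:]
```

For a toy reference "ACGTACGTAA" with pos=2 and end=5, the duplicated segment is "GTA", and applying the resulting insertion gives a haplotype in which "GTA" appears twice in a row. Because the inserted sequence is identical to the neighbouring reference, reads can align equally well to either copy, which is the similarity problem described above.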

traxexx commented 4 years ago

For now, maybe try GenomeStrip for CNVs? It should work in both discovery and genotyping modes: http://software.broadinstitute.org/software/genomestrip/ It will work better if you run a batch of samples together.
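As a rough illustration of the read depth counting idea mentioned earlier in the thread, the sketch below estimates a copy number from the mean depth inside the duplicated region relative to the mean depth of flanking sequence. It is not GenomeStrip's algorithm and is not part of Paragraph; the function names, the diploid assumption, and the mapping from copy number to genotype are all simplifying assumptions for a biallelic duplication.

```python
import math


def estimate_copy_number(region_depth, flank_depth, ploidy=2):
    """Estimate the integer copy number of a region from read depth.

    region_depth: mean depth inside the candidate duplication
    flank_depth:  mean depth in nearby sequence assumed to be at normal ploidy
    """
    if flank_depth <= 0:
        raise ValueError("flank_depth must be positive")
    ratio = region_depth / flank_depth
    # Round half up to the nearest integer copy number.
    return math.floor(ploidy * ratio + 0.5)


def copy_number_to_genotype(copy_number):
    """Map a diploid copy number to a genotype for a single-copy duplication.

    Assumption: each ALT allele carries exactly one extra copy.
    """
    mapping = {2: "0/0", 3: "0/1", 4: "1/1"}
    # Anything outside 2..4 does not fit this simple model.
    return mapping.get(copy_number, "./.")
```

For example, a region covered at 45x with flanks at 30x gives a copy number of 3, which this simple model reports as heterozygous. Real depth-based genotypers also correct for GC content and mappability and gain power from many samples analysed together, none of which is modelled here.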

Zhiliang-Zhang commented 4 years ago

Hi, thanks for your reply and advice. That sounds great, and I will give it a try.

Zhiliang