SciLifeLab / TIDDIT

TIDDIT - structural variant calling

Duplicate VCF lines from HG002 BAM #111

Open themkdemiiir opened 9 months ago

themkdemiiir commented 9 months ago

Hello,

I tested your tool on the AshkenazimTrio data and noticed that vcf_id is common for BND pairs with different quality, and that there are duplicate VCF lines. The reference files I used:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/

The BAM and index files I used:

https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/

The duplicate VCF lines are shown below. I can also share the full VCF file if you would like. Thanks.

chr5    58813706    SV_736_1    N   ]chr5:58813779]N    50  PASS    SVTYPE=BND;REGIONA=58813706,58813943;REGIONB=58813723,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=GGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAATGCACACATACAGGCAATCAGGAATGCAGAAATGAATTTACCAAGTTACAAAATGGGTTAACACCCATGGAGCAAGAATCAGATGCATGCCACCAAACACAATTTATTGGCATTTCTTTCTATTTGCAAGAACTTGTATTATTATTGGTTTTCCACCACCTAC GT:CN:COV:DV:RV:LQ:RR:DR    0/1:2:62.02100840336134,59.54054054054054,59.0:0:0:0.0,0.0:36,46:36,39
chr5    58813706    SV_1066_1   N   ]chr5:58813779]N    50  PASS    SVTYPE=BND;REGIONA=58813706,58813755;REGIONB=58813502,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=CCCCCCATGGATCTTTCTACACGCGCGGGGTTGGGTATCTTCTGTGTGCACACTGCTCACCCCCCGTTCTCATAGACAGGTTGTCTAGTCACTCCAAGCACATGCCTTCCTTAGCCATTGTATTGTTAAGTTTTTATGTTTTATTTATATTTATATTTATATATATATATATATATATATATATATATATATATATACACATACACACATATACATATGGTAGAACCACAGCTTTTATCCAAATATAAAATAAACACATGTCAAAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAAATCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCAGAGGCACAAGTGGATACTCAGTGAACGCAA  GT:CN:COV:DV:RV:LQ:RR:DR    0/1:2:59.32,59.54054054054054,61.881294964028775:0:0:0.0,0.0:36,46:36,39
chr5    58813779    SV_736_2    N   N[chr5:58813706[    50  PASS    SVTYPE=BND;REGIONA=58813706,58813943;REGIONB=58813723,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=GGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAATGCACACATACAGGCAATCAGGAATGCAGAAATGAATTTACCAAGTTACAAAATGGGTTAACACCCATGGAGCAAGAATCAGATGCATGCCACCAAACACAATTTATTGGCATTTCTTTCTATTTGCAAGAACTTGTATTATTATTGGTTTTCCACCACCTAC GT:CN:COV:DV:RV:LQ:RR:DR    0/1:2:62.02100840336134,59.54054054054054,59.0:0:0:0.0,0.0:36,46:36,39
chr5    58813779    SV_1066_2   N   N[chr5:58813706[    50  PASS    SVTYPE=BND;REGIONA=58813706,58813755;REGIONB=58813502,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=CCCCCCATGGATCTTTCTACACGCGCGGGGTTGGGTATCTTCTGTGTGCACACTGCTCACCCCCCGTTCTCATAGACAGGTTGTCTAGTCACTCCAAGCACATGCCTTCCTTAGCCATTGTATTGTTAAGTTTTTATGTTTTATTTATATTTATATTTATATATATATATATATATATATATATATATATATATATACACATACACACATATACATATGGTAGAACCACAGCTTTTATCCAAATATAAAATAAACACATGTCAAAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAAATCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCAGAGGCACAAGTGGATACTCAGTGAACGCAA  GT:CN:COV:DV:RV:LQ:RR:DR    0/1:2:59.32,59.54054054054054,61.881294964028775:0:0:0.0,0.0:36,46:36,39
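
The records above share CHROM, POS, and ALT but differ in ID and CTG. For anyone who wants to check their own output for the same pattern, here is a minimal Python sketch that groups records on those three columns. It is not part of TIDDIT; the function name `find_duplicate_breakends` is hypothetical, and the sample records are the four lines above with the INFO column abbreviated and the sample columns dropped.

```python
from collections import defaultdict


def find_duplicate_breakends(vcf_lines):
    """Return {(CHROM, POS, ALT): [IDs]} for every key seen on more than one record."""
    groups = defaultdict(list)
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Only the first five columns are needed; split on any whitespace
        # so that records pasted with spaces instead of tabs also work.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        chrom, pos, vcf_id, _ref, alt = fields[:5]
        groups[(chrom, int(pos), alt)].append(vcf_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# The four records from this report, INFO abbreviated for readability.
records = [
    "chr5\t58813706\tSV_736_1\tN\t]chr5:58813779]N\t50\tPASS\tSVTYPE=BND",
    "chr5\t58813706\tSV_1066_1\tN\t]chr5:58813779]N\t50\tPASS\tSVTYPE=BND",
    "chr5\t58813779\tSV_736_2\tN\tN[chr5:58813706[\t50\tPASS\tSVTYPE=BND",
    "chr5\t58813779\tSV_1066_2\tN\tN[chr5:58813706[\t50\tPASS\tSVTYPE=BND",
]

duplicates = find_duplicate_breakends(records)
for (chrom, pos, alt), ids in duplicates.items():
    print(chrom, pos, alt, ",".join(ids))
```

To run it on a real file, pass an open file handle, for example `find_duplicate_breakends(open("calls.vcf"))`, where `calls.vcf` is a placeholder path.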
J35P312 commented 9 months ago

Hello! Thanks for testing TIDDIT! These are interesting examples. I see that they are detected through de novo assembly only, and that they are called based on two distinct contigs. I will have a look and see if I can make TIDDIT collapse these kinds of calls.

Best regards,
Jesper