J35P312 / TIDDIT

TIDDIT - structural variant calling

Consistent FORMAT fields #5

Closed moonso closed 6 years ago

moonso commented 7 years ago

Hi,

I'm not sure if I'm right here, but in the TIDDIT output files I have seen:

##FORMAT=<ID=PE,Number=1,Type=Integer,Description="Number of paired-ends that support the event">
##FORMAT=<ID=SR,Number=1,Type=Integer,Description="Number of split reads that support the event">

While in the output of other SV callers (like Manta) I've seen:

##FORMAT=<ID=PR,Number=.,Type=Integer,Description="Spanning paired-read support for the ref and alt alleles in the order listed">
##FORMAT=<ID=SR,Number=.,Type=Integer,Description="Split reads for the ref and alt alleles in the order listed, for reads where P(allele|read)>0.999">

This means that the TIDDIT output will always have one number for these fields, while Manta and others will have at least two. It is already hard to parse VCF output for SVs, and it would be nice if as much as possible looked the same.
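To illustrate the parsing burden, here is a minimal sketch of how a downstream script might normalise both styles. The helper names `sample_fields` and `split_support` are hypothetical and not part of TIDDIT or Manta. It assumes that a single value is the count supporting the event (as in the TIDDIT header above) and that two values are given in ref, alt order (as in the Manta header above).

```python
from typing import Dict, Optional, Tuple


def sample_fields(format_col: str, sample_col: str) -> Dict[str, str]:
    """Pair the keys of a VCF FORMAT column with one sample column's values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def split_support(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (ref_support, alt_support) for a PE/PR/SR sample value.

    Assumption: a single number is support for the event (alt) and the
    ref count is unknown; two numbers are listed as ref,alt.
    """
    if value in ("", "."):
        return (None, None)
    parts = [None if p == "." else int(p) for p in value.split(",")]
    if len(parts) == 1:
        # Single-count style: only the alt support is reported.
        return (None, parts[0])
    # Two-count style: ref first, then alt.
    return (parts[0], parts[1])


# Hypothetical sample columns, one per style.
single = sample_fields("GT:PE:SR", "0/1:12:3")
paired = sample_fields("GT:PR:SR", "0/1:30,12:25,3")
print(split_support(single["PE"]), split_support(paired["PR"]))
```

Every consumer of SV VCFs ends up writing some variant of this branch, which is the inconvenience a shared convention would remove.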

What do you think?

J35P312 commented 7 years ago

Hello there! As you have seen, the format differs a bit between callers. I followed the same format as Lumpy when I made the TIDDIT VCF format. I could add the number of split reads for the ref allele. But the number of discordant pairs would be a bit tricky to add; I would then need to use indexing to move around the BAM file and count the normal pairs, which would be too slow to be worthwhile.

J35P312 commented 6 years ago

Thanks for your input! The format fields are now the same as in the Delly format.