cortes-ciriano-lab / savana

Somatic structural variant caller for long-read data
Apache License 2.0

SVLEN INFO field should be negative for deletions #13

Closed: waltergallegog closed this issue 1 year ago

waltergallegog commented 1 year ago

Hello, according to the VCF standard, the SVLEN field should be negative for deletions:

Longer ALT alleles (e.g. insertions) have positive values, shorter ALT alleles (e.g. deletions) have negative values.
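For a sequence-resolved allele, the VCFv4.2 convention quoted above amounts to SVLEN = len(ALT) - len(REF). The minimal sketch below illustrates just that arithmetic; the function name `vcf42_svlen` is hypothetical, and the example assumes the usual VCF padding base at the start of REF and ALT. Symbolic ALTs (such as `<DEL>`) and breakend ALTs (such as `N[chr2:123[`) carry no sequence, so this calculation does not apply to them.

```python
def vcf42_svlen(ref: str, alt: str) -> int:
    """Hypothetical helper: signed SVLEN under the VCFv4.2 convention,
    i.e. len(ALT) - len(REF). Insertions come out positive, deletions negative."""
    # Symbolic (<DEL>) and breakend (N[chr2:123[) ALTs have no sequence,
    # so no length difference can be computed from them.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        raise ValueError(f"ALT {alt!r} is symbolic or a breakend; SVLEN not derivable from sequence")
    return len(alt) - len(ref)

# 5 bp deletion: REF is the padding base "A" plus the deleted "CGTAC".
print(vcf42_svlen("ACGTAC", "A"))  # -5
# 5 bp insertion: ALT is the padding base plus the inserted sequence.
print(vcf42_svlen("A", "ACGTAC"))  # 5
```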

This convention is also followed by other long-read SV callers (cuteSV, nanomonsv).

As far as I can tell, savana reports a positive SVLEN for all SV types.

A correctly signed SVLEN, together with the implementation of https://github.com/cortes-ciriano-lab/savana/issues/8, would help a lot with downstream analysis.

Thanks, and best regards.

helrick commented 1 year ago

Hi there, thanks for the suggestion! As I mentioned in #8, without copy number information it's not possible to definitively call an SVTYPE (in VCFv4.2), so we use BND as the SVTYPE. GRIDSS gives a more in-depth explanation of their decision to do the same here: https://github.com/PapenfussLab/gridss#why-are-all-calls-bnd. For the same reasons, it's not possible to determine whether an SVLEN should be positive or negative, so we've kept it positive for all SVs.

In future, we'll be moving to VCFv4.4, in which the SVLEN value is required to always be positive.
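When a pipeline mixes signed, VCFv4.2-style SVLEN values (e.g. from cuteSV or nanomonsv) with savana's always-positive values, one possible downstream workaround is to compare calls on absolute length only. The sketch below assumes SVLEN arrives either as a single integer or as a sequence of integers (one per ALT allele); the name `svlen_magnitudes` is hypothetical. Because it discards the sign, it helps only with size comparisons, not with deciding the SV type.

```python
from typing import Sequence, Union

# SVLEN may be a single value or one value per ALT allele.
SvlenValue = Union[int, Sequence[int]]

def svlen_magnitudes(svlen: SvlenValue) -> tuple[int, ...]:
    """Hypothetical helper: return absolute SVLEN value(s) so signed and
    always-positive conventions can be compared on the same scale."""
    values = (svlen,) if isinstance(svlen, int) else tuple(svlen)
    return tuple(abs(v) for v in values)

print(svlen_magnitudes(-350))    # (350,)  signed deletion, VCFv4.2 style
print(svlen_magnitudes((350,)))  # (350,)  always-positive style
```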

waltergallegog commented 1 year ago

Thanks for the detailed answer.