cortes-ciriano-lab / savana

Somatic structural variant caller for long-read data
Apache License 2.0

Add SVTYPE in the INFO column #8

Closed soymintc closed 1 year ago

soymintc commented 1 year ago

Although users can work out the SVTYPE (per the VCF spec) of each SV themselves, I think it would be better and less error-prone if SAVANA reported the SVTYPE for each SV.

helrick commented 1 year ago

Hi there, thanks for the suggestion! I've implemented this in PR #12, but I'm calling everything BND except for insertions (INS), because without copy number information it's not possible to definitively call an SVTYPE (in VCFv4.2). GRIDSS has a more in-depth explanation of their decision to do this here: https://github.com/PapenfussLab/gridss#why-are-all-calls-bnd.

Currently SAVANA implements VCFv4.2 as its output format, but I will look into implementing the newly released v4.4 in future, which should allow us to capture and report this.

For now, we report the breakend orientation using brackets in the ALT field as described in section 5.4 of VCFv4.2. We also report this in a "BP_NOTATION" field which can be converted to different nomenclatures as follows:

| Nomenclature | Deletion-like | Duplication-like | Head-to-head Inversion | Tail-to-tail Inversion |
| --- | --- | --- | --- | --- |
| BP_NOTATION | +- | -+ | ++ | -- |
| Brackets (VCF) | N[chr:pos[ / ]chr:pos]N | ]chr:pos]N / N[chr:pos[ | N]chr:pos] / N]chr:pos] | [chr:pos[N / [chr:pos[N |
| 5' to 3' | 3to5 | 5to3 | 3to3 | 5to5 |

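
For anyone post-processing the VCF, the conversion above can be sketched as a plain lookup in Python. This is only an illustration transcribed from the table in this comment: `BP_NOTATION_TABLE` and `convert_bp_notation` are hypothetical names, not part of SAVANA, and the bracket strings are templates (`N`, `chr:pos`) rather than real ALT values.

```python
# Lookup transcribed from the conversion table in this thread.
# Key: BP_NOTATION value. Value: (label, (ALT template, mate ALT template), 5'-to-3' name).
# Hypothetical helper for illustration only; not part of SAVANA.
BP_NOTATION_TABLE = {
    "+-": ("Deletion-like", ("N[chr:pos[", "]chr:pos]N"), "3to5"),
    "-+": ("Duplication-like", ("]chr:pos]N", "N[chr:pos["), "5to3"),
    "++": ("Head-to-head inversion", ("N]chr:pos]", "N]chr:pos]"), "3to3"),
    "--": ("Tail-to-tail inversion", ("[chr:pos[N", "[chr:pos[N"), "5to5"),
}


def convert_bp_notation(bp_notation):
    """Return the other nomenclatures for one BP_NOTATION value."""
    try:
        label, brackets, five_to_three = BP_NOTATION_TABLE[bp_notation]
    except KeyError:
        # Anything outside the four orientations in the table is rejected.
        raise ValueError(f"unknown BP_NOTATION: {bp_notation!r}") from None
    return {
        "label": label,
        "brackets": brackets,
        "five_to_three": five_to_three,
    }
```

For example, `convert_bp_notation("+-")["five_to_three"]` gives `"3to5"`, and any value outside the four orientations raises a `ValueError`.
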
soymintc commented 1 year ago

@helrick Thanks!! Realized that SVTYPE will be deprecated in VCF v4.4!