cortes-ciriano-lab / savana

Somatic structural variant caller for long-read data
Apache License 2.0

Small and possibly erroneous SVs in the final strict VCF file #21

Closed zhemingfan closed 1 year ago

zhemingfan commented 1 year ago

Hi @helrick,

I was testing SAVANA and noticed the following call in the strict VCF output. How should I interpret this result? Based on your previous comment in this post, this should be a deletion-like signal, but is it the intended behaviour to call a 1 bp deletion?

chr11   11269892    ID_53604_1  T   T[chr11:11269892[   .   PASS    SVTYPE=BND;MATEID=ID_53604_2;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=8;SVLEN=0;BP_NOTATION=+-;ORIGINATING_CLUSTER=52568ca30af042d481f783ff04b42b83;END_CLUSTER=bd7438d8656f4c89a1d60644cc0b0b8b;ORIGIN_STARTS_STD_DEV=5.97;ORIGIN_STARTS_MEDIAN=11269892.0;ORIGIN_EVENT_SIZE_STD_DEV=0.0;ORIGIN_EVENT_SIZE_MEDIAN=1.0;ORIGIN_EVENT_SIZE_MEAN=1;ORIGIN_UNCERTAINTY=6.97;ORIGIN_EVENT_HEURISTIC=0.0;END_STARTS_STD_DEV=1.32;END_STARTS_MEDIAN=11269892.0;END_EVENT_SIZE_STD_DEV=0.0;END_EVENT_SIZE_MEDIAN=1.0;END_EVENT_SIZE_MEAN=1;END_UNCERTAINTY=2.32;END_EVENT_HEURISTIC=0.0  GT  0/1
chr11   11269892    ID_53604_2  T   ]chr11:11269892]T   .   PASS    SVTYPE=BND;MATEID=ID_53604_1;NORMAL_SUPPORT=0;TUMOUR_SUPPORT=8;SVLEN=0;BP_NOTATION=+-;ORIGINATING_CLUSTER=52568ca30af042d481f783ff04b42b83;END_CLUSTER=bd7438d8656f4c89a1d60644cc0b0b8b;ORIGIN_STARTS_STD_DEV=5.97;ORIGIN_STARTS_MEDIAN=11269892.0;ORIGIN_EVENT_SIZE_STD_DEV=0.0;ORIGIN_EVENT_SIZE_MEDIAN=1.0;ORIGIN_EVENT_SIZE_MEAN=1;ORIGIN_UNCERTAINTY=6.97;ORIGIN_EVENT_HEURISTIC=0.0;END_STARTS_STD_DEV=1.32;END_STARTS_MEDIAN=11269892.0;END_EVENT_SIZE_STD_DEV=0.0;END_EVENT_SIZE_MEDIAN=1.0;END_EVENT_SIZE_MEAN=1;END_UNCERTAINTY=2.32;END_EVENT_HEURISTIC=0.0  GT  0/1

These are the parameters I used to run SAVANA:

singularity exec -B /projects,/home envs/savana_0.2.3--pyhdfd78af_0.sif \
savana \
    --tumour {input.BAM_t} \
    --normal {input.BAM_n} \
    --outdir {params.outdir} \
    --ref {input.reference} \
    --threads {threads} \
    --length 50 \
    --sample {wildcards.sample}

helrick commented 1 year ago

Hi there, thanks for this bug report!

Do you have any indication that there may be a real variant here that's being incorrectly reported (e.g. from looking at the region in IGV)? If you have a sample BAM file subset to this region that you're able to share, I can also test on my end.

One possibility is that there is a foldback inversion over an area with an insertion. I would still consider it a bug, but it's a bit tricky to solve since it's related to how aligners report insertions in supplementary alignments (often they are not explicitly reported).

So, for example, you could have a case like this: [Screenshot 2023-06-19 at 16 51 40]

I've simplified a bit, but currently, there is nothing in SAVANA to account for cases like the above (where there are missing bases in the supplementary/softclipped regions). I will have to think further on how to address this.
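
For anyone wanting to spot records like the pair above in an existing VCF, a minimal post hoc check might look like the sketch below. It is not part of SAVANA: the function names and the `max_distance` threshold are hypothetical, and it assumes tab-separated VCF records whose ALT field uses the standard breakend bracket notation. It only flags breakends whose mate lies on the same chromosome at (nearly) the same coordinate; it does not decide whether the call is real.

```python
import re

# Breakend ALT alleles embed the mate locus between two brackets,
# e.g. "T[chr11:11269892[" or "]chr11:11269892]T".
BND_ALT = re.compile(r"[\[\]](?P<chrom>[^\[\]]+):(?P<pos>\d+)[\[\]]")


def mate_position(alt):
    """Return (chrom, pos) of the mate breakend, or None if ALT is not a breakend."""
    match = BND_ALT.search(alt)
    if match is None:
        return None
    return match.group("chrom"), int(match.group("pos"))


def is_degenerate_breakend(vcf_line, max_distance=1):
    """Flag a record whose mate is on the same chromosome within max_distance bp.

    max_distance is an illustrative threshold (an assumption), not a SAVANA parameter.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, alt = fields[0], int(fields[1]), fields[4]
    mate = mate_position(alt)
    if mate is None:
        return False
    mate_chrom, mate_pos = mate
    return mate_chrom == chrom and abs(mate_pos - pos) <= max_distance


# First five columns of the first record in this issue, plus a shortened INFO field.
record = "\t".join(
    ["chr11", "11269892", "ID_53604_1", "T", "T[chr11:11269892[", ".", "PASS", "SVTYPE=BND"]
)
print(is_degenerate_breakend(record))
```

Flagged records are candidates for manual inspection (for instance in IGV) rather than automatic removal, since the underlying event may still be real.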

Does this explanation seem to fit your example?

zhemingfan commented 1 year ago

This seems to be resolved in the new version, so I'm closing this issue. Thank you!