EichlerLab / svpop

Variant annotation and merging pipeline

svindel clarification #10

Open wharvey31 opened 10 months ago

wharvey31 commented 10 months ago

Hi,

We are playing around with svpop/3.4.1 and are seeing some unexpected behavior in the sv/svindel/indel categories when intersecting sniffles2 variant calls from two samples.

For example, the file results/variant/intersect/caller+sniffles-ONT+PS00342/caller+sniffles-ONT+PS00361/svpopdef/all/all/indel_del/intersect.tsv.gz contains variants that are not indels (every deletion below is over 50 bp):

ID_A                 ID_B                 SOURCE_SET  RO  OFFSET  SZRO  OFFSZ  MATCH
chr1-609573-DEL-64                        A,
chr1-610275-DEL-136                       A,
chr1-611308-DEL-726                       A,
                     chr1-668821-DEL-153  ,B
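A quick way to list the offending rows is to parse the length out of each ID. This is a minimal sketch, assuming IDs follow the CHROM-POS-TYPE-LEN shape seen above and that indels are shorter than 50 bp; the helper names (`variant_len`, `read_intersect`, `oversized_ids`) are hypothetical and not part of SV-Pop.

```python
import csv
import gzip

# Assumption: indels are < 50 bp, so anything longer belongs in the sv set.
INDEL_MAX_LEN = 49


def variant_len(variant_id: str) -> int:
    """Length from an ID shaped like CHROM-POS-TYPE-LEN, e.g. chr1-609573-DEL-64."""
    return int(variant_id.rsplit("-", 1)[1])


def read_intersect(path: str) -> list[dict]:
    """Load an intersect.tsv.gz table as one dict per row, keyed by the header."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def oversized_ids(rows: list[dict], max_len: int = INDEL_MAX_LEN) -> list[str]:
    """IDs in ID_A or ID_B that are too long to be indels."""
    flagged = []
    for row in rows:
        for column in ("ID_A", "ID_B"):
            # The column is empty when the variant is absent from that callset.
            variant_id = row.get(column) or ""
            if variant_id and variant_len(variant_id) > max_len:
                flagged.append(variant_id)
    return flagged
```

Calling `oversized_ids(read_intersect(path))` on the indel_del intersect above would flag all four IDs shown, since each is at least 64 bp.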

Additionally, when considering the complete set, the sv subset appears to drop certain variants. I have attached Venn diagrams showing this: variants found only in the B set are not carried into the intersect or the Venn diagram.

[Attached Venn diagrams: variant_venn_sv_del, variant_venn_svindel_del]
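One way to confirm the dropped ",B" variants without the plots is to tally SOURCE_SET and compare against the B callset. This is a minimal sketch, assuming the intersect table should list every input variant exactly once and that the B callset IDs for the same subset are available as a list; `source_set_counts` and `missing_ids` are hypothetical helpers, not SV-Pop functions.

```python
from collections import Counter


def source_set_counts(rows):
    """Tally SOURCE_SET labels ("A,", ",B", "A,B") across intersect rows."""
    return Counter(row.get("SOURCE_SET", "") for row in rows)


def missing_ids(callset_ids, rows, id_column):
    """Callset IDs that never appear in the given intersect column (ID_A or ID_B)."""
    present = {row[id_column] for row in rows if row.get(id_column)}
    return sorted(set(callset_ids) - present)
```

For the sv intersects described here, `source_set_counts(rows)[",B"]` would be 0, and `missing_ids(b_ids, rows, "ID_B")` would return the B-only variants that never made it into the table.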

Finally, the variant Venn diagram rule is unaware that indels and SVs are being merged together, so generating SV Venn diagrams fails with errors like this:

Error in rule var_intersect_venn:
    jobid: 0
    output: results/variant/intersect/caller+sniffles-ONT+PS00342/caller+sniffles-ONT+PS00361/svpopdef/all/all/sv_ins/venn/variant_venn.png

RuleException:
KeyError in line 50 of /net/eichler/vol26/7200/software/pipelines/svpop/svpop-3.4.1/rules/variant/intersect.snakefile:
'chr6-170253323-INS-49'

paudano commented 10 months ago

Variant type "svindel" is new to SV-Pop; it was added to support merging SVs and indels in a less biased way. Previously, SVs were matched strictly against other SVs, and because the callset was split at 50 bp, small differences in variant size could hide variants from merges and intersects (e.g. a 51 bp SV should be supported by a 49 bp indel, but the split keeps them apart).
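The effect of the 50 bp split can be illustrated with a size-only check. This is a hypothetical sketch, not SV-Pop's matching code: it assumes variants of 50 bp or more are SVs, uses the shorter-over-longer length ratio (similar in spirit to what the SZRO column appears to report), and picks 0.8 as an illustrative threshold rather than SV-Pop's default. Real matching would also compare position.

```python
# Assumption: variants >= 50 bp are SVs and smaller ones are indels.
SV_MIN_LEN = 50


def size_class(length: int) -> str:
    return "sv" if length >= SV_MIN_LEN else "indel"


def size_ro(len_a: int, len_b: int) -> float:
    """Size reciprocal overlap: the shorter length divided by the longer one."""
    return min(len_a, len_b) / max(len_a, len_b)


def sizes_match(len_a: int, len_b: int, min_size_ro: float = 0.8,
                split_at_50: bool = True) -> bool:
    """Size-only compatibility check (hypothetical threshold)."""
    if split_at_50 and size_class(len_a) != size_class(len_b):
        return False  # never compared, because the callsets were split
    return size_ro(len_a, len_b) >= min_size_ro
```

With the split, `sizes_match(51, 49)` is False; pooling SVs and indels (`split_at_50=False`) lets the pair match, since 49/51 is about 0.96.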

I still need to work out some kinks in the intersects, though; I can see how this leaves artifacts there. The first point, about SVs ending up in the indel set, makes sense, and I can see why it's crashing.
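One possible post-filter for keeping SVs out of indel outputs (and vice versa) is to restrict each subset's rows by the lengths encoded in the IDs. This is a hypothetical sketch, not SV-Pop's fix: it assumes the CHROM-POS-TYPE-LEN ID format and a 50 bp boundary, and whether a cross-class match belongs in both subsets or neither is a design choice left to the `require_all` flag. It does not claim to resolve the Venn KeyError, since the lookup at line 50 of the rule is not shown.

```python
SV_MIN_LEN = 50  # assumption: SVs are >= 50 bp, indels are smaller


def variant_len(variant_id: str) -> int:
    """Length from an ID shaped like CHROM-POS-TYPE-LEN."""
    return int(variant_id.rsplit("-", 1)[1])


def in_class(variant_id: str, want: str) -> bool:
    """True if the variant falls in the requested size class ("sv" or "indel")."""
    is_sv = variant_len(variant_id) >= SV_MIN_LEN
    return is_sv if want == "sv" else not is_sv


def restrict_rows(rows, want, require_all=False):
    """Keep intersect rows whose IDs fall in the requested size class.

    require_all=False keeps a cross-class match (e.g. a 51 bp SV paired with a
    49 bp indel) in both the sv and indel subsets; require_all=True drops it
    from both.
    """
    kept = []
    for row in rows:
        ids = [row[col] for col in ("ID_A", "ID_B") if row.get(col)]
        checks = [in_class(v, want) for v in ids]
        if checks and (all(checks) if require_all else any(checks)):
            kept.append(row)
    return kept
```

Keeping cross-class pairs in both subsets preserves the svindel matches that motivated the change; dropping them keeps each subset strictly within its size range.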

I cannot follow what the Venns are showing, however. I'll need some clarification for that.

wharvey31 commented 10 months ago

Thanks for figuring out what is going on with the Venn diagrams.

The remaining issues are:

  1. The "indel" intersections contain variants over 50 bp.
  2. The "sv" intersects include only variants with SOURCE_SET "A," or "A,B"; no variants from ",B" are included.