Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

SV variants and large SNVs collide in variant_id #4682

Closed: dnil closed this issue 3 months ago

dnil commented 3 months ago

Describe the bug

SVs are normally described by position and type, e.g. chr1_123456_T_DUP. An SV caller (Manta) may, however, also report small variants, and if a variant is small enough it may spell out the full allele sequence, e.g. chr1_123456_T_TT. Such variants typically also occur in the SNV/INDEL file, so they get the same variant id. Depending on which file is parsed first, the variant will appear only in the variant list for that type. This is normally not a problem, but an institute may elect not to analyse variants of a certain type, e.g. analysing only SNVs and not SVs, in which case the variant could be missed.
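
To illustrate why the two calls end up with the same id, here is a minimal sketch. It assumes, as a simplification of Scout's id scheme, that the document id is an md5 hash over chromosome, position, REF, ALT, the clinical/research type and the case id, and that the variant category (SNV vs SV) is not part of the key. The function `document_id` and its optional `category` argument are hypothetical, shown only to contrast the current collision with one possible way of keeping the calls apart.

```python
import hashlib


def document_id(chrom, pos, ref, alt, variant_type, case_id, category=None):
    """Return an md5 document id for a variant.

    Hypothetical simplification: the key is built from position, alleles,
    clinical/research type and case id. Passing `category` (e.g. "snv", "sv")
    adds it to the key, which is one possible way to keep the calls apart.
    """
    fields = [chrom, str(pos), ref, alt, variant_type, case_id]
    if category is not None:
        fields.append(category)
    return hashlib.md5("_".join(fields).encode("utf-8")).hexdigest()


# The same small duplication, reported by the SNV/INDEL caller and by Manta.
snv_id = document_id("1", 123456, "T", "TT", "clinical", "case_1")
sv_id = document_id("1", 123456, "T", "TT", "clinical", "case_1")
print(snv_id == sv_id)  # True: identical inputs give an identical id

# A typical symbolic SV call at the same position does not collide.
symbolic_id = document_id("1", 123456, "T", "<DUP>", "clinical", "case_1")
print(snv_id == symbolic_id)  # False

# Adding the category to the key separates the two full-allele calls.
print(
    document_id("1", 123456, "T", "TT", "clinical", "case_1", category="snv")
    == document_id("1", 123456, "T", "TT", "clinical", "case_1", category="sv")
)  # False
```

Whether the category should become part of the key is a design question in its own right: it would avoid the collision but also means the same biological event is stored twice.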

To Reproduce

  1. Load a case with some collisions, e.g. engaginggrouse. Note the ID warnings:
    2024-06-19 09:32:06 hasta.scilifelab.se scout.adapter.mongo.variant_loader[12314] WARNING Variant 900f6c92533766eabeb10a8573297f93 already exists in database - modifying
    2024-06-19 09:32:06 hasta.scilifelab.se scout.adapter.mongo.variant_loader[12314] WARNING Variant 8fb3f02b57613d36aefa0b690eeb8e8f already exists in database - modifying
  2. Note that the variant is now shown only on the page for the variant type that was parsed first, here cancer_sv.
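
The outcome in step 2 can be modelled with a small in-memory sketch. This is not Scout's loader: `load_variants` and `variants_of_category` are hypothetical stand-ins, and the sketch simply assumes that on an id collision the stored record keeps the category set by the file loaded first (it does not try to reproduce whatever the real loader modifies). The category label "cancer" for somatic SNVs is likewise an assumption; "cancer_sv" is taken from the report.

```python
from typing import Dict, List


def load_variants(db: Dict[str, dict], variants: List[dict]) -> None:
    """Insert variants keyed on _id; on collision, keep the existing record.

    Hypothetical stand-in for the variant loader: the stored 'category'
    is always the one from the first file that introduced the id.
    """
    for variant in variants:
        if variant["_id"] in db:
            print(f"WARNING Variant {variant['_id']} already exists in database - modifying")
            continue  # category of the first-loaded call is kept
        db[variant["_id"]] = dict(variant)


def variants_of_category(db: Dict[str, dict], category: str) -> List[dict]:
    """Return what a per-category variant list page would show."""
    return [v for v in db.values() if v["category"] == category]


db: Dict[str, dict] = {}
colliding_id = "900f6c92533766eabeb10a8573297f93"  # id from the warning above
load_variants(db, [{"_id": colliding_id, "category": "cancer_sv"}])  # SV file parsed first
load_variants(db, [{"_id": colliding_id, "category": "cancer"}])  # SNV file parsed second

print(len(variants_of_category(db, "cancer_sv")))  # 1
print(len(variants_of_category(db, "cancer")))  # 0: missing from the SNV view
```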

Expected behavior

We would (arguably; this is non-obvious) still like to see the duplicated variants in both type views. A workaround might be to ensure SVs are loaded last and to change colliding small SV calls to the less informative, big-SV type. We should carefully check that the behaviour of matching causatives is still acceptable: the variant could then be tallied in two different views, which might get somewhat complex.
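
The suggested workaround could look roughly like the sketch below, run as a pre-processing step before the SV file is loaded last. It assumes each SV call carries its SVTYPE and that the (chrom, pos, ref, alt) keys of the already loaded SNV/INDELs are available; `demote_colliding_sv_calls` is a hypothetical helper, not part of Scout. A colliding call has its explicit ALT replaced by a symbolic one such as `<DUP>`, which loses the allele sequence but no longer matches the SNV/INDEL id.

```python
from typing import Iterable, List, Set, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def demote_colliding_sv_calls(snv_keys: Set[Key], sv_calls: Iterable[dict]) -> List[dict]:
    """Return SV calls with colliding full-allele entries rewritten to symbolic ALTs.

    Hypothetical helper: a call whose (chrom, pos, ref, alt) matches an already
    loaded SNV/INDEL gets its ALT replaced by '<SVTYPE>', e.g. 'TT' -> '<DUP>'.
    Input dicts are not modified; rewritten calls are new copies.
    """
    result = []
    for call in sv_calls:
        key = (call["chrom"], call["pos"], call["ref"], call["alt"])
        if key in snv_keys:
            call = {**call, "alt": f"<{call['svtype']}>"}
        result.append(call)
    return result


snv_keys = {("1", 123456, "T", "TT")}
sv_calls = [
    {"chrom": "1", "pos": 123456, "ref": "T", "alt": "TT", "svtype": "DUP"},
    {"chrom": "1", "pos": 200000, "ref": "A", "alt": "<DEL>", "svtype": "DEL"},
]
print([c["alt"] for c in demote_colliding_sv_calls(snv_keys, sv_calls)])  # ['<DUP>', '<DEL>']
```

Only colliding calls are rewritten, so small Manta calls that have no SNV/INDEL counterpart keep their full allele sequence.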

Additional context

Manta, the caller with which we currently mainly see this behaviour, does not seem to have an option to force long-SV-style entries.