Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

StructuralVariantOverlap skips all INV matches when same_type=1 used #710

Open davmlaw opened 3 months ago

davmlaw commented 3 months ago

When using same_type=1, all "INV" records (and records of other SO types) are skipped by the following line:

      next if $olap_sv->class_SO_term() ne $svf->class_SO_term() &&  $self->{same_type}  ==1;

The comparison is "INV" ne "inversion", which is true, so the record is skipped even though both sides describe an inversion. This happens because VEP and the Ensembl Variation tools contain very similar VCF parsing code that does not convert terms identically:

"INV"

$olap_sv->class_SO_term(); 

This returns "INV" because the conversion table the code runs through has no entry for INV:

            # convert to SO term
            my %terms = (
                INS  => 'insertion',
                DEL  => 'deletion',
                TDUP => 'tandem_duplication',
                DUP  => 'duplication'
            );
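
To make the fall-through concrete, here is a minimal sketch. It copies the four-entry table above and assumes that a type with no entry is returned unchanged, which matches the "INV" result reported here. The helper name to_so_term is made up for illustration and is not part of VEP.

```perl
use strict;
use warnings;

# Copy of the shorter conversion table quoted above.
my %terms = (
    INS  => 'insertion',
    DEL  => 'deletion',
    TDUP => 'tandem_duplication',
    DUP  => 'duplication',
);

# Hypothetical helper. ASSUMPTION: a type with no entry in the table
# is passed through unchanged.
sub to_so_term {
    my ($type) = @_;
    return exists $terms{$type} ? $terms{$type} : $type;
}

print to_so_term('DEL'), "\n";   # prints "deletion"
print to_so_term('INV'), "\n";   # prints "INV", there is no entry to convert it
```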

"inversion"

$svf->class_SO_term();

This returns "inversion" as the code does the conversion:

  my %terms = (
    INS       => 'insertion',
    INS_ME    => 'mobile_element_insertion',
    INS_ALU   => 'Alu_insertion',
    INS_HERV  => 'HERV_insertion',
    INS_LINE1 => 'LINE1_insertion',
    INS_SVA   => 'SVA_insertion',

    DEL       => 'deletion',
    DEL_ME    => 'mobile_element_deletion',
    DEL_ALU   => 'Alu_deletion',
    DEL_HERV  => 'HERV_deletion',
    DEL_LINE1 => 'LINE1_deletion',
    DEL_SVA   => 'SVA_deletion',

    TDUP => 'tandem_duplication',
    DUP  => 'duplication',
    CNV  => 'copy_number_variation',
    INV  => 'inversion',
    BND  => 'chromosome_breakpoint'
  );
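
One possible way to make the same_type check robust is to normalise both sides through a single table before comparing. The sketch below is only an illustration under that assumption and is not the fix the maintainers will necessarily apply. The names normalise_sv_type and same_sv_type are hypothetical, and the table is a subset of the one above.

```perl
use strict;
use warnings;

# Subset of the longer conversion table quoted above.
my %so_term_for = (
    INS  => 'insertion',
    DEL  => 'deletion',
    TDUP => 'tandem_duplication',
    DUP  => 'duplication',
    CNV  => 'copy_number_variation',
    INV  => 'inversion',
    BND  => 'chromosome_breakpoint',
);

# Hypothetical helper: map a VCF abbreviation to its SO term and leave
# anything else (including values that are already SO terms) unchanged.
sub normalise_sv_type {
    my ($type) = @_;
    return $so_term_for{$type} // $type;
}

# Hypothetical helper: compare two class terms after normalisation.
sub same_sv_type {
    my ($left, $right) = @_;
    return normalise_sv_type($left) eq normalise_sv_type($right) ? 1 : 0;
}

print same_sv_type('INV', 'inversion') ? "match\n" : "skip\n";   # prints "match"
print same_sv_type('INV', 'deletion')  ? "match\n" : "skip\n";   # prints "skip"
```

With helpers like these, the skip condition would compare the normalised values of $olap_sv->class_SO_term() and $svf->class_SO_term() instead of the raw strings.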

Test data

For test purposes, grep an <INV> record from the gnomad_v2.1_sv.sites VCF, so you know it should get an overlap. I've attached a VCF that should definitely get an overlap:

test_grch37_symbolic_alt.vcf.gz
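
If you would rather build a test file yourself, the sketch below prints the header and the first <INV> record from a gzipped VCF. It assumes the inversion is encoded as the symbolic allele <INV> in the ALT column. The script and helper names are hypothetical, and the input path is taken from the command line rather than assumed.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# True if a VCF data line has <INV> as its ALT value (fifth column).
sub is_inv_record {
    my ($line) = @_;
    return 0 if $line =~ /^#/;
    my @fields = split /\t/, $line;
    return (defined $fields[4] && $fields[4] eq '<INV>') ? 1 : 0;
}

# Print the header lines plus the first <INV> record of a gzipped VCF.
sub print_first_inv {
    my ($path) = @_;
    # MultiStream lets the reader continue past the first gzip member,
    # which matters for bgzip-compressed files.
    my $in = IO::Uncompress::Gunzip->new($path, MultiStream => 1)
        or die "Cannot open $path: $GunzipError\n";
    while (my $line = <$in>) {
        if ($line =~ /^#/)        { print $line; next; }
        if (is_inv_record($line)) { print $line; last; }
    }
    $in->close();
}

print_first_inv($ARGV[0]) if @ARGV;
```

Saved as, say, first_inv.pl, it could be run as `perl first_inv.pl input.vcf.gz > test_inv.vcf`.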

nakib103 commented 3 months ago

Hello @davmlaw,

Thanks for flagging this issue! I can reproduce it and will add a fix, hopefully along with the next release. As an alternative for now, you can try using the --custom option with same_type=1, which is working.

Best regards, Nakib