Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
Apache License 2.0

Fix warning when using SV in VCF format with 5 fields #1700

Closed nuno-agostinho closed 6 days ago

nuno-agostinho commented 2 weeks ago

Currently, VEP accepts VCF input with only the first 5 fields, leaving the remaining 3 mandatory fields (QUAL, FILTER and INFO) as NULL.

However, for structural variants (SVs), VEP runs a pattern match against the INFO field even when it is not set, which produces these warnings:

Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 340.
Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 359.
Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 378.
Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 397.

The solution is to complete the SV record by filling each of the last 3 mandatory fields with a "." placeholder.
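The padding idea can be sketched as follows. This is a minimal Python illustration, not the actual change (VEP and ensembl-io are written in Perl); the function name `pad_vcf_fields` and the constant are hypothetical and introduced only for this example.

```python
# Illustrative sketch only: the real fix lives in VEP's Perl code.
# VCF defines 8 mandatory columns: CHROM POS ID REF ALT QUAL FILTER INFO.
VCF_MANDATORY_FIELDS = 8


def pad_vcf_fields(fields: list[str]) -> list[str]:
    """Return a copy of `fields` padded with '.' up to the 8 mandatory columns.

    `fields` is one VCF record already split into columns. Records that
    already have 8 or more columns are returned unchanged.
    """
    if len(fields) < 5:
        # Fewer than CHROM..ALT cannot be completed with placeholders.
        raise ValueError("a VCF record needs at least 5 fields")
    missing = max(0, VCF_MANDATORY_FIELDS - len(fields))
    return list(fields) + ["."] * missing


if __name__ == "__main__":
    record = "chr22 29767384 . G [1:109650635[GG,[2:9650635[TT".split()
    print(" ".join(pad_vcf_fields(record)))
```

With the placeholders in place, the INFO column is always a defined string, so a later pattern match on it no longer operates on an uninitialized value.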


The results should be identical, and free of warnings, for all formats, whether the VCF input has 5 or 8 fields:

vep --id "chr22 29767384 . G [1:109650635[GG,[2:9650635[TT" --database --force --vcf
vep --id "chr22 29767384 . G [1:109650635[GG,[2:9650635[TT . . ." --database --force --vcf
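One way to check that the two commands agree is to compare their outputs record by record. The sketch below is a hypothetical helper, not part of VEP; it assumes that meta-information lines starting with `##` may legitimately differ between runs (for example, if they record the command line) and therefore ignores them.

```python
# Hypothetical comparison helper; the function names are illustrative only.
def strip_meta(vcf_text: str) -> list[str]:
    """Keep the column header and data lines, dropping '##' meta lines and blanks."""
    return [
        line
        for line in vcf_text.splitlines()
        if line.strip() and not line.startswith("##")
    ]


def same_records(output_a: str, output_b: str) -> bool:
    """True when both outputs contain the same non-meta lines in the same order."""
    return strip_meta(output_a) == strip_meta(output_b)
```

To use it, capture the output of each run as text and pass both strings to `same_records`; warnings would still need to be checked separately on standard error.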