Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Fix warning when using SV in VCF format with 5 fields #1700

Closed by nuno-agostinho 6 days ago

nuno-agostinho commented 2 weeks ago

Currently, VEP accepts VCF input with only the first 5 fields, leaving the remaining 3 mandatory fields (QUAL, FILTER and INFO) as NULL.

However, for structural variants (SVs), VEP runs pattern matches against the INFO field, which is undefined in this case, resulting in the following warnings:

Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 340.
Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 359.
Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 378.
Use of uninitialized value in pattern match (m//) at /hps/software/users/ensembl/variation/nuno/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 397.

The solution is to complete the SV record by filling each of the last 3 mandatory fields with "." (the VCF missing-value placeholder) when they are absent.
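To illustrate the idea, here is a minimal Python sketch of the padding step. It is not the actual change (VEP and ensembl-io are written in Perl), and the function name `pad_vcf_fields` and the constants are hypothetical. It also assumes whitespace-separated input, as in the `--id` examples below, rather than strictly tab-delimited VCF.

```python
from typing import List

MISSING = "."      # VCF placeholder for a missing value
N_MANDATORY = 8    # CHROM POS ID REF ALT QUAL FILTER INFO


def pad_vcf_fields(line: str) -> List[str]:
    """Split a VCF data line and pad absent mandatory fields with '.'.

    Hypothetical helper for illustration only; not part of VEP.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least CHROM, POS, ID, REF and ALT")
    # Fill QUAL, FILTER and INFO when the input stops after ALT,
    # so later pattern matches never see an undefined value.
    fields += [MISSING] * max(0, N_MANDATORY - len(fields))
    return fields


short = pad_vcf_fields("chr22 29767384 . G [1:109650635[GG,[2:9650635[TT")
full = pad_vcf_fields("chr22 29767384 . G [1:109650635[GG,[2:9650635[TT . . .")
assert short == full  # 5-field and 8-field inputs normalise to the same record
```

With this kind of normalisation, any downstream check on INFO sees the string "." instead of an undefined value, which is why the 5-field and 8-field inputs should behave identically.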

Testing

The results should be identical, with no warnings shown, for all formats, whether the VCF input has 5 or 8 fields:

vep --id "chr22 29767384 . G [1:109650635[GG,[2:9650635[TT" --database --force --vcf
vep --id "chr22 29767384 . G [1:109650635[GG,[2:9650635[TT . . ." --database --force --vcf