Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0
437 stars 150 forks

Update delimiter for FILTER when used for custom option #1689

Closed nakib103 closed 3 weeks ago

nakib103 commented 4 weeks ago

ENSVAR-6276 https://github.com/Ensembl/ensembl-vep/issues/1646

When the FILTER field from a VCF file is used in a --custom argument, the FILTER delimiter, which is ;, is replaced by , (see). The reason is that ; has a specific meaning in certain file types. For example, in VCF output, where this FILTER value is placed under INFO, ; is the delimiter that separates the INFO fields.

But we also replace any , with & when it occurs within a value under the CSQ INFO field (see). That is because in the CSQ field , is the delimiter that separates different records.

Thus a ; can become a & in the VCF CSQ field, which causes the confusion mentioned in the issue above.
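The two substitutions described above can be illustrated with a small Python sketch. This is not VEP's Perl code: the function names `custom_filter_old` and `csq_escape` and the sample FILTER value are hypothetical, chosen only to show how the chain of replacements ends in &.

```python
def custom_filter_old(filter_value: str) -> str:
    # Hypothetical stand-in for the old --custom handling:
    # the FILTER delimiter ';' is replaced by ','.
    return filter_value.replace(";", ",")


def csq_escape(value: str) -> str:
    # Hypothetical stand-in for CSQ escaping: ',' separates CSQ records,
    # so any ',' inside a value is replaced by '&'.
    return value.replace(",", "&")


filter_value = "q10;s50"  # made-up FILTER value with two filter names
in_csq = csq_escape(custom_filter_old(filter_value))
print(in_csq)
```

After both steps the original ; delimiter is no longer recoverable, since a & in the CSQ value could have come from either a , or a ;.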

To avoid this error, replace ; with %3B, which we are already doing for VCF files (see).
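A minimal Python sketch of why %3B avoids the collision, again with hypothetical names (`encode_semicolons`) and a made-up FILTER value. It only assumes that %3B is the percent-encoding of ; and that CSQ escaping replaces , with & as described above.

```python
from urllib.parse import unquote


def encode_semicolons(value: str) -> str:
    # Percent-encode ';' so it cannot be mistaken for the INFO delimiter.
    return value.replace(";", "%3B")


encoded = encode_semicolons("q10;s50")

# The CSQ escaping of ',' leaves the encoded value untouched,
# because it no longer contains a ','.
csq_value = encoded.replace(",", "&")

# A downstream consumer can decode and split on the original delimiter.
filters = unquote(csq_value).split(";")
print(encoded, filters)
```

Unlike the , substitution, the encoding is reversible: decoding %3B recovers the original FILTER delimiter.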

Cons: Other output formats (tab, VEP text) will now have %3B instead of , as the delimiter. To avoid that, following @nuno-agostinho's suggestion, I used the convert_arrayref function to keep the ; and let the OutputFactory method handle the delimiter substitution. The default VEP format needed to replace ; with %3B too.
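The idea of deferring the substitution to the output stage could look roughly like the Python sketch below. It does not reproduce convert_arrayref or any OutputFactory method; `render_filter` and the format labels are hypothetical. The VCF and default VEP branches follow the description above, while leaving ; unchanged for tab output is an assumption.

```python
def render_filter(values: list[str], output_format: str) -> str:
    # Keep the FILTER entries joined by their original ';' delimiter
    # until the output format is known.
    joined = ";".join(values)

    # VCF and the default VEP format use ';' structurally,
    # so the delimiter is percent-encoded there.
    if output_format in ("vcf", "vep"):
        return joined.replace(";", "%3B")

    # Assumption: other formats (for example tab) can keep ';' as-is.
    return joined


for fmt in ("vcf", "vep", "tab"):
    print(fmt, render_filter(["q10", "s50"], fmt))
```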