Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
Apache License 2.0

Update delimiter for FILTER when used for custom option #1689

Closed: nakib103 closed this 3 weeks ago

nakib103 commented 4 weeks ago

ENSVAR-6276 https://github.com/Ensembl/ensembl-vep/issues/1646

When the FILTER field from a VCF file is used with the --custom argument, the FILTER delimiter ; is replaced by ,. This is because ; has a specific meaning in certain file types. For example, in VCF output, where this FILTER value is placed under INFO, ; is the delimiter that separates the individual INFO fields.
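To show why a raw ; cannot go into an INFO value, here is a minimal Python sketch (not VEP's Perl code) of a naive INFO parser. The parse_info function and the MYFILTER key are hypothetical, chosen only for illustration.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into key/value pairs; bare flags map to True."""
    fields = {}
    for entry in info.split(";"):  # ';' separates INFO entries
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


raw_filter = "q10;s50"  # FILTER value listing two failed filters

# Copied verbatim, the ';' splits the value and 's50' becomes a bogus flag.
broken = parse_info(f"DP=12;MYFILTER={raw_filter}")
print(broken)  # {'DP': '12', 'MYFILTER': 'q10', 's50': True}

# With ';' swapped for ',' (the existing behaviour) the value survives intact.
fixed = parse_info(f"DP=12;MYFILTER={raw_filter.replace(';', ',')}")
print(fixed)  # {'DP': '12', 'MYFILTER': 'q10,s50'}
```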

However, any , found within an individual value in the CSQ INFO field is also replaced with &, because in the CSQ field , is the delimiter that separates different records.
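The CSQ escaping step can be sketched as follows, assuming the usual CSQ layout where | separates fields within a record and , separates records. This is an illustrative Python sketch, not VEP's implementation; escape_csq_value, build_csq and the example records are hypothetical.

```python
def escape_csq_value(value: str) -> str:
    """Replace ',' inside a single CSQ value, since ',' separates CSQ records."""
    return value.replace(",", "&")


def build_csq(records: list[list[str]]) -> str:
    """Join fields with '|' and records with ','."""
    return ",".join("|".join(escape_csq_value(v) for v in rec) for rec in records)


records = [
    ["A", "missense_variant", "q10,s50"],  # last field holds a comma
    ["A", "intron_variant", ""],
]
csq = build_csq(records)
print(csq)  # A|missense_variant|q10&s50,A|intron_variant|
print(len(csq.split(",")))  # 2, so the record boundaries are preserved
```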

As a result, a ; in a FILTER value is first turned into , and then into & in the VCF CSQ field, which causes the confusion described in the issue above.
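Chaining the two substitutions shows where the information is lost. The sketch below uses a hypothetical old_filter_to_csq function that models the two steps in Python. It assumes only that & is also what VEP uses to join multiple terms within one CSQ field (for example, several consequence terms), which is why the result is ambiguous.

```python
def old_filter_to_csq(filter_value: str) -> str:
    """Hypothetical model of the two-step escaping chain before this change."""
    info_safe = filter_value.replace(";", ",")  # step 1: keep INFO parsable
    return info_safe.replace(",", "&")          # step 2: keep CSQ records intact


out = old_filter_to_csq("q10;s50")
print(out)  # q10&s50

# Different inputs collapse to the same output, so the original ';' cannot be
# recovered: the '&' could have come from ';', ',' or a literal '&'.
print(old_filter_to_csq("q10,s50") == out)  # True
print(old_filter_to_csq("q10&s50") == out)  # True
```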

To avoid this, replace ; with %3B instead, as we already do for VCF files.
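Percent-encoding avoids the collision because %3B contains neither ; nor ,, so it passes through both the INFO and CSQ escaping steps unchanged and can be decoded back. This is a Python sketch under that assumption. encode_filter and escape_csq_value are hypothetical helpers, and urllib.parse is used only to show that %3B is the standard encoding of ;.

```python
from urllib.parse import quote, unquote


def encode_filter(filter_value: str) -> str:
    """Percent-encode ';' as %3B so it is neither an INFO nor a CSQ delimiter."""
    return filter_value.replace(";", "%3B")


def escape_csq_value(value: str) -> str:
    """Existing CSQ escaping step: ',' inside a value becomes '&'."""
    return value.replace(",", "&")


stored = escape_csq_value(encode_filter("q10;s50"))
print(stored)               # q10%3Bs50
print(quote(";", safe=""))  # %3B, the standard percent-encoding of ';'
print(unquote(stored))      # q10;s50, so the original value is recoverable
```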

Cons: other output formats (tab, default VEP text) would then show %3B instead of , as the delimiter. To avoid that, following @nuno-agostinho's suggestion, the convert_arrayref function is used to keep the ; and let the OutputFactory method handle the delimiter substitution. The default VEP format also needed ; replaced by %3B.
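The idea of deferring the substitution to the output stage can be pictured, loosely, as keeping the FILTER codes as a list until output time and letting each format choose its own joiner. This is a Python sketch, not VEP's Perl convert_arrayref or OutputFactory code. render_filter and FILTER_JOIN are hypothetical. The per-format separators are an assumption drawn from the description above: ; is kept for tab output, and %3B is used for VCF and for the default VEP format, whose Extra column already uses ; between KEY=VALUE pairs.

```python
from typing import Sequence

# Hypothetical per-format separators, based on a reading of the description.
FILTER_JOIN = {"tab": ";", "vcf": "%3B", "vep": "%3B"}


def render_filter(filters: Sequence[str], fmt: str) -> str:
    """Join FILTER codes with the separator chosen for one output format."""
    return FILTER_JOIN[fmt].join(filters)


filters = ["q10", "s50"]  # kept as a list until output time
print(render_filter(filters, "tab"))  # q10;s50
print(render_filter(filters, "vcf"))  # q10%3Bs50
print(render_filter(filters, "vep"))  # q10%3Bs50
```

Deferring the join keeps the raw value intact in memory, so only formats whose syntax reserves ; need to encode it.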