Closed sci-kai closed 1 year ago
At the status report meeting in April, an alternative approach was discussed: generating the TSV output file with "vembrane" (https://github.com/vembrane/vembrane).
In initial local tests in Aachen, vembrane did not silently drop mutations that lack consequence information, a problem we had linked to vep-split. Integrating vembrane would require a few adaptations to the workflow:
The VEP annotation module would be extended with the option --vcf_info_field ANN for downstream compatibility with vembrane, so that the information previously annotated in the INFO field as CSQ can be extracted easily.
The workflow would need to be extended with bcftools norm, which splits mutational events with multiple alternative alleles into separate mutational events (handling of DBS events). The vembrane documentation recommends normalizing before annotation. Based on further discussion at the meeting, we would omit the --atomize option from the normalization step, as it seems to break the file formatting.
Furthermore, adding bcftools index as an additional step decreases the runtime and could provide index files for the VCF files to other downstream processes.
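The adaptations listed above could be chained roughly as follows. This is a minimal sketch, not the workflow's actual module code: the `build_commands` helper and all file names are hypothetical, the `vembrane table` expression is only an example column selection, and a real VEP call needs further options (cache, species, assembly) that are omitted here.

```python
import shlex


def build_commands(vcf_in: str, prefix: str) -> list[list[str]]:
    """Return the proposed steps as argument lists (hypothetical file naming)."""
    norm_vcf = f"{prefix}.norm.vcf.gz"
    vep_vcf = f"{prefix}.vep.vcf"
    return [
        # Split records with several ALT alleles into one record per allele.
        # The atomize option is deliberately left out, as discussed in the meeting.
        ["bcftools", "norm", "--multiallelics", "-any",
         "--output-type", "z", "--output", norm_vcf, vcf_in],
        # Index the normalized file so downstream steps can reuse the index.
        ["bcftools", "index", "--tbi", norm_vcf],
        # Annotate and write the consequences to INFO/ANN instead of INFO/CSQ.
        ["vep", "--input_file", norm_vcf, "--output_file", vep_vcf,
         "--vcf", "--vcf_info_field", "ANN"],
        # Convert to a table; vembrane writes the result to stdout.
        ["vembrane", "table", 'CHROM, POS, REF, ALT, ANN["Consequence"]', vep_vcf],
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in build_commands("sample.vcf.gz", "sample"):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch independent of a local bcftools, VEP, or vembrane installation; in the workflow each list would become its own process.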
Description of feature
The annotated output from VEP should be converted to a format that is easier to process with spreadsheet programs and programming languages such as Python and R. A common standard is the TSV file format. This should be implemented as a separate module. BCFtools already has a plugin for splitting VEP-annotated output, "+split-vep" (https://samtools.github.io/bcftools/howtos/plugin.split-vep.html). The module should output all columns by default.
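To illustrate the intended conversion, here is a small pure-Python sketch of how a VEP-annotated VCF could be flattened into TSV with all annotation columns. It is not how +split-vep or vembrane work internally; the function names, the toy records, and the gene names are made up for illustration, and the annotation key is assumed to be ANN. Variants without any annotation are kept as a row with empty annotation columns rather than being dropped.

```python
import csv
import io


def ann_fields(header_line: str) -> list[str]:
    # VEP lists the subfield names after "Format: " in the INFO header description.
    fmt = header_line.split("Format: ", 1)[1]
    return fmt.rstrip('">\n').split("|")


def vcf_to_tsv(vcf_text: str, key: str = "ANN") -> str:
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    fields: list[str] = []
    for line in vcf_text.splitlines():
        if line.startswith(f"##INFO=<ID={key},"):
            fields = ann_fields(line)
        elif line.startswith("#CHROM"):
            writer.writerow(["CHROM", "POS", "REF", "ALT", *fields])
        elif line and not line.startswith("#"):
            chrom, pos, _id, ref, alt, _qual, _filt, info = line.split("\t")[:8]
            entries = [e.split("=", 1)[1] for e in info.split(";")
                       if e.startswith(f"{key}=")]
            # One output row per annotation; one empty row if there is none.
            annotations = entries[0].split(",") if entries else [""]
            for ann in annotations:
                values = ann.split("|") if ann else [""] * len(fields)
                writer.writerow([chrom, pos, ref, alt, *values])
    return out.getvalue()


# Toy input: one variant with two annotations, one variant without any.
demo = "\n".join([
    "##fileformat=VCFv4.2",
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\tANN=T|missense_variant|GENE1,T|intron_variant|GENE2",
    "1\t200\t.\tG\tC\t.\tPASS\tDP=10",
])

if __name__ == "__main__":
    print(vcf_to_tsv(demo), end="")
```

Reading the column names from the header is what makes "all columns by default" possible without hard-coding a field list.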