bioinfo-chru-strasbourg / howard

Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery
GNU Affero General Public License v3.0

howard convert error #246

Closed: JbaptisteLam closed this issue 1 month ago

JbaptisteLam commented 3 months ago

Hi,

cmd: howard convert --explode_infos_prefix "INFO/*" --explode_infos --input --output

I got an error while trying to convert a VCF to a TSV. Here is the error:

  File "/home1/data/conda/env/ngs_jb/bin/howard", line 33, in <module>
    sys.exit(load_entry_point('howard', 'console_scripts', 'howard')())
  File "/home1/BAS/lamouchj/howard/howard/main.py", line 273, in main
    eval(f"{command_function}(args)")
  File "<string>", line 1, in <module>
  File "/home1/BAS/lamouchj/howard/howard/tools/convert.py", line 66, in convert
    vcfdata_obj.load_data()
  File "/home1/BAS/lamouchj/howard/howard/objects/variants.py", line 1307, in load_data
    self.explode_infos(
  File "/home1/BAS/lamouchj/howard/howard/objects/variants.py", line 1659, in explode_infos
    fields = self.get_explode_infos_fields(explode_infos_fields=fields)
  File "/home1/BAS/lamouchj/howard/howard/objects/variants.py", line 1392, in get_explode_infos_fields
    r = re.compile(field)
  File "/home1/data/conda/env/ngs_jb/lib/python3.10/re.py", line 251, in compile
    return _compile(pattern, flags)
  File "/home1/data/conda/env/ngs_jb/lib/python3.10/re.py", line 303, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home1/data/conda/env/ngs_jb/lib/python3.10/sre_compile.py", line 788, in compile
    p = sre_parse.parse(p, flags)
  File "/home1/data/conda/env/ngs_jb/lib/python3.10/sre_parse.py", line 955, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/home1/data/conda/env/ngs_jb/lib/python3.10/sre_parse.py", line 444, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/home1/data/conda/env/ngs_jb/lib/python3.10/sre_parse.py", line 672, in _parse
    raise source.error("multiple repeat",
re.error: multiple repeat at position 5

After digging further, it appears that the "GERP++_NR" field raises the error, probably because Python's "re" module interprets the "+" signs as regex quantifiers (position 5 is the second "+", which cannot repeat the first one).

Best,

Jean-Baptiste
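The diagnosis above can be reproduced in plain Python, independently of HOWARD. The traceback shows get_explode_infos_fields passing each requested field to re.compile, so a literal name like "GERP++_NR" is parsed as a pattern; re.escape makes it match literally. The select_fields helper below is hypothetical, a sketch of one way a field selector could accept both exact names and regex patterns. It is not HOWARD's API. Note that the failure mode depends on the Python version: 3.10 raises "multiple repeat", while 3.11 and later read "++" as a possessive quantifier, so the pattern compiles but silently stops matching the literal name.

```python
import re

FIELD = "GERP++_NR"

# Compiling the raw field name makes each "+" a quantifier. Python 3.10 (the
# version in the traceback) raises re.error("multiple repeat") on the second
# "+"; Python 3.11+ reads "++" as a possessive quantifier, so the pattern
# compiles but no longer matches the literal name.
try:
    raw_matches = re.fullmatch(FIELD, FIELD) is not None
except re.error as err:
    raw_matches = False
    print(f"re.error: {err}")

# re.escape() backslash-escapes each "+", so the name matches literally.
escaped_matches = re.fullmatch(re.escape(FIELD), FIELD) is not None
print(escaped_matches)


def select_fields(patterns, available):
    """Hypothetical helper (not HOWARD's API).

    Return the available field names matching any pattern. A pattern that is
    exactly an available name is taken literally; anything else is tried as
    a regular expression, and invalid regexes are skipped instead of raising.
    """
    selected = []
    for pattern in patterns:
        if pattern in available:
            candidates = [pattern]
        else:
            try:
                regex = re.compile(pattern)
            except re.error:
                candidates = []  # neither a known name nor a valid regex
            else:
                candidates = [name for name in available if regex.fullmatch(name)]
        for name in candidates:
            if name not in selected:
                selected.append(name)
    return selected


print(select_fields(["GERP++_NR", "SIFT.*"], ["GERP++_NR", "SIFT_score", "DP"]))
```

Checking exact names before compiling keeps field names that happen to contain regex metacharacters usable, while still allowing wildcard-style selection for everything else.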

antonylebechec commented 3 months ago

This is probably because VCF does not accept some characters in field names, such as "+". Rename the field/column before converting.

Can you provide an example input file?
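If renaming the field is the workaround, a rough Python sketch of rewriting INFO keys in both the ##INFO header lines and the INFO column (the 8th column) could look like this. sanitize_key and sanitize_vcf_lines are hypothetical helpers, and replacing every disallowed character with "_" is an arbitrary choice. The sketch does not check the first character of a key, nor does it detect two keys collapsing to the same name. The three example lines are made up for illustration.

```python
import re

# Characters outside this set are replaced with "_" (roughly following the
# VCF 4.3 key rule; the replacement character is an arbitrary choice).
_INVALID_KEY_CHARS = re.compile(r"[^0-9A-Za-z_.]")
# Captures the ID value of an ##INFO header line.
_INFO_HEADER_ID = re.compile(r"^(##INFO=<ID=)([^,>]+)")


def sanitize_key(key):
    """Hypothetical helper: replace characters such as "+" in an INFO key."""
    return _INVALID_KEY_CHARS.sub("_", key)


def sanitize_vcf_lines(lines):
    """Hypothetical helper: rename INFO keys in header and data lines."""
    for line in lines:
        if line.startswith("##INFO="):
            yield _INFO_HEADER_ID.sub(
                lambda m: m.group(1) + sanitize_key(m.group(2)), line
            )
        elif line.startswith("#"):
            yield line  # other meta lines and the #CHROM header are unchanged
        else:
            cols = line.rstrip("\n").split("\t")
            if len(cols) > 7 and cols[7] != ".":
                entries = []
                for entry in cols[7].split(";"):
                    # Flags have no "=", so partition leaves sep/value empty.
                    key, sep, value = entry.partition("=")
                    entries.append(sanitize_key(key) + sep + value)
                cols[7] = ";".join(entries)
            yield "\t".join(cols) + ("\n" if line.endswith("\n") else "")


vcf = [
    '##INFO=<ID=GERP++_NR,Number=1,Type=Float,Description="example">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\tGERP++_NR=4.5;DP=10\n",
]
for out_line in sanitize_vcf_lines(vcf):
    print(out_line, end="")
```

Because the generator works line by line, it could be applied to an uncompressed VCF streamed from disk before handing the result to the convert step.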