Closed shenwei356 closed 6 months ago
Hi @shenwei356 ,
Thanks for picking this bug up.
Pharokka needs a refactor now that I am a much better coder than I was when I first wrote it; I just need to find some time!
This bug seems to be caused by bad typing when parsing the gene prediction summary file.
I can reproduce the error in v1.7.0 for IDs in scientific notation and integers with leading 0s: these were being parsed as int, not as str.
I've put a fix in v1.7.1 that always parses everything as str, and I've tested it locally. It's on the dev branch if you are keen; otherwise it should be available once all the CI checks have passed.
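To illustrate the kind of type inference described above, here is a minimal standard-library sketch. It is not Pharokka's actual code: the table layout, the column names, and the helpers `guess_type` and `read_ids` are hypothetical, chosen only to show how a number-guessing parser mangles these IDs while string-only parsing preserves them.

```python
import csv
import io

# Hypothetical two-column summary table; the real file layout is not shown in this thread.
SUMMARY_TSV = "contig\tgenes\n01E2\t12\n0102\t7\nphageA\t3\n"


def guess_type(value):
    """Mimic a parser that infers numeric types from text (assumed behaviour)."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_ids(text, infer):
    """Read the contig column, optionally letting the values be type-guessed."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [guess_type(row["contig"]) if infer else row["contig"] for row in rows]


inferred_ids = read_ids(SUMMARY_TSV, infer=True)   # IDs coerced to numbers where possible
string_ids = read_ids(SUMMARY_TSV, infer=False)    # IDs kept exactly as written

print(inferred_ids)
print(string_ids)
```

With inference on, `01E2` becomes the float 100.0 and `0102` becomes the integer 102, so neither matches the original FASTA ID any more. Keeping every field as a string avoids that mismatch.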
George
Description
Hi, all. This is a very interesting bug, and I can reproduce it. The input FASTA file contains a single sequence record with the ID 01E2.
Command:
The GenBank file looks like the one below. It is split into two records: one with only the sequence, the other with only annotations, haha.
OK, if I rename the FASTA ID to some string that does not start with 0, everything works.
Possible reason
01E2 might be parsed as scientific notation (100), because the second GenBank record is:
If I change the ID to 0102, it panics again, with the second GenBank record as:
So a note should be shown to users: do not use sequence IDs that look like a number, including scientific notation.
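As a rough sketch of such a note, a pre-flight check could flag FASTA IDs that Python itself would accept as numbers. The function names `looks_numeric` and `risky_ids` are hypothetical and not part of Pharokka; using `float()` as the test is an assumption about what "looks like a number" should mean, and it also flags strings such as `nan` and `inf`.

```python
def looks_numeric(seq_id):
    """Return True if seq_id would parse as a number, e.g. '0102' or '01E2'."""
    try:
        float(seq_id)
    except ValueError:
        return False
    return True


def risky_ids(fasta_text):
    """Collect record IDs (first word after '>') that look like numbers."""
    flagged = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and looks_numeric(fields[0]):
                flagged.append(fields[0])
    return flagged


example = ">01E2 some phage\nACGT\n>0102\nACGT\n>phageA\nACGT\n"
flagged = risky_ids(example)
for seq_id in flagged:
    print(f"Warning: sequence ID '{seq_id}' looks like a number; consider renaming it.")
```

Here `01E2` and `0102` are flagged while `phageA` passes, which matches the behaviour reported in this issue.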