Closed aaronmussig closed 1 year ago
Hi @aaronmussig , thanks for using Pyrodigal!
I'll have a look at it. I think the bug is coming from region masking (-m), since I'm not seeing a difference in predicted genes when testing with region masking disabled. Note that comparing the MD5 sums is not the proper way to check result consistency, because the headers may differ and rounding issues or the like can change the output. You can use the comparison repository to make sure the gene coordinates match exactly between the two.
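To illustrate why coordinates are a better check than checksums, here is a minimal sketch that compares only the gene coordinates of two GFF outputs. It is not the script from the comparison repository; the function name `gene_coordinates` and the two sample outputs are made up for illustration, and it assumes standard tab-separated GFF3 columns with `CDS` features.

```python
import hashlib


def gene_coordinates(gff_text):
    """Collect (sequence id, start, end, strand) for every CDS line of a GFF text."""
    coords = set()
    for line in gff_text.splitlines():
        # Skip blank lines and header/comment lines, which may legitimately differ.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        coords.add((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return coords


# Hypothetical outputs: same gene, different header and a rounding
# difference in the gc_cont attribute.
out_a = "# produced by tool A\nseq1\tA\tCDS\t10\t400\t55.1\t+\t0\tID=1_1;gc_cont=0.512\n"
out_b = "# produced by tool B\nseq1\tB\tCDS\t10\t400\t55.1\t+\t0\tID=1_1;gc_cont=0.513\n"

md5_a = hashlib.md5(out_a.encode()).hexdigest()
md5_b = hashlib.md5(out_b.encode()).hexdigest()

print("MD5 equal:", md5_a == md5_b)
print("coordinates equal:", gene_coordinates(out_a) == gene_coordinates(out_b))
```

The checksums differ even though both outputs describe the same gene, while the coordinate sets compare equal.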
Okay, I think I found the discrepancy between Pyrodigal and Prodigal: Prodigal only masks regions of 50 N or more, whereas Pyrodigal masks any N when masking is enabled. I'll make a patch and see if that addresses the problem.
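For readers following along, the two masking behaviours mentioned above can be sketched roughly as follows. This is only an illustration and not the actual Prodigal or Pyrodigal code; the function `mask_regions` and its `min_run` parameter are hypothetical names, and the threshold of 50 is the one quoted in this thread.

```python
import re

MIN_RUN = 50  # run length quoted in this thread


def mask_regions(sequence, min_run=MIN_RUN):
    """Return half-open (start, end) intervals of N runs at least min_run long."""
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Toy sequence with a short run of 3 N and a long run of 60 N.
seq = "ACGT" + "N" * 3 + "ACGT" + "N" * 60 + "ACGT"

print(mask_regions(seq, min_run=1))  # masks every N: both runs are reported
print(mask_regions(seq))             # masks only runs of 50 or more: the long run only
```

With `min_run=1` the short run is masked too, which can change which genes are predicted around it.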
$ python compare.py --prodigal prodigal --genome GCA_900004415.fna -c -m
Genomes closed=True masked=True
Hits genome=GCA_900004415: prodigal=6693, pyrodigal=6693, equal=True
Looks like that was it, I'll make a new release.
Fixed in v2.0.3.
You're quick! Thanks for looking into this
I'd rather not leave this kind of bug around before going on holidays, ahah :wink:
Hello,
I just wanted to say I appreciate the port of Prodigal, it's great work!
There is a case where GCA_900004415.1 gives different results when running Pyrodigal-2.0.2 compared to Prodigal-2.6.3 and Prodigal-2.6.3+31b300a. Notably, there are quite a few differences where the gc_cont value differs slightly, and there are also a few cases where genes differ (content and quantity).
Below are the commands that I ran (output attached):