Closed by @oschwengers 1 year ago
Hi Oliver, thanks a lot for reaching out, I'm happy to hear you're considering Pyrodigal for super useful tools like these! I've been advocating for Bakta for some time :wink:
I'm very thankful for the test set and the gene prediction comparison analysis; to be fair, this is something I should have done myself earlier. I'm not yet entirely sure whether it's related to #19 or not, because #19 actually affects the already computed genes by recording the wrong scores for display, but it shouldn't change the prediction of the genes themselves. Nevertheless, I'll have a look at these genomes; it could be that I made some mistakes while adapting the original code in the v1 refactor.
Just so that I can reproduce: is this in meta or single mode? I also recently found out that Prodigal was doing some out-of-bounds reads of sequence data while training, so I'll see about patching that, but I don't think this is causing the discrepancies you've been seeing.
Just found the comparison repository, thanks @jhahnfeld for setting this up!
Hi Martin,
I just wanted to provide you with the link to the comparison repository, but you already found it :) It now has a results folder with the current differences.
Just so that I can reproduce, is this in meta or single mode?
We used the single mode.
I just tested the comparison by using Pyrodigal where instead of training I loaded the training file from Prodigal: the results matched 100%. So it looks like Pyrodigal has some discrepancies in the training process, I'll have a look ASAP!
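The comparison described above boils down to diffing the gene coordinates predicted by each tool. A minimal sketch of that idea (this is a hypothetical helper, not the actual script from the comparison repository): parse the CDS coordinates from each tool's GFF output and take the set difference.

```python
def parse_genes(gff_text):
    """Extract (seqid, start, end, strand) tuples from GFF3 CDS lines."""
    genes = set()
    for line in gff_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) >= 7 and fields[2] == "CDS":
            genes.add((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return genes

# toy single-gene outputs standing in for real Prodigal/Pyrodigal GFF files
prodigal_gff = "contig1\tProdigal\tCDS\t100\t400\t.\t+\t0\tID=1_1\n"
pyrodigal_gff = "contig1\tpyrodigal\tCDS\t100\t400\t.\t+\t0\tID=1_1\n"

only_prodigal = parse_genes(prodigal_gff) - parse_genes(pyrodigal_gff)
only_pyrodigal = parse_genes(pyrodigal_gff) - parse_genes(prodigal_gff)
print(only_prodigal, only_pyrodigal)  # both empty -> predictions match
```

With identical training data, both difference sets come out empty, which is the "results matched 100%" situation described above.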
I updated the prediction mismatches and removed the '-c' flag since the tested genomes are closed. Now there are some more mismatches and completely missing predictions.
I just want to +1 this effort. Finding a few [minor] differences in our tests as well in meta mode.
@zdk123 : If you have some example sequences with inconsistencies in meta mode, please share them here! I have identified a bug in the training process that I am working on fixing, but meta mode shouldn't be affected, so if you see discrepancies there too, that's another bug I'll have to fix :sweat_smile:
Just shipped pre-release 2.0.0-rc.2, which seems to have fixed the issues on `closed=True`, where I'm not seeing any mismatches anymore. The problem was coming from a weird behaviour of Prodigal in training mode that I didn't replicate exactly in my port of the `score_connection` function. (I actually think that's a bug in the original codebase, where a maximum search algorithm does not actually record the maximum when in training mode.)
Still seeing some mismatches on `closed=False`, however, so I'll keep chasing bugs :smiling_face_with_tear:
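The exact Prodigal internals aren't shown in this thread, but the class of bug described above (a maximum search that fails to record the running maximum under some condition, such as training mode) can be illustrated with a toy sketch; the function names and structure here are invented for illustration only.

```python
def best_connection_buggy(scores, training_mode=False):
    """Toy illustration: the index of the running maximum is only
    recorded outside of training mode, so training-mode calls
    return the wrong index while the score itself looks fine."""
    best_idx, best_score = 0, float("-inf")
    for i, score in enumerate(scores):
        if score > best_score:
            best_score = score
            if not training_mode:  # bug: index never updated while training
                best_idx = i
    return best_idx

def best_connection_fixed(scores):
    """Correct argmax: always record the index alongside the maximum."""
    best_idx, best_score = 0, float("-inf")
    for i, score in enumerate(scores):
        if score > best_score:
            best_score, best_idx = score, i
    return best_idx

scores = [1.5, 3.2, 2.7]
print(best_connection_buggy(scores, training_mode=True))  # 0 (wrong)
print(best_connection_fixed(scores))                      # 1
```

A port that faithfully reproduces the buggy branch matches Prodigal exactly; a port that "fixes" it silently diverges, which is how this kind of discrepancy shows up only in training.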
@althonos Actually I think this may have been the same as issue #19, which v2-rc2 seems to have fixed
Okay, found and patched a new bug (#22) which fixes the remaining issues on `closed=False`. On my machine the new v2.0.0-rc.3 release passes all comparison tests from @jhahnfeld's repository with both `closed=True` and `closed=False`. Please give it a try; if all is well I'll make a proper v2.0.0 after adding a bit of documentation and some minor fixes!
Thanks a lot for this. We'll have a look at this ASAP.
Just out of curiosity: I'm wondering if this update implies larger code changes that would qualify for a major version bump to 2.0.0? IMHO a minor bump (1.2.0) would be sufficient if there are no breaking API changes. This might also help to keep in sync with a potentially upcoming Prodigal 2.0 version.
@oschwengers : There are breaking changes coming from #18 because I'm going to add a mandatory argument to all `Genes.write_*` methods. In addition, I'll also probably change some attributes of `TrainingInfo` that return copies to return views of the underlying buffers instead. That's why I'm trying to synchronize the release a bit this time, so that I can put all the API-breaking changes in there at once.
In general, choosing the versioning scheme is always complicated for bindings, but here I'm just going to let Pyrodigal have its own versioning scheme following semver :) And actually Prodigal is at version 2.6.3, so this leaves the door open for a Prodigal 3.0.
Ahh, I see. OK, that makes total sense. Thanks!
Hi, thank you for looking into the issues.
I have tested Pyrodigal v2.0.0-rc.3 and still have two mismatches in GCF_000008865 with `closed=False`. For `closed=True`, I am getting new mismatches now that were not reported before (I have also added the report of mismatched stop codons).
As for the potential bug in the `score_connection` function, it might be good to report it upstream to check whether the behavior is intended or not.
@jhahnfeld : I'm not seeing any mismatches in `closed=True` anymore on my end. Are you deleting the `tmp` folder between runs in `closed=False` and `closed=True` modes? If you don't, Prodigal will not perform the training if the training file already exists, so you will end up using the wrong training file. See PR: https://github.com/jhahnfeld/prodigal-pyrodigal-comparison/pull/1
Sorry, I forgot to delete the `tmp` folder between the runs. :sweat:
Now there are no more mismatches in the comparison test.
P.S. Thanks for the PR!
Now that we all see the same gene predictions from Prodigal and Pyrodigal, I'll close this issue. Just in case I've missed something, please do not hesitate to re-open it.
Great effort and results! Thanks a lot everyone! Can't wait to use Pyrodigal in our projects...
Thanks a lot for the extensive bug report and the comparison repository!
Hi and thanks so much for working on this. A patched, accelerated and potentially multi-threaded Prodigal alternative is very much needed and your effort very much appreciated!
We considered Pyrodigal as a promising alternative for Prodigal within Bakta and Platon. To add an extra layer of confidence, we started a gene prediction comparison on 49 bacterial genomes. The large majority of all genes are predicted identically, but unfortunately some predictions slightly differ and some genes are not predicted by Pyrodigal at all. We waited to double-check everything, but as I saw your issue https://github.com/althonos/pyrodigal/issues/19 today, I thought this could maybe help to further debug it. @jhahnfeld has conducted the comparison and can provide more info on this.
Here are the results using a locally compiled and patched Prodigal version: