FlorianTrigodet closed this issue 3 months ago.
Hi @FlorianTrigodet!
Differences between prodigal-gv and Prodigal are due to two main factors: (1) a couple of bugfixes from @althonos, some of which were not incorporated into vanilla Prodigal (https://github.com/apcamargo/prodigal-gv/commit/745d3e8e366da3339c8aa06e73f57116d8c8d617, https://github.com/apcamargo/prodigal-gv/commit/d71a02eda26b29eb79f3ca62979ece126375b7ef, https://github.com/apcamargo/prodigal-gv/commit/1f891d67f6d69360e0310ac5c3977ad8d63c1930, https://github.com/apcamargo/prodigal-gv/commit/ba4b7dbdde8bde2ca1df2f3e2e7c632336d23609); (2) additional gene models in the metagenome mode, some of which use translation table 15.
Because of (1), Prodigal and prodigal-gv can give you distinct gene calls even when they use the same gene model in the metagenome mode, but the differences should be very small. Can you check whether Prodigal and prodigal-gv picked the same model? This is easy to get from the GFF output.
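A minimal sketch of that check, assuming Prodigal-style GFF headers where each sequence block starts with a `# Sequence Data: ...;seqhdr="..."` line followed by a `# Model Data: ...;model="...";...` line. The header layout is an assumption here, so compare it against your own files. The helper names `models_by_contig` and `contigs_with_same_model` are hypothetical.

```python
import re

# Assumed Prodigal-style GFF header lines; adjust the patterns if your files differ.
SEQ_RE = re.compile(r'^# Sequence Data: .*?seqhdr="([^"]*)"')
MODEL_RE = re.compile(r'^# Model Data: .*?model="([^"]*)"')


def models_by_contig(gff_text):
    """Map each sequence header to the model reported in its GFF block."""
    models = {}
    current = None
    for line in gff_text.splitlines():
        seq = SEQ_RE.match(line)
        if seq:
            current = seq.group(1)
            continue
        model = MODEL_RE.match(line)
        if model and current is not None:
            models[current] = model.group(1)
    return models


def contigs_with_same_model(gff_a, gff_b):
    """Return the sorted contigs for which both runs report the same model."""
    a = models_by_contig(gff_a)
    b = models_by_contig(gff_b)
    return sorted(c for c in a.keys() & b.keys() if a[c] == b[c])
```

Restricting the comparison to the contigs returned here keeps model choice out of the picture, so any remaining differences come from the scoring itself.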
A more 1:1 comparison would be between pyrodigal and pyrodigal-gv, since pyrodigal incorporates all the fixes and the only difference between the two programs is that pyrodigal-gv includes the additional gene models. On top of that, pyrodigal/pyrodigal-gv are faster than Prodigal/prodigal-gv.
P.S.: is there a reason why the start position is constant in your table?
Hi @apcamargo!
Thanks a lot for the detailed response, I really appreciate it! I only investigated contigs where prodigal and prodigal-gv picked the same model, which is why I was concerned about the similar, yet slightly different, output.
I just read about all the issues and fixes in pyrodigal/pyrodigal-gv and it looks like the difference I was seeing is due to the SD or RBS detection/scoring issue in prodigal. I will continue with pyrodigal/pyrodigal-gv for now!
And as for the table with the constant start position: it comes from the -s output, which writes all potential genes (with their scores).
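To look at alternative starts in that file, here is a minimal sketch that groups candidates sharing a stop codon and ranks them by score. It assumes a single-sequence `-s` file laid out as a tab-separated table with `Beg`, `End`, `Std` and `Total` columns, with `#` comment lines; that layout is an assumption, so check the header of your own file. The function name `alternative_starts` is hypothetical. The ranking uses only the per-candidate `Total` column and does not reproduce how Prodigal chooses its final gene calls.

```python
import csv
from collections import defaultdict


def alternative_starts(start_file_text):
    """Group candidate genes from a -s file by (strand, stop coordinate).

    Returns {(strand, stop): [(total_score, start), ...]} sorted by score,
    highest first. Assumes one sequence per file.
    """
    rows = [line for line in start_file_text.splitlines()
            if line.strip() and not line.startswith("#")]
    groups = defaultdict(list)
    for row in csv.DictReader(rows, delimiter="\t"):
        if row["Beg"] == "Beg":  # skip a repeated header line, if any
            continue
        # Normalize so that left <= right regardless of the file's convention.
        left, right = sorted((int(row["Beg"]), int(row["End"])))
        strand = row["Std"]
        # Forward genes stop at the right coordinate, reverse genes at the left.
        stop = right if strand == "+" else left
        start = left if strand == "+" else right
        groups[(strand, stop)].append((float(row["Total"]), start))
    return {key: sorted(cands, reverse=True) for key, cands in groups.items()}
```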
Thanks!
Ohh, I don't think I've ever used -s. This is very useful!
Please let me know if you need anything else!
Hello!
I was looking into replacing prodigal with prodigal-gv in my routine workflows (and maybe changing the default gene caller in the anvi'o platform), so I ran some tests to investigate the potential differences with prodigal.
I used a small metagenome available in this tutorial and extracted the gene calls that were not identical between the two programs. I used -p meta for both prodigal and prodigal-gv. Around 5-6% of the total gene calls were not quite identical between prodigal and prodigal-gv, with a noticeable difference in the start position (or in the stop position if the gene is on the reverse strand). I am focusing on results where the model and genetic code are comparable between prodigal and prodigal-gv.
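One way to run this kind of comparison is sketched below. It assumes each gene call is a `(contig, begin, end, strand)` tuple with 1-based `begin <= end` coordinates, as in GFF. Because a differing start codon shows up in the left coordinate on the forward strand and in the right coordinate on the reverse strand, calls are matched on their stop codon position. The tuple layout and the function names `stop_key` and `compare_calls` are hypothetical; the tuples could be parsed from the GFF files or built from pyrodigal/pyrodigal-gv results.

```python
from collections import Counter


def stop_key(call):
    """Key a (contig, begin, end, strand) call by its stop codon position.

    With begin <= end, the stop is at `end` on the forward strand and at
    `begin` on the reverse strand.
    """
    contig, begin, end, strand = call
    return (contig, strand, end if strand == "+" else begin)


def compare_calls(calls_a, calls_b):
    """Classify every call from program A relative to the calls from program B."""
    exact_b = set(calls_b)
    stops_b = {stop_key(c) for c in calls_b}
    counts = Counter()
    for call in calls_a:
        if call in exact_b:
            counts["identical"] += 1
        elif stop_key(call) in stops_b:
            counts["same_stop_different_start"] += 1
        else:
            counts["unmatched"] += 1
    return counts
```

Dividing the `same_stop_different_start` count by the total number of calls gives the fraction of genes that differ only in their start codon.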
Here is a random example (program 'og' is original prodigal, 'gv' is prodigal-gv):
And here is the detailed output of each program for this region:
The selected hits are in bold. I can see that the two programs compute different scores, especially for the Shine-Dalgarno sequence and the ribosome binding site. But I am not sure why the selected gene call is not the one with the highest score.
Do you have more information about the change in the scoring system between prodigal and prodigal-gv? And why would the shorter gene call be the best one in this case?
Thanks for your response!