althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0
138 stars 5 forks source link

Inconsistent gene calls between Pyrodigal-2.0.2 and Prodigal-2.6.3+31b300a #26

Closed aaronmussig closed 1 year ago

aaronmussig commented 1 year ago

Hello,

I just wanted to say I appreciate the port of Prodigal, it's great work!

There is a case where running GCA_900004415.1 gives different results when running Pyrodigal-2.0.2 compared to Prodigal-2.6.3 and Prodigal-2.6.3+31b300a.

Notably, there are quite a few differences where the gc_cont value differs slightly, there are also a few cases where genes differ (content and quantity).

Below are the commands that I ran (output attached):

# Set the environment

mkdir -p /tmp/prodigal
ls /tmp/prodigal
# GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna

# ---------------------------------------------------

# Create the prodigal-2.6.3 environment
mamba create -n prodigal-2.6.3
mamba activate prodigal-2.6.3
mamba install -c bioconda prodigal==2.6.3

# Run Prodigal
mkdir -p /tmp/prodigal/prodigal-2.6.3
cd /tmp/prodigal/prodigal-2.6.3
prodigal -i /tmp/prodigal/GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna -m -g 11 -o features.gff -a amino.faa -d nuc.fna

# Get checksums
md5sum *
# f1abac1dbe74388093e56550625d943a  amino.faa
# 44be68f0f21f9db116b853a162800b6c  features.gff
# 587802c6eae01eabd23d27d9c6f72f59  nuc.fna

# ---------------------------------------------------

# Create the prodigal-2.6.3 environment (with patch)
mamba create -n prodigal-2.6.3+31b300a
mamba activate prodigal-2.6.3+31b300a

# Fetch the Git repository
mkdir -p ~/mambaforge/envs/prodigal-2.6.3+31b300a/share/prodigal-2.6.3+31b300a
cd ~/mambaforge/envs/prodigal-2.6.3+31b300a/share/prodigal-2.6.3+31b300a
git clone https://github.com/hyattpd/Prodigal.git
cd Prodigal
git rev-parse HEAD 
# 31b300a99a39964893057128ea10338e9a26bd6c

# Make the install
mkdir -p ~/mambaforge/envs/prodigal-2.6.3+31b300a/bin
make install INSTALLDIR=~/mambaforge/envs/prodigal-2.6.3+31b300a/bin

# Run Prodigal
mkdir -p /tmp/prodigal/prodigal-2.6.3+31b300a
cd /tmp/prodigal/prodigal-2.6.3+31b300a
prodigal -i /tmp/prodigal/GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna -m -g 11 -o features.gff -a amino.faa -d nuc.fna

# Get checksums
md5sum *
# f1abac1dbe74388093e56550625d943a  amino.faa
# 44be68f0f21f9db116b853a162800b6c  features.gff
# 587802c6eae01eabd23d27d9c6f72f59  nuc.fna

# ---------------------------------------------------

# Create the pyrodigal environment
mamba create -n pyrodigal-2.0.2
mamba activate pyrodigal-2.0.2
mamba install -c bioconda pyrodigal==2.0.2

# Run Prodigal
mkdir -p /tmp/prodigal/pyrodigal
cd /tmp/prodigal/pyrodigal
pyrodigal -i /tmp/prodigal/GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna -m -g 11 -o features.gff -a amino.faa -d nuc.fna

# Get checksums
md5sum *
# 6fd912cdf58f2442c11ddfb810e20c4e  amino.faa
# b3b1694dd07dd9b94cede4d446cafcfb  features.gff
# bfc670a857f6fe5a138dd356f5b4e424  nuc.fna
althonos commented 1 year ago

Hi @aaronmussig , thanks for using Pyrodigal!

I'll have a look at it. I think the bug is coming from region masking (-m), since I'm not seeing a difference in predicted genes when testing with region masking disabled. Note that comparing the MD5 sums is not the proper way to check result consistency because the headers may be different, because of rounding issues or this kind of thing. You can use the comparison repository to make sure the gene coordinates match exactly between the two.

althonos commented 1 year ago

Okay, I think found the discrepancy between Pyrodigal and Prodigal: Prodigal only masks regions of 50 N or more, whereas Prodigal masks any N when masking is enabled. I'll make a patch and see if that addresses the problem.

althonos commented 1 year ago
$ python compare.py --prodigal prodigal --genome GCA_900004415.fna -c -m
Genomes closed=True masked=True
Hits genome=GCA_900004415: prodigal=6693, pyrodigal=6693, equal=True

Looks like that was it, I'll make a new release.

althonos commented 1 year ago

Fixed in v2.0.3.

aaronmussig commented 1 year ago

You're quick! Thanks for looking into this

althonos commented 1 year ago

I'd rather not leave this kind of bugs around before going on holidays, ahah :wink: