althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

GFF seqid may violate gff3 spec #18

Closed zdk123 closed 1 year ago

zdk123 commented 1 year ago

Thanks for this tool! One comment and a request:

It looks like the GFF output format here is slightly different from Prodigal's, and (maybe) from the GFF3 spec. This is preventing us from using pyrodigal as a drop-in replacement.

Prodigal uses the input contig name for the seqid, while pyrodigal uses prefix/gene_{i}, which seems to violate this spec:

Column 1: "seqid" The ID of the landmark used to establish the coordinate system for the current feature...

The landmark for the gene coordinates obviously cannot be the gene id itself.

What would be great is if the prefix instead modified the ID component of the _gene_data. That would be an enhancement over the GFF output provided by Prodigal, without violating the spec.
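To make the requested layout concrete, here is a minimal sketch of a single GFF3 feature line with the contig name in column 1 and the prefix applied only to the ID attribute. The helper `format_gff3_gene`, its parameters, and the placeholder source value are hypothetical and written for illustration; this is not pyrodigal's actual writer code.

```python
def format_gff3_gene(seqid, gene_index, start, end, strand, prefix="gene_"):
    """Build one GFF3 feature line (hypothetical helper, not pyrodigal's API)."""
    # The prefix only affects the gene identifier in the attributes column.
    gene_id = f"{prefix}{gene_index}"
    columns = [
        seqid,                        # column 1: landmark, i.e. the input contig name
        "example",                    # column 2: source (placeholder value)
        "CDS",                        # column 3: feature type
        str(start),                   # column 4: start (1-based)
        str(end),                     # column 5: end (inclusive)
        ".",                          # column 6: score (not set here)
        "+" if strand >= 0 else "-",  # column 7: strand
        "0",                          # column 8: phase
        f"ID={gene_id}",              # column 9: attributes
    ]
    return "\t".join(columns)


line = format_gff3_gene("contig_1", 1, 3, 98, 1, prefix="sample_")
print(line)
```

With this layout, tools that look up features by contig name still work, and the prefix stays available to make gene IDs unique across samples.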

althonos commented 1 year ago

Oh yes, this is a mistake on my end, sorry :zipper_mouth_face: I'll update this part for v1.2.

althonos commented 1 year ago

I'm pushing a pre-release that you can test on your side (v2.0.0-rc.1). This will need a bump in major version since I ended up changing the signature of all write methods. I'll try to fix #19 before releasing the actual v2.

zdk123 commented 1 year ago

Wow, that was fast, thank you! I'll test this out soon.

althonos commented 1 year ago

Marking this as fixed since v2.0.0 will now use the sequence identifier in the first column and create a gene identifier for the ID attribute.
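For anyone testing the pre-release as a drop-in replacement, a quick check along these lines may help confirm that column 1 only contains input contig names. It is a rough sketch that works on GFF text in memory; the function name `seqids_outside_contigs` and the sample lines are made up for illustration and do not come from pyrodigal.

```python
def seqids_outside_contigs(gff_text, contig_names):
    """Return the column-1 values of feature lines that are not known contig names."""
    known = set(contig_names)
    unknown = []
    for line in gff_text.splitlines():
        # Skip blank lines and directive/comment lines such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        seqid = line.split("\t", 1)[0]
        if seqid not in known and seqid not in unknown:
            unknown.append(seqid)
    return unknown


# Made-up sample lines: the first follows the fixed layout, the second the old one.
good = "##gff-version 3\ncontig_1\texample\tCDS\t3\t98\t.\t+\t0\tID=sample_1\n"
bad = "##gff-version 3\nsample_1\texample\tCDS\t3\t98\t.\t+\t0\tID=sample_1\n"

print(seqids_outside_contigs(good, ["contig_1"]))
print(seqids_outside_contigs(bad, ["contig_1"]))
```

An empty list means every feature line is anchored to one of the input contigs; anything else points to lines still using a gene identifier as the seqid.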