genomeannotation / GAG

Generates an NCBI .tbl file of annotations on a genome.
MIT License

Incorporate fixes into dev branch #168

Closed (kylecannoles closed 7 years ago)

kylecannoles commented 7 years ago

Fixes for #127, #164, and #166 are included.

Issue #127: The issue appears to be that the ID= and Name= fields for a gene can't be the same; otherwise tbl2asn reports a SEQ_FEAT.LocusTagProblem error. The gene name in an annotation should differ from the ID/locus_tag rather than simply replicate it: the gene name is generally an assigned, human-readable name, while the locus_tag is a unique identifier for the gene. ID and Name in a gff file become locus_tag and gene, respectively, in the resulting .tbl file. To fix this issue, we can delete the 'name' key from the results dictionary in gff_reader.py if the values for ID and Name are the same. Fixed in 3b3a2e7 and 69b73e9.
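For readers who want to see the idea without opening the commits, here is a minimal sketch of the check. It is not the code from gff_reader.py: the function name `parse_gene_attributes` and the `identifier` key are assumptions made for illustration; only the 'name' key and the ID/Name comparison come from the description above.

```python
def parse_gene_attributes(attr_column):
    """Parse a GFF3 attributes column into a dict.

    Hypothetical helper for illustration; not GAG's actual parser.
    """
    result = {}
    for pair in attr_column.strip().strip(";").split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        key = key.strip()
        if key == "ID":
            result["identifier"] = value.strip()  # becomes locus_tag in the .tbl
        elif key == "Name":
            result["name"] = value.strip()  # becomes gene in the .tbl
    # A Name that merely repeats the ID would make tbl2asn complain,
    # so drop it and let only the locus_tag be written.
    if result.get("name") == result.get("identifier"):
        result.pop("name", None)
    return result
```

With this sketch, `ID=GENE_0001;Name=GENE_0001` yields only an identifier, while `ID=GENE_0001;Name=abcA` keeps both values.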

Issue #164: Dbxref appears to be a standard attribute in gff3/.gff files that references an external database. This attribute should be written as db_xref in the resulting .tbl file. My first attempt to fix this was in e8755c1 and 298055f. I discovered that changing the dictionary key from Dbxref to db_xref also converts Dbxref to db_xref in the resulting .gff file, which is incorrect, so this was reverted in e8bd510 and 1303ca5. In commit 4791914, I then removed a test I had created as part of commit 3b3a2e7. Finally, I added the proper test and fix in 0926cfa and 73b820e respectively.
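The pitfall above (renaming the key changes both outputs) can be avoided by translating the name only at .tbl write time. The following is a rough sketch of that approach, not the code in 73b820e; `TBL_QUALIFIER_NAMES`, `to_tbl_lines`, and `to_gff_attributes` are hypothetical names, and annotations are assumed to be simple (key, value) pairs.

```python
# Map GFF attribute names to their .tbl qualifier names (assumed mapping table).
TBL_QUALIFIER_NAMES = {"Dbxref": "db_xref"}


def to_tbl_lines(annotations):
    """Render (key, value) annotations as .tbl qualifier lines."""
    lines = []
    for key, value in annotations:
        # Translate only here, so the stored key stays "Dbxref".
        tbl_key = TBL_QUALIFIER_NAMES.get(key, key)
        # Qualifier lines in a feature table start with three tabs.
        lines.append("\t\t\t" + tbl_key + "\t" + value)
    return lines


def to_gff_attributes(annotations):
    """Render the same annotations for a .gff file, keys unchanged."""
    return ";".join(key + "=" + value for key, value in annotations)
```

Because the internal key is never modified, the .gff writer keeps emitting Dbxref while the .tbl writer emits db_xref.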

Issue #166: Upon review, protein translation for the - strand was fixed with d80f3fa and reverted with 203ce26 and 98da78e. However, the start and stop codons were not being found properly, so I applied a quick fix that uses the return value from the opposite function (get_start_indices <-> get_stop_indices) in cds.py. This was done by copying and pasting the + strand branch of the if statement into the - strand branch in both functions. More information is in the commit message for c284dc8.

After fixing these issues, I added the ability to run the gag.py script with either the -v or --version flag, and added a version variable to gag.py so that users can view the GAG version number. We can also update this number when we tag releases.
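As an illustration of how such a flag is commonly wired up, here is a small sketch using argparse's built-in `version` action. Whether gag.py uses argparse is an assumption; the `__version__` variable name and its value are placeholders, not the actual GAG version.

```python
import argparse

__version__ = "0.0.0"  # placeholder value; bump when tagging a release


def build_parser():
    """Build a parser that understands -v / --version (illustrative only)."""
    parser = argparse.ArgumentParser(prog="gag.py")
    # The "version" action prints the string and exits with status 0.
    parser.add_argument(
        "-v", "--version",
        action="version",
        version="%(prog)s " + __version__,
    )
    return parser


if __name__ == "__main__":
    build_parser().parse_args()
```

Keeping the number in a single module-level variable means a release only needs one edit.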

nextgenusfs commented 7 years ago

This looks great @kylecannoles, hope these fixes can get approved and released shortly!