genomeannotation / GAG

Generates an NCBI .tbl file of annotations on a genome.
MIT License

Printing gene names in .tbl files when missing protein annotation #127

Closed bruab closed 7 years ago

bruab commented 10 years ago

As mentioned here

https://groups.google.com/forum/#!msg/gag_support/pkgqvtA6OQY/i4bFISi0dfoJ

If the input GFF has a "Name=" field, the gene automatically gets a '\t\t\tgene\t\n' line in the .tbl. This is just what we want if the gene name was provided as part of a genuine annotation. But if the names are just there (as they seem to be in a number of GFFs out in the wild) and merely duplicate the content of the "ID=" field, tbl2asn throws an ERROR.

So, (a) state explicitly in the docs that you should only include a "Name=" if you're annotating. Ok, will do.

But while we're at it, should we (b) check genes at write time and see if they contain a protein-annotated feature, leaving out the gene name if we don't find one?
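A rough sketch of what option (b) could look like. The `Gene` and `Mrna` classes, their fields, and `gene_to_tbl` are hypothetical stand-ins for illustration, not GAG's actual classes; the assumption is that a "protein-annotated feature" means a child mRNA with a product set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mrna:
    # Hypothetical stand-in for a transcript; product is None when unannotated.
    identifier: str
    product: Optional[str] = None


@dataclass
class Gene:
    # Hypothetical stand-in for a gene parsed from the GFF.
    identifier: str
    start: int
    end: int
    name: Optional[str] = None
    mrnas: List[Mrna] = field(default_factory=list)


def has_product(gene: Gene) -> bool:
    """Return True if any child mRNA carries a protein product annotation."""
    return any(m.product for m in gene.mrnas)


def gene_to_tbl(gene: Gene) -> str:
    """Build the gene's .tbl lines, keeping the name only when a product exists."""
    lines = [f"{gene.start}\t{gene.end}\tgene"]
    # Check at write time: skip the gene name if nothing under the gene is annotated.
    if gene.name and has_product(gene):
        lines.append(f"\t\t\tgene\t{gene.name}")
    lines.append(f"\t\t\tlocus_tag\t{gene.identifier}")
    return "\n".join(lines) + "\n"
```

With this, a gene whose "Name=" just repeats its "ID=" and has no product would be written without the gene name line, while an annotated gene keeps it.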

tedsta commented 10 years ago

Yeah, and we could also make it a fix maybe?

fix gene_names_with_no_product
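Something like the following, as a sketch only. `handle_fix_command`, `apply_fixes`, and the dict-based gene records are made up for illustration and are not how GAG's console or data model actually work; the point is just that the name stripping stays opt-in.

```python
from typing import Dict, List, Set

# Hypothetical list of fix names the console would accept.
KNOWN_FIXES = {"gene_names_with_no_product"}


def handle_fix_command(line: str, enabled: Set[str]) -> None:
    """Parse a 'fix <name>' command and record the fix as enabled."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "fix":
        raise ValueError(f"expected 'fix <name>', got: {line!r}")
    if parts[1] not in KNOWN_FIXES:
        raise ValueError(f"unknown fix: {parts[1]}")
    enabled.add(parts[1])


def apply_fixes(genes: List[Dict[str, str]], enabled: Set[str]) -> None:
    """Drop the name from genes with no product, but only if the fix is enabled."""
    if "gene_names_with_no_product" not in enabled:
        return
    for gene in genes:
        if not gene.get("product"):
            gene.pop("name", None)
```

Until the user types the fix, names are written exactly as they appear in the GFF.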

bruab commented 10 years ago

This I like. Way to stick with the protocol :100: