chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

write Dbxref instead of db_xref when converting genbank to GFF3 #102

Open timflutre opened 8 years ago

timflutre commented 8 years ago

I am trying to convert a GenBank file to GFF3 following the latest version of the official specification. Here is an example of a GenBank file I need to convert: ftp://ftp.ncbi.nlm.nih.gov/genomes/Vitis_vinifera/ARCHIVE/BUILD.1.1/CHR_01/vvi_ref_chr1.gbs.gz

The script genbank_to_gff.py works but writes db_xref instead of Dbxref, and likewise note instead of Note. I also have other issues, e.g. exons being encoded as "feature mRNA", etc.

I can see that you often advise people to look at gffutils, but it doesn't handle the GenBank format. So should I start looking at your BCBio code? Is there any chance of including it in Biopython at some point?

chapmanb commented 8 years ago

Timothée; Thanks for the report and sorry to be slow in getting back to you. This script only handles format conversion: it takes what is in the GenBank file and writes it out in GFF format. It doesn't try to rename attributes so that they match between the two formats. The code only uses the internal Biopython representation as a shared object for the conversion and has no special cases. I'd be happy to accept a pull request to do that, or it's something you could ask about supporting at gffutils if you'd like a more forward-looking approach. gffutils has some support for Biopython interoperability now, although it would also need work to handle these specific naming conversion cases.

Sorry to not have a ready solution for you but hope this helps.
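For illustration, the attribute renaming discussed above could be done as a pre-processing step before the records are written out. This is only a minimal sketch, not part of genbank_to_gff.py: the mapping table and the `rename_qualifiers` helper are hypothetical names, and the sketch assumes qualifiers are held in a plain dict of lists, the way Biopython stores `SeqFeature.qualifiers`.

```python
# Hypothetical mapping from GenBank qualifier names to the reserved
# GFF3 attribute names (GFF3 reserves capitalized tags such as Dbxref and Note).
GENBANK_TO_GFF3 = {
    "db_xref": "Dbxref",
    "note": "Note",
}


def rename_qualifiers(qualifiers, mapping=GENBANK_TO_GFF3):
    """Return a new dict with GenBank qualifier keys renamed for GFF3.

    Keys that are not in the mapping are copied unchanged. If two source
    keys map to the same target, their value lists are concatenated.
    """
    renamed = {}
    for key, values in qualifiers.items():
        # Tolerate a bare string value by wrapping it in a list.
        if isinstance(values, str):
            values = [values]
        new_key = mapping.get(key, key)
        renamed.setdefault(new_key, []).extend(values)
    return renamed


if __name__ == "__main__":
    # Made-up qualifiers, for demonstration only.
    example = {
        "db_xref": ["GeneID:0000001"],
        "note": ["illustrative note"],
        "gene": ["abc1"],
    }
    print(rename_qualifiers(example))
```

With Biopython, one option would be to apply such a helper to `feature.qualifiers` for every feature of each parsed record before passing the records to the GFF writer. It only addresses the attribute names, not the exon/mRNA feature typing issue mentioned in the report.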

timflutre commented 8 years ago

Ok, thanks for getting back to me.

lcscs12345 commented 8 years ago

Hi Timothée, have you tried annotwriter?

timflutre commented 8 years ago

Thanks @lcscs12345 , I'll have a look!