chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

Convert GFF file to Sequin TBL file #101

Closed sjackman closed 5 years ago

sjackman commented 8 years ago

Submitting to GenBank requires converting a GFF file to a Sequin TBL file, which is then converted to ASN.1 using tbl2asn. I have searched, and I have not found a good (or any, really) converter from GFF to Sequin TBL. Would you be interested in adding such a tool? Here's the hacky script that I cobbled together for this purpose: gff3-to-tbl. It's not general purpose, but could be a useful starting point.
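For readers unfamiliar with the target format, here is a minimal sketch of the kind of conversion being asked for. It is not the gff3-to-tbl script mentioned above. The function names `parse_attributes` and `gff3_to_tbl` are hypothetical, and the sketch assumes a deliberately narrow case: only `gene` features are emitted, records for the same sequence are adjacent in the input, and the gene name comes from the GFF3 `Name` attribute. Multi-interval CDS features, partial features, and exceptions such as trans-splicing are not handled.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def gff3_to_tbl(gff_lines):
    """Convert the gene features in an iterable of GFF3 lines to Sequin TBL text.

    Illustrative only: every feature type other than 'gene' is skipped.
    """
    out = []
    current_seqid = None
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        if ftype != "gene":
            continue
        if seqid != current_seqid:
            # Each sequence's feature table starts with a '>Feature' header line.
            out.append(">Feature " + seqid)
            current_seqid = seqid
        if strand == "-":
            # TBL encodes the minus strand by listing the larger coordinate first.
            start, end = end, start
        out.append(f"{start}\t{end}\tgene")
        name = parse_attributes(attrs).get("Name")
        if name:
            # Qualifier lines are indented with three tabs.
            out.append(f"\t\t\tgene\t{name}")
    return "\n".join(out) + "\n"
```

Calling `gff3_to_tbl(open("annotation.gff3"))` on a file of your own (the file name is a placeholder) returns the table as a string, which can be written to a `.tbl` file alongside the FASTA for tbl2asn.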

chapmanb commented 8 years ago

Shaun; Definitely, happy to pull this into the gff scripts directory if you want to submit a pull request. Thanks for making it available.

For a suggestion that requires more work, have you thought about migrating it to gffutils (https://github.com/daler/gffutils) and pushing it upstream there? I've generally been pointing people at that rather than this implementation. Ryan did an awesome job with it and it's more scalable than my implementation.

sjackman commented 8 years ago

@daler Would you like a GFF to Sequin TBL converter in gffutils? See above.

sjackman commented 8 years ago

This task isn't a regular task of mine, so the maintenance burden would largely fall on the shoulders of whoever adopted it. I'd love for this script to find a permanent home though. It's really odd that a good GFF to Sequin TBL converter doesn't already exist, but I wasn't able to find one.

daler commented 8 years ago

Hi Shaun -

It looks like it can be readily ported to gffutils. I've been meaning to have a collection of conversion scripts (like creating refFlat files from GTF or GFF files), but don't have anything started yet. A gffutils.conversion module would be a good home for this GFF to TBL script.

I'm unfamiliar with the TBL format though. If you could submit a pull request to gffutils with some example input/output, some basic functional tests, and the existing script, I can do the porting to gffutils.

sjackman commented 8 years ago

Hi, Ryan. Here are two genomes that I annotated in GFF and converted to TBL format using this script. They have some fun trans-splicing and RNA editing exceptions.

  1. GFF and TBL
  2. GFF and TBL

sjackman commented 8 years ago

Here's a description of the Sequin TBL format:

  1. http://www.ncbi.nlm.nih.gov/projects/Sequin/table.html
  2. http://www.ncbi.nlm.nih.gov/genbank/genomesubmit_annotation

sjackman commented 5 years ago

NCBI now accepts submissions to GenBank in GFF3 format, so the original use case for this request no longer exists. See https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/

chapmanb commented 5 years ago

Shaun; Thanks for following up. I've been pointing everyone at gffutils for GFF work since I haven't worked on this in a while, but glad this just magically went away 3 years later.

sjackman commented 5 years ago

Motivational poster:

Procrastination

Why do today that which will be unnecessary tomorrow?

SLAment commented 5 years ago

I should add that your script, @sjackman, is still useful if you have a small GFF for small contigs that you want to submit through BankIt. In that case, as far as I can tell, you still need the TBL file.

sjackman commented 5 years ago

My workflow still converts my GFF files to TBL using this script and then runs the normal tbl2asn rather than the GFF version.