chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

CDS Phase not calculated by GFFOutput #87

Closed: bobbyo closed this issue 9 years ago

bobbyo commented 10 years ago

The GFF3 specification (v1.21) requires the phase to be set on lines of type CDS. However, GFFOutput does not seem to calculate phase; instead, every phase is reported as 0.

I wrote routines that take a list of SeqFeatures and calculate and set their phases (e.g. sort the list 5' to 3', default the first CDS to phase=0 if no phase is set, and then set the phases of all the CDSes in the list). CDS SeqFeatures processed this way show the correct phase in GFF3 files written by GFFOutput.
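The steps described above can be sketched roughly as follows. This is not the routine from this issue; it is a minimal illustration assuming CDS segments of a single transcript, all on one strand, with 1-based inclusive GFF coordinates. `CdsSegment` and `set_cds_phases` are hypothetical names standing in for real Biopython SeqFeatures. Each later segment's phase is the number of bases needed to finish the codon left open by the previous segment: `(3 - (prev_length - prev_phase) % 3) % 3`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CdsSegment:
    """Hypothetical stand-in for a CDS SeqFeature (1-based, inclusive GFF coordinates)."""
    start: int
    end: int
    strand: int  # +1 or -1; all segments are assumed to share one strand
    phase: Optional[int] = None

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def set_cds_phases(segments: List[CdsSegment]) -> List[CdsSegment]:
    """Sort CDS segments 5' to 3' and fill in their GFF3 phases (mutates the segments).

    The first segment keeps an existing phase or defaults to 0; every later
    segment gets the number of bases to skip before its first complete codon.
    """
    if not segments:
        return []
    # On the minus strand the 5' end of the transcript has the highest coordinates.
    ordered = sorted(segments, key=lambda s: s.start, reverse=(segments[0].strand == -1))
    if ordered[0].phase is None:
        ordered[0].phase = 0
    for prev, cur in zip(ordered, ordered[1:]):
        # Bases of an incomplete codon left dangling at the end of the previous segment.
        leftover = (prev.length - prev.phase) % 3
        # The current segment must first supply the bases that complete that codon.
        cur.phase = (3 - leftover) % 3
    return ordered
```

For example, plus-strand segments at 1-10, 21-31 and 41-50 (lengths 10, 11, 10) come out with phases 0, 2 and 0. The same segments on the minus strand, processed from 41-50 down to 1-10, also get 0, 2 and 0. Adapting this to real SeqFeatures would mean converting Biopython's 0-based locations and storing the result wherever GFFOutput reads phase from, which this sketch does not assume.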

Would this code be of interest for incorporation into GFFOutput.py?

chapmanb commented 10 years ago

Bobby; Thanks for this. I'd be happy to include utilities to set phases when they are not present. The current output is meant as a converter, so it doesn't have much calculation built in, but that reflects only what has been needed so far rather than a design principle.

Also, it's worth looking at Ryan Dale's gffutils if you haven't already:

https://github.com/daler/gffutils

It might already do some of this and save you time.

Thanks again for the interest and offer to help.