chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

Start position in GFF #55

Closed srikarchamala closed 12 years ago

srikarchamala commented 12 years ago

Hi Brad,

Why is the start location (*.location) after parsing a GFF file one less than in the original?

If the GFF file has the coordinates below,

five_prime_UTR 3860074 3861033

The parser outputs 3860073 and 3861033 as the start and end coordinates. I read through the Sequence Ontology (SO) and other documentation, and found that the GFF start position uses a 1-based coordinate system, which I suppose is inclusive.

Pardon me if this is a silly question.

Thanks, Srikar.

chapmanb commented 12 years ago

Srikar; In the internal Python data structures, coordinates are converted to 0-based to maintain consistency with Biopython (and Python itself). When written back out, they are converted to 1-based again so the output is correct GFF.

Hope this helps. Let me know if you run into any other problems.
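A minimal sketch of the conversion described above, assuming GFF coordinates are 1-based and inclusive while the in-memory coordinates are 0-based and half-open (Python slice style). The helper names `gff_to_python` and `python_to_gff` are hypothetical and are not part of the parser's API.

```python
def gff_to_python(start, end):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open ones.

    Only the start shifts; the end keeps the same number because the
    half-open end index is one past the last included base.
    """
    return start - 1, end


def python_to_gff(start, end):
    """Convert 0-based half-open coordinates back to 1-based inclusive GFF."""
    return start + 1, end


# The five_prime_UTR from the question
start, end = gff_to_python(3860074, 3861033)
print(start, end)  # 3860073 3861033

# The feature length is the same in both systems
assert end - start == 3861033 - 3860074 + 1

# Round trip back to the values in the GFF file
print(python_to_gff(start, end))  # (3860074, 3861033)
```

Under these assumptions, the feature length is simply `end - start` in the 0-based form, which is one reason the half-open convention is convenient in Python.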

srikarchamala commented 12 years ago

Hi Brad,

Thanks for the clarification. I am writing a script to convert GFF3 format to GTF. What I did was add 1 to the start position before converting to GTF format.

One more question: can I extract the FASTA sequences of only the CDS regions, or of other features such as exons, 5' UTRs and so forth? If so, which method would help with this?

Thanks, Srikar.


chapmanb commented 12 years ago

Srikar; The Biopython SeqIO interface provides a nice way to parse FASTA files:

http://biopython.org/wiki/SeqIO

You can then use standard Python slicing to subset the parsed sequences to exons, CDSs or other features of interest. Hope this helps.
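As a rough illustration of the slicing approach, the sketch below cuts a feature out of a sequence from its 1-based inclusive GFF coordinates. The function `slice_feature` and the toy sequence are hypothetical. A plain string stands in for the chromosome here so the example runs without Biopython; Biopython `Seq` and `SeqRecord` objects support the same slice syntax.

```python
def slice_feature(sequence, gff_start, gff_end):
    """Return the subsequence covered by a feature.

    gff_start and gff_end are 1-based inclusive, as written in a GFF file,
    so the start is shifted by one to form a 0-based half-open Python slice.
    """
    return sequence[gff_start - 1:gff_end]


# Toy chromosome; real input would come from a FASTA file
chrom = "AACCGGTTAACC"

# A feature spanning bases 3..6 in GFF terms
exon = slice_feature(chrom, 3, 6)
print(exon)  # CCGG
```

If the coordinates come from the parsed feature locations rather than from the raw GFF text, they are already 0-based, so `sequence[start:end]` applies directly with no adjustment. With Biopython installed, `SeqIO.to_dict(SeqIO.parse("genome.fa", "fasta"))` (the file name is a placeholder) gives a dictionary of records keyed by sequence ID that can be sliced the same way. Features on the reverse strand would additionally need reverse complementing, which this sketch does not handle.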

srikarchamala commented 12 years ago

Thanks a lot!

Srikar.
