chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

GFF.write fails when using a single SeqRecord. #51

Closed mercutio22 closed 12 years ago

mercutio22 commented 12 years ago

In [6]: seqTP53

Out[6]: SeqRecord(seq=Seq('TGGTTCAAGTAATTCTCCTGCCTCAGACTCCAGAGTAGCTGGGATTACAGGCGC...CCC', IUPACAmbiguousDNA()), id='NG_017013.1', name='NG_017013', description='Homo sapiens tumor protein p53 (TP53), RefSeqGene on chromosome 17.', dbxrefs=[])

with open('tp53.gff', 'w') as file:
    GFF.write(seqTP53, file)

ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (8, 0))

AttributeError                            Traceback (most recent call last)
/home/merc/gitcode/mirna-django/src/scripts/ in ()
      1 with open('tp53.gff', 'w') as file:
----> 2     GFF.write(seqTP53, file)
      3

/usr/local/lib/python2.7/dist-packages/bcbio-0.1-py2.7.egg/BCBio/GFF/GFFOutput.pyc in write(recs, out_handle, include_fasta)
    183     """
    184     writer = GFF3Writer()
--> 185     return writer.write(recs, out_handle, include_fasta)

/usr/local/lib/python2.7/dist-packages/bcbio-0.1-py2.7.egg/BCBio/GFF/GFFOutput.pyc in write(self, recs, out_handle, include_fasta)
     74         fasta_recs = []
     75         for rec in recs:
---> 76             self._write_rec(rec, out_handle)
     77             self._write_annotations(rec.annotations, rec.id, out_handle)
     78             for sf in rec.features:

/usr/local/lib/python2.7/dist-packages/bcbio-0.1-py2.7.egg/BCBio/GFF/GFFOutput.pyc in _write_rec(self, rec, out_handle)
     99     def _write_rec(self, rec, out_handle):
    100         # if we have a SeqRecord, write out optional directive
--> 101         if len(rec.seq) > 0:
    102             out_handle.write("##sequence-region %s 1 %s\n" % (rec.id, len(rec.seq)))
    103

AttributeError: 'str' object has no attribute 'seq'

mercutio22 commented 12 years ago

I just realized GFF.write expects an iterable of records, such as a <generator object parse at 0x2c70d70>. Would you please make it also accept a single SeqRecord object?

That would be useful when fetching and parsing with SeqIO.read, which returns a single record. But maybe I shouldn't be doing that in the first place.
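
The AttributeError in the report is consistent with a SeqRecord being iterable over the letters of its sequence, so `for rec in recs` hands `_write_rec` a one-character string instead of a record. Below is a minimal sketch of that behaviour and of the caller-side workaround (wrapping the record in a list). `FakeRecord` and `region_lines` are hypothetical stand-ins written for illustration, not Biopython or BCBio code.

```python
class FakeRecord(object):
    """Hypothetical stand-in for a SeqRecord: iterating yields sequence letters."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        return iter(self.seq)


def region_lines(recs):
    """Mimic the writer loop: build one ##sequence-region line per record."""
    lines = []
    for rec in recs:
        lines.append("##sequence-region %s 1 %s" % (rec.id, len(rec.seq)))
    return lines


record = FakeRecord("NG_017013.1", "TGGTTCAAGT")

failure = None
try:
    # Passing the bare record iterates over its letters, so rec is a str.
    region_lines(record)
except AttributeError as err:
    failure = str(err)

# Workaround: pass a one-element list so the loop sees the record itself.
fixed = region_lines([record])
```

With the real library, the equivalent call would presumably be `GFF.write([seqTP53], file)`.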

DarwinAwardWinner commented 12 years ago

class GFF3Writer:

    ...

    def write(self, recs, out_handle, include_fasta=False):
        """Write the provided records to the given handle in GFF3 format.
        """
        id_handler = _IdHandler()
        self._write_header(out_handle)
        fasta_recs = []
        # New code starts here
        try:
            recs = iter(recs)
        except TypeError:
            # A non-iterable is a single record, so put it in a list
            recs = [ recs ]
        # New code ends here
        for rec in recs:
            self._write_rec(rec, out_handle)
            self._write_annotations(rec.annotations, rec.id, out_handle)
            for sf in rec.features:
                sf = self._clean_feature(sf)
                id_handler = self._write_feature(sf, rec.id, out_handle,
                        id_handler)
            if include_fasta and len(rec.seq) > 0:
                fasta_recs.append(rec)
        if len(fasta_recs) > 0:
            self._write_fasta(fasta_recs, out_handle)
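
One caveat, offered tentatively: if SeqRecord defines `__iter__` (as the traceback in the original report suggests, since the loop produced a 'str'), then `iter(recs)` would succeed on a single record and the `except TypeError` branch would not run. Below is a sketch of an alternative check, assuming a lone record can be recognised by having both `id` and `seq` attributes. `FakeRecord` and `as_record_iterable` are hypothetical names, not part of BCBio.

```python
class FakeRecord(object):
    """Hypothetical stand-in for a SeqRecord; iterating yields sequence letters."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        return iter(self.seq)


def as_record_iterable(recs):
    """Wrap a lone record in a list; pass any other iterable through unchanged.

    Assumption: an object with both `id` and `seq` attributes is a single record.
    """
    if hasattr(recs, "id") and hasattr(recs, "seq"):
        return [recs]
    return recs


single = FakeRecord("NG_017013.1", "TGGTTCAAGT")

# iter() succeeds on the stand-in, so a try/except TypeError would not wrap it.
letters = list(iter(single))

wrapped = as_record_iterable(single)        # lone record becomes a one-element list
passthrough = as_record_iterable([single])  # a list is returned unchanged
from_generator = list(as_record_iterable(r for r in [single]))  # generators pass through
```

In the writer, this would amount to calling `recs = as_record_iterable(recs)` in place of the try/except block. Checking attributes rather than iterability is a design choice that keeps generators and lists working while still catching a record that happens to be iterable.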

chapmanb commented 12 years ago

Hugo and Ryan, thanks for reporting the problem and for the fix. I checked this in:

https://github.com/chapmanb/bcbb/commit/5352c68d74f2379981e53865678af372ac5aa777

so if you pull from git it should be working as expected. Thanks again.

mercutio22 commented 12 years ago

Thanks a lot. Really appreciate it.