BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

cannot parse gb files using pydna.parsers.parse #82

Closed AubinF closed 2 years ago

AubinF commented 2 years ago

Hi, great package, thanks! :) I use it quite a lot, but recently pydna.parsers.parse has been crashing when loading gb files:

from pydna.parsers import parse
parse("/tmp/dna2949.gb",ds=True)

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_62410/3705275047.py in <module>
----> 1 parse("/tmp/dna2949.gb",ds=True)

~/Software/anaconda3/envs/dna_design/lib/python3.9/site-packages/pydna/parsers.py in parse(data, ds)
    141 path = item
    142 finally:
--> 143 sequences.extend(embl_gb_fasta(raw, ds, path))
    144 return sequences
    145

~/Software/anaconda3/envs/dna_design/lib/python3.9/site-packages/pydna/parsers.py in embl_gb_fasta(raw, ds, path)
    105 if ds and path:
    106 result_list.append(
--> 107 _GenbankFile.from_SeqRecord(
    108 parsed, linear=not circular, circular=circular, path=path
    109 )

~/Software/anaconda3/envs/dna_design/lib/python3.9/site-packages/pydna/genbankfile.py in from_SeqRecord(cls, record, path, *args, **kwargs)
     16 @classmethod
     17 def from_SeqRecord(cls, record, *args, path=None, **kwargs):
---> 18 obj = super().from_SeqRecord(record, *args, **kwargs)
     19 obj.path = path
     20 return obj

~/Software/anaconda3/envs/dna_design/lib/python3.9/site-packages/pydna/dseqrecord.py in from_SeqRecord(cls, record, linear, circular, n, *args, **kwargs)
    229 ):
    230 obj = cls.__new__(cls) # Does not call __init__
--> 231 obj._seq = _Dseq.quick(
    232 str(record.seq),
    233 _rc(str(record.seq)),

~/Software/anaconda3/envs/dna_design/lib/python3.9/site-packages/pydna/dseq.py in quick(cls, watson, crick, ovhg, linear, circular, pos)
    398 cb = bytes(crick, encoding="ASCII")
    399 obj._data = (
--> 400 _rc(cb[-max(0, ovhg) or len(cb):])
    401 + wb
    402 + _rc(cb[: max(0, len(cb) - ovhg - len(wb))]))

~/Software/anaconda3/envs/dna_design/lib/python3.9/site-packages/pydna/utils.py in rc(sequence)
     52 accepts mixed DNA/RNA
     53 """
---> 54 return sequence.translate(_complement_table)[::-1]
     55
     56

TypeError: a bytes-like object is required, not 'dict'

I thought removing the translations from the file might help (I do not really need them), but I get the same error. Would you be able to have a look at this?

I attached a gb file I am trying to open. In fact, all the gb files I tried failed. Let me know if I can help.

Thanks

dna2943.zip
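The last frame of the traceback can be reproduced without pydna. The sketch below assumes that the complement table is a dict of the kind str.maketrans returns, which str.translate accepts but bytes.translate does not. The table contents and the names rc_str, rc_bytes and bytes_table are illustrative only and are not taken from pydna's source.

```python
# Dict-based table, the form that str.translate() accepts.
# Illustrative contents only; not pydna's actual _complement_table.
complement_table = str.maketrans("ACGTacgt", "TGCAtgca")


def rc_str(sequence: str) -> str:
    """Reverse complement of a str using the dict-based table."""
    return sequence.translate(complement_table)[::-1]


# bytes.translate() needs a 256-byte table instead, built with bytes.maketrans().
bytes_table = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def rc_bytes(sequence: bytes) -> bytes:
    """Reverse complement of a bytes object using the bytes table."""
    return sequence.translate(bytes_table)[::-1]


print(rc_str("ATCGGA"))     # TCCGAT
print(rc_bytes(b"ATCGGA"))  # b'TCCGAT'

# Passing the dict-based table to bytes.translate() raises a TypeError,
# the same kind of failure as in the traceback above.
try:
    b"ATCGGA".translate(complement_table)
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

So an error like this would appear whenever a bytes sequence meets a str-style table, for example when the installed versions of two modules or packages disagree about whether sequences are str or bytes. Whether that is what happened here is only a guess.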

BjornFJohansson commented 2 years ago

Hi, thanks for sharing this. I'll look into this ASAP.

BjornFJohansson commented 2 years ago

@AubinF BTW, do you have Biopython 1.79?
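A quick way to check is to ask the environment which versions are installed. This is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is hypothetical, and "biopython" and "pydna" are assumed to be the distribution names the packages were installed under.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions relevant to this issue.
for name in ("biopython", "pydna"):
    print(name, installed_version(name) or "not installed")
```

Run it in the same environment (here, the dna_design conda env) that produces the error.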

BjornFJohansson commented 2 years ago

x = parse("dna2943.gb",ds=True)

x
Out[5]: [File(dna2943._Coding_seq_of_)(-1495)]

x[0]
Out[6]: File(dna2943._Coding_seq_of_)(-1495)

x[0].seq
Out[7]: 
Dseq(-1495)
ATCG..ATTC
TAGC..TAAG

x[0].seq

BjornFJohansson commented 2 years ago

I added this example to the test suite for version 5.0.0. Please reopen if you need more help.