BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

Corrupted file or something? #57

Closed willsharpless closed 4 years ago

willsharpless commented 4 years ago

Hi Bjorn,

First off, thank you: I love Pydna and I really appreciate that you respond to my issues and make updates!

Recently I ran into a new issue, and despite trying to troubleshoot it I have no idea why my code is breaking. I was assembling some plasmids with a workflow that I have used previously to make 20 or so plasmids. After exporting the Dseq objects as gb files, I noticed in SnapGene that the last plasmid (and ONLY the last plasmid) had been cut about 2 kb upstream of the assembly location.

What is weirdest is that when I call list_features() before and after exporting, the plasmid in Python looks totally fine, but if I reimport the plasmid file and parse it with SeqIO from Biopython, it also gives me the truncated version. In both SnapGene and the GenBank file, the truncated version keeps the features that were in the missing region, but the length of each is now 0?

To be explicit, the assembly works fine. This was within an iterative design where I inserted 8 different fragments into the same backbone, and only the last one had this issue; all the others look fine. I understand it's possible this error is related to Biopython, but I thought you might have some valuable input. It is so strange to me that the flow works for all the other files with the same backbone, which I assembled and exported in exactly the same way. It is worth mentioning that I have not actually updated to the new alpha version of pydna you just released, so it is not something in the update (and I have not changed Biopython recently). Pictures and more details below:

Assembly workflow:

  1. plasmid.cut() backbone with 1 enzyme
  2. fill_in() edges
  3. Assembly(limit = 14) #default algorithm, which I think is common substrings
  4. Assemble_circular

Assembly output (copy and pasted):

Seven other plasmids and then the following one. Note the length, 8453 bp:

    Dseqrecord
    circular: True
    size: 8453
    ID: plasmid
    Number of features: 36
    Dseq(o8453)
    GCGC..CGCC
    CGCG..GCGG

exporting genbank files, this is the way I always do it

for x, y in doms.items():
    z = str(y.id)
    output_file = open(z + '.gb', 'w')
    SeqIO.write(y, output_file, "gb")

print(plasmid) output (all normal):

    Dseqrecord
    circular: True
    size: 8453
    ID: plasmid
    Number of features: 37
    Dseq(o8453)
    GCGC..CGCC
    CGCG..GCGG

importing the file I just exported

with open("plasmid.gb") as handle:
    plasmid_check = SeqIO.read(handle, "gb")

print(plasmid_check.format("gb"))

plasmid_check_copy = SeqRecord(str(plasmid_check.seq))
plasmid_check_copy.__dict__.update(plasmid_check.__dict__)

Note that the length is now 6820 bp????? and when I call:

plasmid_check_copy.list_features()

output error: image

BjornFJohansson commented 4 years ago

Hi, I am traveling right now, but I'll get back to you ASAP. Regarding the one assembly that does not work as expected, do inspect all results from the Assembly. Sometimes there are other regions of homology that recombine in unexpected ways.

willsharpless commented 4 years ago

Gotcha, no rush on the reply, enjoy your travels. I was wondering the same but do you think the plasmid would recombine in-silico only upon exporting the file? The assembly itself says that I have the full 8 kb plasmid (with all correct-length features as expected).

BjornFJohansson commented 4 years ago

Hi, I created an example that replicated your error. Seems that the file needs to be closed explicitly for the file to be fully written. Alternatively you can use the "with open" syntax. Try this and see if it helps.

for x, y in doms.items(): 
    z = str(y.id) 
    output_file = open(z + '.gb', 'w') 
    SeqIO.write(y, output_file, "gb")
    output_file.close()        # Close file!
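
For reference, the "with open" alternative mentioned above could look roughly like the sketch below. It uses only the standard library so it runs without Biopython: `export_all` and `write_text` are hypothetical names, and `write_text` is a stand-in for the `SeqIO.write(y, output_file, "gb")` call from the thread. The dictionary of dummy strings stands in for `doms`.

```python
import os
import tempfile


def export_all(records, out_dir, write_record):
    """Write each record to <out_dir>/<id>.gb, closing every file before moving on."""
    paths = []
    for rec_id, record in records.items():
        path = os.path.join(out_dir, f"{rec_id}.gb")
        # Leaving the with-block flushes buffered data and closes the file,
        # even if write_record raises an exception.
        with open(path, "w") as handle:
            write_record(record, handle)
        paths.append(path)
    return paths


def write_text(record, handle):
    # Hypothetical stand-in writer. In the thread this would be
    # SeqIO.write(record, handle, "gb").
    handle.write(record)


out_dir = tempfile.mkdtemp()
records = {"plasmid": "GCGC" * 2000}  # dummy data, 8000 characters
paths = export_all(records, out_dir, write_text)

# Read the file back and confirm nothing was lost on the way to disk.
with open(paths[0]) as handle:
    read_back = handle.read()
print(len(read_back) == len(records["plasmid"]))
```

Comparing the length read back from disk with the in-memory length is a cheap check that would have caught the truncated export.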

willsharpless commented 4 years ago

How interesting, especially because I have generated multiple at once like this before and not seen the error! I appreciate your hard work.
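
A possible explanation for why only the last file was affected, assuming CPython: each time the loop rebinds `output_file`, the previous file object loses its last reference and is closed (and therefore flushed) when it is garbage collected, while the final file stays open with data still sitting in its write buffer. The standard-library sketch below only illustrates the buffering part. The file name `demo.gb` and the variable names are made up for the example, and the exact size seen while the file is open depends on the buffer size and the amount written.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.gb")

handle = open(path, "w")
handle.write("GCGC")  # small write, held in an in-memory buffer
size_while_open = os.path.getsize(path)  # bytes on disk before closing

handle.close()  # closing flushes the buffer to disk
size_after_close = os.path.getsize(path)

print(size_while_open, size_after_close)
```

With a larger record, part of the data may already have been flushed before the close, which would leave a file that is partially written rather than empty.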

Bjorn Johansson is a great man! I look forward to seeing the development of PyDNA.