Closed alexsongrj closed 3 years ago
Below is a script I wrote where 100 random text fixtures are generated and then tested. It works so long as half_homology = 2
. If we let it evaluate to 3 // 2 == 1
per the code in the master branch we cannot get the output to pass design verification by dnacauldron. @Zulko and @veghp, can you see if this might make sense? Or if it would be bad for other designs to hardcode this to 2?
import random
import dnaweaver as dw
from dnaweaver import TmSegmentSelector
import dnacauldron as dc
import pandas as pd
import os
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from geneblocks import sequences_are_circularly_equal
def DNA(length=int(), letters="CGTA"):
return''.join(random.choices(letters, k=length))
NUM_TRIALS = 100
for i in range(0, NUM_TRIALS):
sequences = {
'part1': DNA(1000),
'part2': DNA(1000),
'part3': DNA(1000),
}
desired_sequence = sequences['part1'] + sequences['part2'] + sequences['part3']
# make sure desired has no SapI site.
desired_sequence = desired_sequence.replace('GCTCTTC', 'GCGGAGC').replace('GAAGAGC', 'GATTTTC')
desired_sequence = dw.SequenceString(desired_sequence, metadata={"topology": 'circular'})
oligos_company = dw.CommercialDnaOffer(
"OligoCompany",
sequence_constraints=[dw.SequenceLengthConstraint(max_length=200)],
pricing=dw.PerBasepairPricing(0.1)
)
pcr_station = dw.PcrExtractionStation(
name="PCR station",
max_overhang_length=50,
primers_supplier=oligos_company,
sequences=sequences,
homology_selector=TmSegmentSelector(min_tm=35, max_tm=70),
extra_cost=5
)
assembly_station = dw.DnaAssemblyStation(
name="Golden Gate assembly",
assembly_method = dw.GoldenGateAssemblyMethod(
enzyme='SapI',
min_segment_length=100,
max_segment_length=10000,
wildcard_basepair='G',
left_addition='TCCTAG',
right_addition='CTAGGA',
),
supplier=pcr_station,
coarse_grain=5,
fine_grain=None,
logger='bar'
)
# THIS LINE WILL PRE-BLAST THE SEQUENCE TO ACCELERATE COMPUTATIONS.
assembly_station.prepare_network_on_sequence(desired_sequence)
# FIND AN ASSEMBLY PLAN AND PRINT IT.
quote = assembly_station.get_quote(desired_sequence)
print(quote)
assembly_report = quote.to_assembly_plan_report()
try:
assembly_report.write_full_report(f"Design{i}")
print("Success! See figure assembly_plans.png.")
except Exception as e:
print("Error encountered....")
print(e)
raise
# now verify things are correct
df = pd.read_csv(f'Design{i}/sequences.csv', sep=';')
target_sequence = df[df.Source=="Golden Gate assembly"].Sequence.unique()[0]
df = df[df.Source=='PCR station']
parts = {}
ids = []
for _, row in df.iterrows():
new_seq_record = SeqRecord(Seq(row.Sequence))
new_seq_record.id = row.ID
new_seq_record.description = row.ID
new_seq_record.name = row.ID
new_seq_record.annotations['topology'] = 'linear'
parts[row.ID] = new_seq_record
ids.append(row.ID)
repository = dc.SequenceRepository(collections={'my_collection': parts})
assembly = dc.Type2sRestrictionAssembly(parts=ids, enzyme='SapI')
simulation = assembly.simulate(sequence_repository=repository)
# Print the ID and length of the generated construct(s)
for record in simulation.construct_records:
print (record.id, len(record))
# Get a list of dictionnaries with data on each construct
constructs_data = simulation.compute_all_construct_data_dicts()
report_writer = dc.AssemblyReportWriter(
include_mix_graphs=True, include_part_plots=True
)
# Write a full report with sequences and figures in a zip.
simulation.write_report(f"Design{i}", report_writer=report_writer)
assert len(simulation.construct_records) == 1
predicted_construct_record = simulation.construct_records[0]
assert sequences_are_circularly_equal([predicted_construct_record, target_sequence])
print("VALID!")
Thanks both! I'm not sure exactly what the underlying problem was but happy that your changes work. The problem with the hard-coded 2
is that it won't work for longer homologies (Gibson Assembly, yeast recombination etc.). This makes test_circular_sequences.py
fail. Would it be possible to replace these lines with the following, which should still work for you, and makes the current tests pass?
half_homology = int(numpy.ceil(self.max_homology_size / 2))
Would it be possible to create a minimal test (e.g. based on the script above, but without the trials loop) to add to the test suite?
@Zulko, sounds good. @alexsongrj and I will make this change, add a minimal test, and ensure all other tests pass.
@Zulko we made a minimal test in test_spai_enzyme_restriction_site.py with a random sequence plasmid. Would you mind take a look? thanks:)
@veghp - good to go?
This looks good to me. @veghp are you ok to merge and release?
Missed the last question, but now merged and released. Also fixed a typo in the test file name. Thanks for the contribution!
DNAweaver is a cool library to perform golden gate assembly. We would like to introduce "SapI" restriction enzyme to Golden Gate, change code in below. it is working for our sequence. Might need a more random sequence to test this. welcome to further discuss one of the changes.