Open stephenturner opened 3 months ago
Yeah, DnaCauldron was made to catch quirks such as palindromic overhangs and to be very vocal and stubborn about them.
Just thinking out loud, but I can't be certain that max_constructs=1 will return the valid construct every time (although it could be possible; it might depend on how networkx lists cycles in a graph).
I think your best shot (most explicit and robust) would be to simply inspect the constructs returned:
def pick_the_one_valid_record(records, expected_parts_list):
    """Return the first record whose ordered parts list matches, else None."""
    for record in records:
        # Collect the "source" qualifier of every feature that has one.
        parts_list = [f.qualifiers["source"] for f in record.features if "source" in f.qualifiers]
        if parts_list == expected_parts_list:
            return record
    return None

records = ...  # compute the records based on your assembly
valid_record = pick_the_one_valid_record(records, expected_parts_list)
There might also be a way to do this through the DnaCauldron API (using mix.compute_circular_assemblies with fragments_set_filter=), but I'm not sure you want to go there (and it won't work with assembly plans).
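The exact list comparison in pick_the_one_valid_record only matches when the features come out in the same order as expected_parts_list. Since the constructs are circular, the same valid assembly could in principle be listed starting from a different part; whether DnaCauldron normalizes the start position is not confirmed in this thread. A minimal sketch of a rotation-tolerant variant, assuming records expose .features with a "source" qualifier as in the snippet above (the helper names are made up for illustration):

```python
def parts_in_record(record):
    """List the "source" qualifier of each feature that has one, in feature order."""
    return [f.qualifiers["source"] for f in record.features if "source" in f.qualifiers]


def same_up_to_rotation(parts, expected):
    """Return True if `parts` equals `expected` after some cyclic shift."""
    if len(parts) != len(expected):
        return False
    if not parts:
        return True
    return any(parts[i:] + parts[:i] == expected for i in range(len(parts)))


def pick_valid_record_any_rotation(records, expected_parts_list):
    """Return the first record whose parts match up to rotation, else None."""
    expected = list(expected_parts_list)
    for record in records:
        if same_up_to_rotation(parts_in_record(record), expected):
            return record
    return None
```

A construct that contains the vector twice has a longer parts list, so it is rejected by the length check before any rotation is tried. Reverse-oriented listings are not handled by this sketch.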
Many thanks for the quick response! This gets the records I want. Excuse the naive question here, but now that I've filtered down to valid records, how can I filter down the simulation object I use to write the report? Following the workflow in the docs:
simulation = assembly.simulate(sequence_repository=repository)
# valid record picking here
# something else here
# show stats
simulation.compute_summary_dataframe()
# Write output
report_writer = dc.AssemblyReportWriter(
    include_fragment_plots='on_error',
    include_assembly_plots=True,
    include_mix_graphs=True,
    include_part_plots=False,
    include_pdf_report=True
)
simulation.write_report(
    target="output-group1",
    report_writer=report_writer,
)
Never mind, I think I answered my own question. Simply replacing construct_records with a single-element list containing the valid record does the trick.
simulation.construct_records=[valid_record]
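When this is repeated over many assemblies, it may be worth failing loudly if the filter finds no match or more than one, rather than assigning blindly. A small sketch, assuming only what the thread shows: simulation.construct_records is a plain list of records, and parts are identified by the "source" feature qualifier. The helper name keep_only_valid_record is hypothetical and not part of DnaCauldron:

```python
def keep_only_valid_record(simulation, expected_parts_list):
    """Replace simulation.construct_records with the single matching record.

    Raises ValueError if zero or several records match, so an unexpected
    assembly cannot silently end up in the report.
    """
    matches = []
    for record in simulation.construct_records:
        parts = [f.qualifiers["source"] for f in record.features if "source" in f.qualifiers]
        if parts == expected_parts_list:
            matches.append(record)
    if len(matches) != 1:
        raise ValueError(f"expected exactly 1 valid construct, found {len(matches)}")
    simulation.construct_records = matches
    return matches[0]
```

Calling keep_only_valid_record(simulation, expected_parts_list) before simulation.write_report(...) would then play the role of the "valid record picking" step in the workflow above.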
Thanks for posting the code. What do you have in mind for valid record picking? One option I can think of is filtering by expected length (maybe using the sizes of the valid fragments, i.e. the ones with 2 overhangs and no enzyme sites). The other option is checking for the presence of exactly one "From X" (etc.) feature annotation from each part.
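Both options could look roughly like the sketch below. It assumes the expected length is supplied by the caller (deriving it from fragment sizes is left out), that each record has a .seq, and that the part annotation can be read from the "source" feature qualifier as in the earlier snippet; the function names are illustrative only.

```python
from collections import Counter


def has_expected_length(record, expected_length, tolerance=0):
    """True if the construct length is within `tolerance` bp of the expected one."""
    return abs(len(record.seq) - expected_length) <= tolerance


def has_each_part_exactly_once(record, expected_parts):
    """True if every expected part is annotated exactly once and nothing else is."""
    counts = Counter(
        f.qualifiers["source"] for f in record.features if "source" in f.qualifiers
    )
    return dict(counts) == {part: 1 for part in expected_parts}


def filter_valid_records(records, expected_parts, expected_length=None):
    """Keep records passing the part-count check and, if given, the length check."""
    kept = []
    for record in records:
        if not has_each_part_exactly_once(record, expected_parts):
            continue
        if expected_length is not None and not has_expected_length(record, expected_length):
            continue
        kept.append(record)
    return kept
```

The part-count check ignores order, so it rejects a construct with the vector placed twice while not depending on where the circular sequence starts.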
In my case, I'm performing many simple GG assemblies with a known number of parts and a vector backbone. I've designed overhangs such that the only valid record is the one that includes all the parts. That code above solves my issue. Thanks again!
Hello. I have a palindrome overhang in one of my parts. This is causing two constructs to get produced. I have to use this palindrome for my application.
This gives me two assemblies: a correct one with the vector and all the parts, and an incorrect one with the vector placed twice with several parts repeated. I tried creating a simple assembly plan, but I still get the same result.
I can set max_constructs=1, and at least for this example I get the shorter/simpler assembly that I want. But I'm not sure I can guarantee this is how it will behave all the time. Or does it?
Thanks for any help you can provide!