Edinburgh-Genome-Foundry / DnaCauldron

:alembic: Simple cloning simulator (Golden Gate etc.) for single and combinatorial assemblies
https://edinburgh-genome-foundry.github.io/DnaCauldron/
MIT License

Palindromes in Golden Gate assembly, only want one construct #22

Open stephenturner opened 3 months ago

stephenturner commented 3 months ago

Hello. I have a palindromic overhang in one of my parts, and it causes two constructs to be produced. I have to use this palindrome for my application.

This gives me two assemblies: a correct one with the vector and all the parts, and an incorrect one in which the vector appears twice and several parts are repeated. I tried creating a simple assembly plan, but I still get the same result.

I can set max_constructs=1, and at least for this example I get the shorter, simpler assembly that I want. But I'm not sure this behavior is guaranteed every time. Is it?

Thanks for any help you can provide!


Zulko commented 3 months ago

Yeah, DnaCauldron was made to catch quirks such as palindromic overhangs and to be very vocal and stubborn about them.

Just thinking out loud, but I can't be certain that max_constructs=1 will return the valid construct every time (it might, but that could depend on the order in which networkx lists the cycles in the graph).

I think your best shot (most explicit and robust) would be to simply inspect the constructs returned:

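def pick_the_one_valid_record(records, expected_parts_list):
    """Return the first record whose parts match expected_parts_list, else None."""
    for record in records:
        # Parts are identified through each feature's "source" qualifier
        parts_list = [f.qualifiers["source"] for f in record.features if "source" in f.qualifiers]
        if parts_list == expected_parts_list:
            return record
    return None  # no construct matched the expected parts

records = ...  # compute the records based on your assembly
expected_parts_list = [...]  # placeholder: the part names you expect, in order
valid_record = pick_the_one_valid_record(records, expected_parts_list)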
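The ordered comparison above can miss the valid construct if the circular record's features start at a different part than expected_parts_list does, and it quietly returns None when nothing matches. A stricter variant is sketched below under the same assumption that parts are identified by a "source" qualifier; the helper names are hypothetical and not part of the DnaCauldron API. It accepts any circular rotation of the expected order and raises unless exactly one construct matches. It does not consider constructs read in the reverse orientation.

```python
from types import SimpleNamespace


def parts_in_record(record):
    # Assumes, as in the snippet above, that part features carry a "source" qualifier.
    return [f.qualifiers["source"] for f in record.features if "source" in f.qualifiers]


def is_rotation(observed, expected):
    """True if `observed` equals `expected` up to a circular shift."""
    if len(observed) != len(expected):
        return False
    if not expected:
        return True  # both lists are empty
    doubled = observed + observed
    n = len(expected)
    # Every rotation of `observed` is a length-n window of `doubled`.
    return any(doubled[i:i + n] == expected for i in range(n))


def pick_exactly_one_valid_record(records, expected_parts_list):
    """Return the single record matching the expected parts, or raise ValueError."""
    expected = list(expected_parts_list)
    matches = [r for r in records if is_rotation(parts_in_record(r), expected)]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one valid construct, found {len(matches)}")
    return matches[0]


# Usage with lightweight stand-ins for Biopython SeqRecord/SeqFeature objects:
def fake_record(parts):
    return SimpleNamespace(features=[SimpleNamespace(qualifiers={"source": p}) for p in parts])


rotated = fake_record(["A", "B", "vector"])                    # valid, different start point
repeated = fake_record(["vector", "A", "B", "vector", "A"])   # vector placed twice
print(parts_in_record(pick_exactly_one_valid_record([repeated, rotated], ["vector", "A", "B"])))
```

Raising instead of returning None makes an unexpected palindrome-driven result fail loudly, which matches the intent of inspecting the constructs explicitly.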

There might also be a way to do this through the DnaCauldron API (using mix.compute_circular_assemblies with fragments_set_filter=), but I'm not sure you want to go there (and it won't work with assembly plans).

stephenturner commented 3 months ago

Many thanks for the quick response! This gets the records I want. Excuse the naive question here, but now that I've filtered down to valid records, how can I filter down the simulation object I use to write the report? Following the workflow in the docs:

import dnacauldron as dc

simulation = assembly.simulate(sequence_repository=repository)

# valid record picking here
# something else here

# show stats
simulation.compute_summary_dataframe()

# Write output
report_writer = dc.AssemblyReportWriter(
    include_fragment_plots='on_error',
    include_assembly_plots=True,
    include_mix_graphs=True,
    include_part_plots=False,
    include_pdf_report=True,
)
simulation.write_report(
    target="output-group1",
    report_writer=report_writer,
)
stephenturner commented 3 months ago

Never mind, I think I answered my own question: simply replacing construct_records with a single-element list containing the valid record does the trick.

simulation.construct_records = [valid_record]
veghp commented 3 months ago

Thanks for posting the code. What do you have in mind for valid record picking? One option I can think of is filtering by expected length (perhaps using the sizes of the valid fragments, i.e. the ones with two overhangs and no enzyme sites); the other is checking that each part contributes exactly one "From X" (or similar) feature annotation.
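Both filtering ideas can be sketched in plain Python over Biopython-style records (anything with .seq and .features, each feature carrying a qualifiers dict). This is a minimal sketch under stated assumptions, not DnaCauldron API: the "From <part>" label format, the helper names, and the rule that each fragment length includes both of its overhangs are all hypothetical and should be checked against your actual records.

```python
from collections import Counter
from types import SimpleNamespace


def part_names(record, prefix="From "):
    """Part names taken from feature labels that start with `prefix`.

    ASSUMPTION (hypothetical): each part is annotated with a feature whose
    "label" qualifier reads "From <part name>". Adapt to your annotations.
    """
    names = []
    for feature in record.features:
        label = feature.qualifiers.get("label")
        if isinstance(label, list):  # qualifiers parsed from GenBank are lists
            label = label[0] if label else None
        if isinstance(label, str) and label.startswith(prefix):
            names.append(label[len(prefix):])
    return names


def expected_circular_length(fragment_lengths, overhang_length=4):
    """Length of a circular construct built from the given fragments.

    ASSUMPTION: each fragment length counts both of its overhangs, so each
    junction overhang is counted twice; a circle of n fragments has n
    junctions, hence one overhang per fragment is subtracted.
    """
    return sum(fragment_lengths) - overhang_length * len(fragment_lengths)


def filter_records(records, expected_parts, expected_length=None):
    """Keep records containing each expected part exactly once (and the right length)."""
    wanted = Counter(expected_parts)
    return [
        record
        for record in records
        if Counter(part_names(record)) == wanted
        and (expected_length is None or len(record.seq) == expected_length)
    ]


# Usage with lightweight stand-ins for SeqRecord/SeqFeature objects:
def fake_record(labels, length):
    features = [SimpleNamespace(qualifiers={"label": label}) for label in labels]
    return SimpleNamespace(seq="N" * length, features=features)


good = fake_record(["From vector", "From A", "From B"], 100)
bad = fake_record(["From vector", "From A", "From B", "From vector", "From A"], 196)
length = expected_circular_length([54, 27, 31], overhang_length=4)  # 112 - 3 * 4 = 100
print(filter_records([bad, good], ["vector", "A", "B"], expected_length=length) == [good])
```

Counting annotations with Counter ignores part order, so rotated circular records still pass, while the length check adds an independent guard against constructs with repeated parts.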

stephenturner commented 3 months ago

In my case, I'm performing many simple Golden Gate assemblies with a known number of parts and a vector backbone. I've designed the overhangs so that the only valid record is the one that includes all the parts. The code above solves my issue. Thanks again!