Edinburgh-Genome-Foundry / DnaCauldron

:alembic: Simple cloning simulator (Golden Gate etc.) for single and combinatorial assemblies
https://edinburgh-genome-foundry.github.io/DnaCauldron/
MIT License

annotations crossing overhangs lost #8

Open aaroncooper opened 3 years ago

aaroncooper commented 3 years ago

Just started playing around with DnaCauldron. Super cool module! One thing that isn't a showstopper but would be nice if resolved is that annotations crossing overhangs seem to be lost. It'd be nice if all of those were kept. Is this a possibility? I looked for an option in the source but didn't see anything.

veghp commented 3 years ago

Hi, thanks for the interest in DnaCauldron! Yes, this is not an option at the moment. One could argue that an annotation spanning a site&overhang gets "destroyed" during restriction and is not valid anymore, but I see why this feature would be useful. A possible solution would be a script that preprocesses the seq records:

Zulko commented 3 years ago

Destroying a feature when it gets cut is actually a "feature" of Biopython, but DnaCauldron should fix this via its crop_record_with_saddling_features method, which was specifically written to conserve overhang-crossing features and is used in both StickyEndFragment.list_from_record_digestion and HomologousFragment. From memory it used to work well, but I never wrote a unit test for list_from_record_digestion, so we don't even know whether it is currently broken. My bad :grimacing:. @aaroncooper, do you have a minimal example you could provide?
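To make the difference concrete, here is a minimal stdlib-only sketch. It assumes features are modelled as hypothetical `(start, end, label)` tuples with 0-based, end-exclusive coordinates. `drop_crossing` mimics the Biopython slicing behaviour described above (only features fully inside the slice survive), while `crop_saddling` keeps any overlapping feature and crops it to the fragment. Both names are illustrative; this is not the actual code of `crop_record_with_saddling_features`.

```python
from typing import List, Tuple

# Hypothetical feature model: (start, end, label), 0-based, end-exclusive.
Feature = Tuple[int, int, str]


def drop_crossing(features: List[Feature], start: int, end: int) -> List[Feature]:
    """Keep only features fully inside [start, end), shifted to fragment
    coordinates (the same rule Biopython applies when slicing a SeqRecord)."""
    return [
        (f_start - start, f_end - start, label)
        for f_start, f_end, label in features
        if start <= f_start and f_end <= end
    ]


def crop_saddling(features: List[Feature], start: int, end: int) -> List[Feature]:
    """Keep every feature overlapping [start, end), cropped to the fragment."""
    kept = []
    for f_start, f_end, label in features:
        if f_end <= start or f_start >= end:
            continue  # no overlap with the fragment at all
        kept.append((max(f_start, start) - start, min(f_end, end) - start, label))
    return kept


# Fragment [10, 30) of a record; "promoter" and "terminator" cross the cuts.
features = [(5, 15, "promoter"), (12, 25, "cds"), (28, 35, "terminator")]
print(drop_crossing(features, 10, 30))  # [(2, 15, 'cds')]
print(crop_saddling(features, 10, 30))
# [(0, 5, 'promoter'), (2, 15, 'cds'), (18, 20, 'terminator')]
```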

> One could argue that an annotation spanning a site&overhang gets "destroyed" during restriction and is not valid anymore

That's true, and probably a good call from Biopython, but in most assemblies the overhangs will flank the part in the final construct, so if the feature is limited to the part and its overhangs, it will appear identical in the assembly record.
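The argument can be checked with a toy model. In this hypothetical, stdlib-only sketch each fragment is a plain string whose first and last 4 nt stand for its overhangs; adjacent fragments are joined by writing their shared overhang only once, and each fragment's features are shifted by its offset in the construct. A feature confined to a part plus its overhangs then reappears at full length in the assembly. The `assemble` helper and the 4-nt overhang size are assumptions; circularisation and reverse-complement orientation are ignored, so this illustrates the idea rather than DnaCauldron's algorithm.

```python
from typing import List, Tuple

Feature = Tuple[int, int, str]  # hypothetical (start, end, label), end-exclusive
OVERHANG = 4  # assumed Golden Gate-style 4-nt overhangs


def assemble(fragments: List[Tuple[str, List[Feature]]]) -> Tuple[str, List[Feature]]:
    """Join linear fragments whose overhangs match, carrying features along."""
    sequence, features = fragments[0][0], list(fragments[0][1])
    for seq, feats in fragments[1:]:
        left_overhang = seq[:OVERHANG]
        if sequence[-OVERHANG:] != left_overhang:
            raise ValueError(f"overhang mismatch: {sequence[-OVERHANG:]} vs {left_overhang}")
        offset = len(sequence) - OVERHANG  # the shared overhang appears only once
        sequence += seq[OVERHANG:]
        features += [(s + offset, e + offset, label) for s, e, label in feats]
    return sequence, features


# Each feature covers its whole part, including both overhangs.
part_a = ("GGAG" + "TTTTTTT" + "AATG", [(0, 15, "promoter")])
part_b = ("AATG" + "CCCCCC" + "GCTT", [(0, 14, "cds")])
construct, feats = assemble([part_a, part_b])
for start, end, label in feats:
    print(label, construct[start:end])
# promoter GGAGTTTTTTTAATG
# cds AATGCCCCCCGCTT
```

Both features keep their full length, and they overlap on the shared `AATG` overhang, which is exactly what cropping them away at digestion time would lose.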

For all the assembly constructs that you've already made and which have "dropped" part features, there is actually an a-posteriori remediation via the copy_features_between_common_blocks method from the Geneblocks library. This feature was written exactly for the purpose of adding back part features at the time when DnaCauldron was dropping cross-overhang part features (which it shouldn't be doing anymore, grmbl grmbl). Just copying the README example:

```python
from geneblocks import CommonBlocks, load_record

part = load_record('part.gb', name='insert')  # annotated part record
plasmid = load_record('plasmid.gb', name='plasmid')  # assembly missing the part features
blocks = CommonBlocks.from_sequences([part, plasmid])
new_records = blocks.copy_features_between_common_blocks(inplace=False)
annotated_plasmid = new_records['plasmid']  # Biopython record
```

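For intuition about what the Geneblocks call achieves, here is a much simpler, hypothetical stand-in. `transfer_features` only finds exact, forward-strand copies of the part's sequence in a circular plasmid (searching a doubled-up end so matches across the origin are caught) and copies the part's features with the matching offset. Geneblocks detects common blocks with a more general sequence comparison, so treat this as a sketch of the idea, not its implementation; features are again assumed to be `(start, end, label)` tuples.

```python
from typing import List, Tuple

Feature = Tuple[int, int, str]  # hypothetical (start, end, label), end-exclusive


def transfer_features(part_seq: str, part_features: List[Feature],
                      plasmid_seq: str) -> List[Feature]:
    """Copy part features onto every exact, forward-strand occurrence of the
    part in a circular plasmid. Features crossing the origin are returned with
    end > len(plasmid_seq)."""
    # Extend the plasmid so that occurrences spanning the origin are found too.
    extended = plasmid_seq + plasmid_seq[: len(part_seq) - 1]
    copied = []
    index = extended.find(part_seq)
    while index != -1:
        copied += [(s + index, e + index, label) for s, e, label in part_features]
        index = extended.find(part_seq, index + 1)
    return copied


part_seq = "GGAGTTTTTTTAATG"
part_features = [(0, 15, "promoter"), (4, 11, "core")]
plasmid_seq = "CCCC" + part_seq + "CCCC"
print(transfer_features(part_seq, part_features, plasmid_seq))
# [(4, 19, 'promoter'), (8, 15, 'core')]
```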
aaroncooper commented 3 years ago

Thanks for the quick response. I just walked through the code myself to see what you were describing, and it makes sense to me. I made a small example that shows what I'm seeing.

test_dnacauldron.zip

veghp commented 3 years ago

Thanks for the clarification and correction. I'll have a look into this method and test it.

veghp commented 3 years ago

I tested the fragments with CUBA and can confirm that annotations overlapping an overhang are lost. I tested this with added annotations that span the overhang or partially overlap it from either side. I will look into the code to find the problem.
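The cases tested above can be pinned down as a small test matrix. The hypothetical `overlap_kind` helper below classifies a feature interval against an overhang interval (0-based, end-exclusive), which makes it easy to assert that every case other than `"none"` survives digestion. The helper and its labels are illustrative, not part of DnaCauldron.

```python
def overlap_kind(f_start: int, f_end: int, oh_start: int, oh_end: int) -> str:
    """Classify how the feature [f_start, f_end) relates to the overhang
    [oh_start, oh_end)."""
    if f_end <= oh_start or f_start >= oh_end:
        return "none"  # feature does not touch the overhang
    if f_start <= oh_start and f_end >= oh_end:
        return "covers"  # feature spans the whole overhang
    if f_start < oh_start:
        return "partial-left"  # enters the overhang from the left side
    if f_end > oh_end:
        return "partial-right"  # enters the overhang from the right side
    return "inside"  # feature lies entirely within the overhang


# Overhang at [10, 14); one feature per case.
for feature in [(5, 15), (5, 12), (12, 20), (11, 13), (0, 5)]:
    print(feature, overlap_kind(*feature, 10, 14))
# (5, 15) covers
# (5, 12) partial-left
# (12, 20) partial-right
# (11, 13) inside
# (0, 5) none
```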

AubinF commented 2 years ago

Hi, super cool package indeed, although I just noticed the same issue as Aaron with version 2.0.6. The workaround provided by Zulko works perfectly though 👌 Any ETA for the fix @veghp ? Thanks!

veghp commented 2 years ago

Thanks for the feedback and the patience. I'm currently updating it to work on Python 3.9, but the latest Biopython causes an issue with BioBrickStandardAssembly (there is an extra B' in the sticky sequence, possibly due to how the Seq class now stores its data). Once that's fixed, I can look into this.
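One plausible mechanism, offered only as an assumption since the thread does not confirm it: recent Biopython releases store a `Seq`'s contents as bytes internally, and calling `str()` on raw bytes returns their repr, `b'...'` wrapper included, instead of the nucleotides. The stdlib-only sketch below reproduces that symptom on a plain `bytes` object; the function names are hypothetical.

```python
# Hypothetical stand-in for sequence data held as bytes, e.g. an
# EcoRI-style 4-nt sticky end; no Biopython import is needed to see the effect.
raw_sticky_end = b"AATT"


def sticky_text_buggy(data: bytes) -> str:
    # str() on bytes returns the repr, including the b'...' wrapper.
    return str(data)


def sticky_text_fixed(data: bytes) -> str:
    # Decoding yields the plain nucleotide string.
    return data.decode("ascii")


print(sticky_text_buggy(raw_sticky_end))  # b'AATT'
print(sticky_text_fixed(raw_sticky_end))  # AATT
```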