Edinburgh-Genome-Foundry / DnaWeaver

A route planner for DNA assembly
https://dnaweaver.genomefoundry.org
MIT License

verify_constraints performance #3

Open andrewshvv opened 3 years ago

andrewshvv commented 3 years ago

I have noticed that verify_constraints is the most frequently called function. It takes about 50% of the execution time, which slows down both the program and the development cycle for real use cases (10 kb sequences).

What is the proper way to implement a caching mechanism for it?

Zulko commented 3 years ago

There are several ways to speed up computations, but the best one depends on your project (do you have a minimal example you could share?).

The verify_constraints method is probably called on every subsequence, so it is not surprising that it is a bottleneck. The surest way to speed up computations is to use the coarse_grain/fine_grain parameters, or the A* parameter (both approaches reduce the number of subsequences evaluated, and correspond to slides 220-250 of this slideshow).
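To give a rough idea of why coarse graining helps, here is a minimal sketch that counts candidate subsequences when cuts are only allowed every `grain` nucleotides. It is an illustration under simplifying assumptions (no minimum or maximum segment length, no refinement pass), not DnaWeaver's actual code; `cut_positions` and `count_candidate_segments` are hypothetical helper names.

```python
def cut_positions(sequence_length, grain=1):
    """Return the allowed cut positions when cutting every `grain` nucleotides.

    Hypothetical helper for illustration only. The sequence end is always
    included so that the full sequence can be covered.
    """
    positions = list(range(0, sequence_length, grain))
    positions.append(sequence_length)
    return positions


def count_candidate_segments(sequence_length, grain=1):
    """Count the (start, end) pairs that can be formed from the cut positions.

    Each pair is one subsequence that would need to be evaluated
    (for instance by a constraint check) in an exhaustive search.
    """
    n_positions = len(cut_positions(sequence_length, grain))
    return n_positions * (n_positions - 1) // 2


# A 10 kb sequence with cuts allowed anywhere versus every 100 nucleotides.
fine = count_candidate_segments(10_000, grain=1)
coarse = count_candidate_segments(10_000, grain=100)
print(fine, coarse)
```

In this toy model the number of subsequences to evaluate drops roughly with the square of the grain, which is why a coarse first pass can pay off even if a finer pass is then needed around the chosen cuts.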

Another optimization consists of using memoize=True in CommercialDnaOffer and DnaAssemblyStation to cache the computations of price (and assembly plans) for subsequences. However, that will only speed things up if you are doing multi-level assemblies, or if you are evaluating several sequences (with common subsequences) on the same supply network.
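The general idea behind memoization can be sketched as follows. This is a generic illustration, not how DnaWeaver implements memoize=True; `MemoizedPricer` and `toy_price` are hypothetical names, and the pricing rule is made up for the example.

```python
class MemoizedPricer:
    """Cache the result of a pricing function, keyed by subsequence.

    Hypothetical class for illustration only.
    """

    def __init__(self, price_function):
        self.price_function = price_function
        self.cache = {}
        self.misses = 0  # number of times the real computation ran

    def price(self, subsequence):
        if subsequence not in self.cache:
            self.misses += 1
            self.cache[subsequence] = self.price_function(subsequence)
        return self.cache[subsequence]


def toy_price(subsequence):
    """Made-up pricing rule: a flat cost per nucleotide."""
    return 0.10 * len(subsequence)


pricer = MemoizedPricer(toy_price)

# The same subsequence requested three times is only computed once.
shared = "ATGCATGC"
for _ in range(3):
    pricer.price(shared)
pricer.price("GGGG")

print(pricer.misses)
```

The cache only helps when the same subsequence is requested more than once, which matches the caveat above: a single pass over one sequence rarely repeats a subsequence, whereas multi-level assemblies or several related sequences do.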

I understand that the documentation is not as good as it could be, so don't hesitate to ask if you have more questions.

andrewshvv commented 3 years ago

Thank you @Zulko! I will try what you mentioned

andrewshvv commented 3 years ago

To be honest, the code comments help a lot, so you are being a bit humble in saying that the documentation is not sufficient :)