Primers designed by cloning_primers can exceed maxlength

hsiaut commented 7 years ago

Hello, First of all, great module! One issue I have noticed is that pydna.cloning_primers() can generate primers that exceed maxlength.

My understanding of maxlength is that if I set it to, eg, 40, then the module should raise an Exception if no primer can be designed with a total primer length (including tail) of less than 40.

However, in my tests, and looking at the code, it seems like cloning_primers() does not check to see if the primers are indeed < maxlength. I can set maxlength to 10 and will get primers.

Is this the intended behavior and users are advised to check the designed primers themselves to see if they do not exceed some limit?

Thanks, Tim Hsiau

BjornFJohansson commented 7 years ago

Hi, and thanks for your interest in pydna. I have been working on a simplification of the cloning_primers interface that will also play a bit better with the assembly primers code. Both interfaces should be simplified so that they are more intuitive.

You are quite right that maxlength is not used to make sure that the primers are below a certain length. One idea would be to remove the maxlength argument altogether. Sometimes we want to keep the primers below a certain length, for example because of cost, but it would be easier and more readable to post-process the primers in a loop. The Primer class is a subclass of the Biopython SeqRecord class, so it supports slice notation:

import pydna

a=pydna.Primer("aaaaat")

a
Out[3]: <unknown id> 6-mer:5'-aaaaat-3'

b=a[:4]

b
Out[5]: <unknown id> 4-mer:5'-aaaa-3'

type(a), type(b)
Out[6]: (pydna.primer.Primer, pydna.primer.Primer)
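
A minimal sketch of the post-processing loop mentioned above. The helper name split_by_length is hypothetical and not part of pydna, and treating the limit as inclusive is an assumption. Plain strings stand in for Primer objects; since Primer subclasses SeqRecord, len() should work on it the same way.

```python
def split_by_length(primers, maxlength):
    """Split primers into (within_limit, too_long) by total length.

    Hypothetical helper, not part of pydna. The limit is inclusive.
    """
    within_limit, too_long = [], []
    for primer in primers:
        # len() counts the whole primer, tail included.
        if len(primer) <= maxlength:
            within_limit.append(primer)
        else:
            too_long.append(primer)
    return within_limit, too_long


# Plain strings stand in for designed primers here.
designed = ["aaaaat", "ggatccaaaaatgcatgcatgc", "ttttgc"]
ok, rejected = split_by_length(designed, maxlength=10)
print(ok)
print(rejected)
```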

Let me know what you would like from this interface. So far I have not had a lot of feedback on this functionality, but I would like it to be more general.

hsiaut commented 7 years ago

Hey Bjorn, I agree that post-processing the primers in a loop would be acceptable. Another thought is to make the maxlength argument optional, with a default of maxlength=None. If it is supplied, the code would either return the primers, or return None if it cannot design primers that satisfy the parameters. That might be a bit too complicated though.
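
Roughly what the optional argument could look like, as a sketch only. The function name primers_within_limit is made up for illustration, it checks an already designed pair rather than designing anything, and the inclusive comparison is an assumption.

```python
def primers_within_limit(fwd, rev, maxlength=None):
    """Return (fwd, rev), or None if either primer exceeds maxlength.

    Hypothetical wrapper, not pydna's API. maxlength=None disables the check.
    """
    if maxlength is None:
        return fwd, rev
    # Total primer length (tail + annealing part) is what is compared.
    if len(fwd) > maxlength or len(rev) > maxlength:
        return None
    return fwd, rev


print(primers_within_limit("aaaaat", "ttttgc"))               # no limit
print(primers_within_limit("aaaaat", "ttttgc", maxlength=5))  # too long
```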

In terms of the output from cloning_primers and assembly_primers:

1. I don't think they set primer.footprint and primer.tail; those would be useful information to keep track of what anneals versus what is the primer tail.
2. In terms of designing primers, it may be good to have another method that is like primer3 in that the amplicon needs to include a target_region, but the primers themselves can be positioned anywhere in allowed regions. This method would then return a list of primer pairs. E.g., for verifying integrations, often one doesn't care much about the actual start and end positions of the verification amplicon as long as it is designed to amplify the correct junction. Currently, I just do a loop and iterate over the target sequence with a specified target size, but there may be more efficient ways of finding acceptable primer pairs.
3. I agree with issue-26 in that returning amplicon objects may be more intuitive. Also, it would be nice to keep track of the region of the template that is copied by the amplicon. In my own code, I have tried to do something like what you suggested in issue 26 and generated amplicon objects from the resulting primer pairs using pcr(). However, in my limited testing, it seems to result in a much longer runtime. Perhaps using pcr() was the wrong approach and I could have directly initialized the amplicon objects as I know the start and end positions of the amplicons and can skip the sequence search cost of Anneal.
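
To illustrate the first point, here is one way the tail and footprint could be kept apart on a primer. This is a standalone sketch: the TailedPrimer class is hypothetical and says nothing about how pydna's Primer class stores these attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TailedPrimer:
    """Hypothetical container, not pydna's Primer class."""

    tail: str       # 5' extension that does not anneal to the template
    footprint: str  # 3' part that anneals to the template

    @property
    def sequence(self):
        # Full primer as it would be ordered, written 5'->3'.
        return self.tail + self.footprint

    def __len__(self):
        return len(self.sequence)


p = TailedPrimer(tail="ggatcc", footprint="atgaccatgattacg")
print(p.sequence, len(p))
```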

Otherwise, the functions are quite convenient and reduce a lot of boilerplate code for me, thanks!

BjornFJohansson commented 7 years ago

Have a look at issue #26. I will make a new pre-release really soon. I'll let you know when it is ready.

1. The interior of the Primer class is different in the upcoming pre-release. I hope this will be fixed in combination with the new functions (#26).
2. I think Biopython has a wrapper for primer3? This would be the way to go in this case, since primer3 is tried and true.
3. I know the assembly step can choke if there are many paths through the graph. Or is it in the pcr step?

PS: if you can give a code sample, I could perhaps help out more.

hsiaut commented 7 years ago

Thanks, I agree with points 1 and 2.

It was in the PCR step. I believe the pydna.amplify.pcr function searches for the annealing region within the entire template. However, in the case of primers just designed by primer_design, we already know where the annealing region is -- at the very ends of the desired amplicon. So the searching step could be avoided.

I was generating 100,000s of potential amplicons in silico and that led to a noticeable slowdown. Probably for most cases it wouldn't matter.
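
A sketch of what skipping the search could look like when the amplicon coordinates are already known. It works on plain strings only; amplicon_from_coordinates and reverse_complement are hypothetical helpers, not pydna functions, and the result is a bare sequence rather than an Amplicon object.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (unambiguous bases only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_from_coordinates(template, start, end, fwd_tail="", rev_tail=""):
    """Product sequence for template[start:end], built without any search.

    Tails are written 5'->3' as they appear on each primer, so the reverse
    primer's tail shows up reverse-complemented at the 3' end of the product.
    """
    return fwd_tail + template[start:end] + reverse_complement(rev_tail)


template = "ccccatgaccatgattacggattcactggtttt"
product = amplicon_from_coordinates(
    template, 4, 29, fwd_tail="ggatcc", rev_tail="gaattc"
)
print(product)
```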

Thanks!

BjornFJohansson commented 7 years ago

The pydna.amplify.pcr function will try to return only one PCR product; in fact, it will raise an exception if there is more than one. However, the function relies on the pydna.amplify.Anneal class, which tries to return all possible products from a set of primers and one template. It might choke if many primers anneal at many different places within the template (large template, short primers?) or if the primer argument list contains many primers. I have never tried to make this algorithm robust in this respect, since most of the time I want one specific product.

hsiaut commented 7 years ago

Sorry, I wasn't clear. I was sliding a window with a step size of 5 bp or so along a genome and generating an amplicon at each position. So the amplicons are different and unique, but in total there were 100,000s of amplicons that I was trying to compare.
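
The window enumeration described above could be sketched like this. The function name sliding_windows and the example sizes are made up for illustration; a generator is used so that hundreds of thousands of windows need not be held in memory at once.

```python
def sliding_windows(seq_length, size, step=5):
    """Yield (start, end) for every full-size window along a sequence.

    Hypothetical helper. Windows that would run past the end are skipped.
    """
    for start in range(0, seq_length - size + 1, step):
        yield start, start + size


windows = list(sliding_windows(seq_length=30, size=20, step=5))
print(windows)
```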

BjornFJohansson commented 7 years ago

Please have a look at the new pydna.design.primer_design function. It is simplified and should work as advertised.

BjornFJohansson commented 7 years ago

Will close this for now, feel free to reopen

hsiaut commented 7 years ago

Thanks, looks good.

BjornFJohansson / pydna

Primers designed by cloning_primers can exceed maxlength #34