eclarke / swga

Select primer sets for selective whole genome amplification (SWGA)
GNU General Public License v3.0

Sets with overlapping primers #39

Open ClarkLabUCB opened 7 years ago

ClarkLabUCB commented 7 years ago

I am getting sets that have overlapping primers (see below for full output):

For example: CGAATCGTTCTA GCGAATCGTTCT

Is this allowed in swga, or am I setting the parameters wrong? Could it be a bug?

PRIMER SUMMARY

There are 43704 primers in the database.

500 are marked as active (i.e., they passed filter steps and will be used to find sets of compatible primers.)

The average number of foreground genome binding sites is 1. (avg binding / genome_length = 0.000110)

The average number of background genome binding sites is 4727. (avg binding / genome_length = 0.000001)

The melting temp of the primers ranges between 49.50C and 64.81C with an average of 58.54C.

SETS SUMMARY

There are 84463 sets in the database. The best scoring set is #7111, with 10 primers and a score of 0.000011. Various statistics:

eclarke commented 7 years ago

Hi Iain,

We check to make sure the primers don't have more than a few complementary bases between them to avoid primer dimers, and check to make sure one primer is not a complete subset of another. The primers you've identified look very similar, but we don't know for sure that they actually land on the same spot in the genome (since their sequences aren't identical or subsets of each other).

In short, it's okay to have overlapping primers, just not subsets or regions of complementarity.

Erik
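The two checks described above can be sketched roughly as follows. This is a minimal illustration, not swga's actual code: the function names and the threshold `max_complementary=3` are assumptions made for the example ("more than a few" is not quantified in this thread), and complementarity is approximated as the longest contiguous run in which one primer matches the reverse complement of the other.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_subset(a, b):
    """True if one primer is entirely contained in the other."""
    return a in b or b in a


def max_complementary_run(a, b):
    """Longest contiguous stretch of `a` that could base-pair with `b`.

    Computed as the longest common substring between `a` and the
    reverse complement of `b`.
    """
    rc = reverse_complement(b)
    best = 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if a[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def compatible(a, b, max_complementary=3):
    """Hypothetical pairwise check: no subsets, limited complementarity.

    `max_complementary` is an assumed threshold, not a value from swga.
    """
    return (not is_subset(a, b)
            and max_complementary_run(a, b) <= max_complementary)


# The pair from this issue: shifted by one base, so neither contains the other.
print(is_subset("CGAATCGTTCTA", "GCGAATCGTTCT"))
print(compatible("CGAATCGTTCTA", "GCGAATCGTTCT"))
```

Under these assumptions the pair from the report passes both checks, which is consistent with the explanation that overlapping primers are allowed while subsets and complementary regions are not.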

ClarkLabUCB commented 7 years ago

Hey Erik,

Thanks for your response!

Yeah, I can see that, except that having both those primers is probably not adding anything. I assume that they were chosen because they are so similar and therefore meet the same criteria. They definitely bind the same spot on the fg genome. If this happens a lot, it means that if I choose a set of 10, maybe 6 are actually uniquely targeting the fg genome. I can just expand the desired number in my set to overcome this. However, I don't see an advantage of allowing primers to overlap, although I could be missing something.

A couple other questions, thanks for your time (and for sharing swga)!

  1. Do you have a rule of thumb for picking the annealing temperature for a given reaction temperature? Let's say I want to run a SWGA with Bst enzyme at 60C - would you pick primers 50-65C?
  2. How does swga handle NN's in the fg sequence? Does the spacing information between sequences separated by NN's get preserved?

Best, Iain


eclarke commented 7 years ago

While we've thought about excluding primers that are this similar, there will be situations where being off by one or two bases does actually change their binding pattern in the foreground genome. I don't think keeping both would negatively affect the set, and your solution of increasing the maximum set size (or even just excluding the overlapping primers) would be reasonable.
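For anyone who wants to exclude the overlapping primers by hand, here is a minimal post-filtering sketch. It is not part of swga: `shifted_by`, `drop_near_duplicates`, and the `max_shift=2` cutoff are hypothetical names and values chosen for illustration. It only compares sequences, so it cannot tell whether two primers really bind the same sites in the genome.

```python
def shifted_by(a, b, max_shift=2):
    """Return the smallest offset (1..max_shift) at which one primer is the
    other slid along the same sequence, or 0 if there is no such offset.

    Example: GCGAATCGTTCT and CGAATCGTTCTA overlap with an offset of 1.
    """
    for shift in range(1, max_shift + 1):
        tail_a, tail_b = a[shift:], b[shift:]
        if tail_a and b.startswith(tail_a):
            return shift
        if tail_b and a.startswith(tail_b):
            return shift
    return 0


def drop_near_duplicates(primers, max_shift=2):
    """Greedy filter: keep a primer only if it is not a shifted copy of one
    that was already kept. Input order decides which copy survives."""
    kept = []
    for primer in primers:
        if all(shifted_by(primer, other, max_shift) == 0 for other in kept):
            kept.append(primer)
    return kept


primer_set = ["CGAATCGTTCTA", "GCGAATCGTTCT", "ACGTACGTACGT"]
print(drop_near_duplicates(primer_set))
```

Because the filter is greedy, sorting the input by whatever score matters to you first (best primer first) decides which member of an overlapping pair is retained.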

For your other questions:

  1. I would choose an annealing temperature around the operating temperature of your enzyme. We haven't tested the method with enzymes other than phi29, though, so I'm not sure how the multiple displacement amplification would work at higher temperatures or with different enzymes.

  2. I believe that Ns are ignored for kmer formation (but I need to test exactly what dsk, the program we use for kmer counting, does). They are preserved when calculating distances.

Best, Erik
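To illustrate the behavior described in point 2, here is a small sketch of k-mer enumeration that skips any window containing an N while keeping the original coordinates, so distances across a run of Ns are unchanged. This is an assumption-based illustration only: it is not how dsk or swga is implemented, and `kmers_with_positions` is a hypothetical helper.

```python
def kmers_with_positions(seq, k):
    """Yield (position, kmer) for every k-mer that contains no N.

    Positions are 0-based offsets into the original sequence, so a run of
    Ns still counts toward the distance between flanking k-mers.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield i, kmer


# Two copies of ACGT separated by four Ns: both are reported, 8 bases apart.
hits = list(kmers_with_positions("ACGTNNNNACGT", 4))
print(hits)
print(hits[1][0] - hits[0][0])
```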