Closed jcollins6 closed 1 year ago
Hey Joe,
I'm interested in the idea, and I can see how it'd be nice to go (primer, primer) -> score
.
Do you mind expanding a bit on the use case? Are you going to generate hundreds of pairs externally and then score each via primers
?
Hey Josh,
That's right, it would be great if primers could score PCRs designed manually by scientists. Right now, we have loose guidelines when designing PCRs, and want to quantify a "good" or "bad" design.
Primer3 does this as well, but there is maximum oligo length (35 bp) and it doesn't go around a plasmid origin.
Thanks, Joe
This would also help us understand our historical PCR data, and potentially change the penalty score settings based on our actual lab data
Ok I'm thinking about adding something like below, based on the existing scoring function: https://github.com/Lattice-Automation/primers#scoring
-s corresponds to the SEQ
that the primers are meant to amplify, -t is the entire parent sequence to check for off-targets
primers score ACGATCGATCAGCTAG CCCGGCTTTACAACTA -s ACGATCGATCAGCTAGTAGTTGTAAAGCCGGG -t GACAGCATCACATCTACGATCAGCTACGAGC
[{
"tm": 56.0,
"tmTotal": 65.0,
"tmDiff": 3.0,
"dg": -6.0,
"len": 16,
"offtargets" 0,
"penalty": 5.6
},
{
...
}]
I think that may be more useful that just the score because you'll be able to take the tm, tmDiff, dg, len, and offtargets and do some modelling to correlate each with the success of PCR?
Ya having all of those is helpful, thanks! One thing that I would change is to make the sequence that is meant to be amplified (-s) optional. Is that possible?
Joe
Hey @jcollins6, I added a sub-command in the direction of what we'd talked about in a v0.5.0:
https://github.com/Lattice-Automation/primers#scoring-existing-primers
I hope you'll check it out and let me know what you think.
One thing that I would change is to make the sequence that is meant to be amplified (-s) optional
That had been my plan. I wound up not even supporting that for a first-pass as this feature. The reason is I'll have to search the sequence to find where it was (supposed) to bind first. And then use that to calculate tm
vs tm_total
. I still think I should do that, but didn't as a first-pass. -t
(sequence to check for off-targets) is supported
primers score --json GGTCTCAATGAGACAATAGCACACAC GAAGACTTTCGTATGCTGACCTAG | jq
[
{
"seq": "GGTCTCAATGAGACAATAGCACACAC",
"len": 26,
"tm": 67,
"tm_total": 67,
"gc": 0.5,
"dg": -1.86,
"fwd": true,
"off_target_count": 0,
"scoring": {
"penalty": 13.23,
"penalty_tm": 5,
"penalty_tm_diff": 2.5,
"penalty_gc": 0,
"penalty_len": 2,
"penalty_dg": 3.73,
"penalty_off_target": 0
}
},
{
"seq": "GAAGACTTTCGTATGCTGACCTAG",
"len": 24,
"tm": 64.5,
"tm_total": 64.5,
"gc": 0.5,
"dg": 0,
"fwd": false,
"off_target_count": 0,
"scoring": {
"penalty": 6,
"penalty_tm": 2.5,
"penalty_tm_diff": 2.5,
"penalty_gc": 0,
"penalty_len": 1,
"penalty_dg": 0,
"penalty_off_target": 0
}
}
]
As an aside, my hope would be that you might publish or share some of the data from your analysis. It would help to get experimental support (or otherwise) for these parameters: https://github.com/Lattice-Automation/primers#scoring
IIRC they came from looking at other PCR tools. But lab data would of course be preferable.
Hi Josh, this looks great so far thanks! One thing I am noticing is that the Tm doesn't seem right for primers where I am putting on extra bps (like for Gibson/GG) on the end of the annealing sequence. I checked some primers in Benchling, and I would guess you are calculating the Tm based only on the primer, not how it is annealing to the template. Is that right?
Best, Joe
@jcollins6 that's correct, the first pass isn't smart enough to figure out the tm
vs tm_total
. That's what I was (obliquely) referring to below:
That had been my plan. I wound up not even supporting that for a first-pass as this feature. The reason is I'll have to search the sequence to find where it was (supposed) to bind first. And then use that to calculate tm vs tm_total. I still think I should do that, but didn't as a first-pass. -t (sequence to check for off-targets) is supported
It's doable, and I'll let you know after I add it, I just didn't implement in the first pass
@jcollins6 primers score
now returns a different tm
and tm_total
based on added sequence: https://github.com/Lattice-Automation/primers/releases/tag/v0.5.3
The approach is:
tm_total
but not tm
hey Josh, thanks for the quick update! This looks pretty good so far. A couple things I've noticed so far:
Best, Joe
I get an error if the PCR goes around the origin of a plasmid. Runs successfully once I re-index the plasmid.
Do you mind sharing a minimal test case so I can reproduce? I'm not surprised it errors out (there's little handling of plasmid zero-indexes right now), but I can use it as a regression test with a patch.
The penalty scores seem very high for primers with extra bp added at the end (for Gibson/GG). I will try and test different float values.
Side note: tm_penalty
is only for the annealing portion (right now): https://github.com/Lattice-Automation/primers/blob/master/primers/primers.py#L143. So if the primers were designed with the total tm in mind, primers
will still penalize them heavily for a small tm for the annealing portion. The README has an example of that: https://github.com/Lattice-Automation/primers#scoring-existing-primers
I don't know that that's why the primers are scored poorly, but it's my guess. Please let me know if there are other penalty factors that seem unexpected.
here is an example using the pUC19 plasmid:
% primers score cccgtagaaaagatcaaaggatcttc gtaaaaaggccgcgttgctg -s gagatacctacagcgtgagctatgagaaagcgccacgcttcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcctggccttttgctggccttttgctcacatgttctttcctgcgttatcccctgattctgtggataaccgtattaccgcctttgagtgagctgataccgctcgccgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaagagcgcccaatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacacaggaaacagctatgaccatgattacgccaagcttgcatgcctgcaggtcgactctagaggatccccgggtaccgagctcgaattcactggccgtcgttttacaacgtcgtgactgggaaaaccctggcgttacccaacttaatcgccttgcagcacatccccctttcgccagctggcgtaatagcgaagaggcccgcaccgatcgcccttcccaacagttgcgcagcctgaatggcgaatggcgcctgatgcggtattttctccttacgcatctgtgcggtatttcacaccgcatatggtgcactctcagtacaatctgctctgatgccgcatagttaagccagccccgacacccgccaacacccgctgacgcgccctgacgggcttgtctgctcccggcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtcagaggttttcaccgtcatcaccgaaacgcgcgagacgaaagggcctcgtgatacgcctatttttataggttaatgtcatgataataatggtttcttagacgtcaggtggcacttttcggggaaatgtgcgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcctgtttttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacgagtgggttacatcgaactggatctcaacagcggtaagatccttgagagttttcgccccgaagaacgttttccaatgatgagcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatgacttggttgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaagagaattatgcagtgctgccataaccatgagtgataacactgcggccaacttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgatcgttgggaaccggagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaatggcaacaacgttgcgcaaactattaactggcgaactacttactctagcttcccggcaacaattaatagactggatggaggcggataaagttgcaggaccacttctgcgctcggcccttccggctggctggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcattgcagcactggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcaggcaactatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcattggtaactgtcagaccaagtttactcatatatactttagattgatttaaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgataatctcatgaccaaaatcccttaacgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcggtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagcagagcgcagataccaaatactgttcttctagtgtagccgtagttaggccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcgcagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaact Traceback (most recent call last): File "/opt/anaconda3/lib/python3.9/site-packages/primers/primers.py", line 329, in _binding_seq end_index_seq = seq.index(rev[:end_index_rev]) ValueError: substring not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/anaconda3/bin/primers", line 8, in rev
binding site in seq
: %s", err)
ValueError: ('failed to find rev
binding site in seq
: %s', ValueError('substring not found'))
Hey @jcollins6 I'm going to try to get this tonight (better handling of zero-index). Has been busy at work (HCP) lately
I get an error if the PCR goes around the origin of a plasmid. Runs successfully once I re-index the plasmid.
@jcollins6 Just pushed another patch in 0.5.4 that addresses this, unit test here: https://github.com/Lattice-Automation/primers/blob/master/tests/primers_test.py#L182
$ primers score cccgtagaaaagatcaaaggatcttc gtaaaaaggccgcgttgctg -s gagatacctacagcgtgagctatgagaaagcgccacgcttcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcctggccttttgctggccttttgctcacatgttctttcctgcgttatcccctgattctgtggataaccgtattaccgcctttgagtgagctgataccgctcgccgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaagagcgcccaatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacacaggaaacagctatgaccatgattacgccaagcttgcatgcctgcaggtcgactctagaggatccccgggtaccgagctcgaattcactggccgtcgttttacaacgtcgtgactgggaaaaccctggcgttacccaacttaatcgccttgcagcacatccccctttcgccagctggcgtaatagcgaagaggcccgcaccgatcgcccttcccaacagttgcgcagcctgaatggcgaatggcgcctgatgcggtattttctccttacgcatctgtgcggtatttcacaccgcatatggtgcactctcagtacaatctgctctgatgccgcatagttaagccagccccgacacccgccaacacccgctgacgcgccctgacgggcttgtctgctcccggcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtcagaggttttcaccgtcatcaccgaaacgcgcgagacgaaagggcctcgtgatacgcctatttttataggttaatgtcatgataataatggtttcttagacgtcaggtggcacttttcggggaaatgtgcgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcctgtttttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacgagtgggttacatcgaactggatctcaacagcggtaagatccttgagagttttcgccccgaagaacgttttccaatgatgagcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatgacttggttgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaagagaattatgcagtgctgccataaccatgagtgataacactgcggccaacttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgatcgttgggaaccggagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaatggcaacaacgttgcgcaaactattaactggcgaactacttactctagcttcccggcaacaattaatagactggatggaggcggataaagttgcaggaccacttctgcgctcggcccttccggctggctggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcattgcagcactggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcaggcaactatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcattggtaactgtcagaccaagtttactcatatatactttagattgatttaaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgataatctcatgaccaaaatcccttaacgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcggtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagcagagcgcagataccaaatactgttcttctagtgtagccgtagttaggccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcgcagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaact --json | jq
[
{
"seq": "CCCGTAGAAAAGATCAAAGGATCTTC",
"len": 26,
"tm": 64.9,
"tm_total": 64.9,
"gc": 0.5,
"dg": -3.2,
"fwd": true,
"off_target_count": 2,
"scoring": {
"penalty": 53.9,
"penalty_tm": 2.9,
"penalty_tm_diff": 2.6,
"penalty_gc": 0,
"penalty_len": 2,
"penalty_dg": 6.4,
"penalty_off_target": 40
}
},
{
"seq": "GTAAAAAGGCCGCGTTGCTG",
"len": 20,
"tm": 67.5,
"tm_total": 67.5,
"gc": 0.6,
"dg": 0,
"fwd": false,
"off_target_count": 0,
"scoring": {
"penalty": 11.1,
"penalty_tm": 5.5,
"penalty_tm_diff": 2.6,
"penalty_gc": 2,
"penalty_len": 1,
"penalty_dg": 0,
"penalty_off_target": 0
}
}
]
This worked on my end as well, thanks!
It would be great to be able to score a PCR based on the primers that I specify. Something like this:
primers -t {template} -oligo1 {oligo1_seq} -oligo2 {oligo2_seq}
Thanks, Joe