Lattice-Automation / primers

A PCR primer tool for DNA assembly flows
MIT License
27 stars 6 forks source link

Standalone tool to evaluate PCR quality #4

Closed jcollins6 closed 1 year ago

jcollins6 commented 1 year ago

It would be great to be able to score a PCR based on the primers that I specify. Something like this:

primers -t {template} -oligo1 {oligo1_seq} -oligo2 {oligo2_seq}

Thanks, Joe

jjti commented 1 year ago

Hey Joe,

I'm interested in the idea, and I can see how it'd be nice to go (primer, primer) -> score.

Do you mind expanding a bit on the use case? Are you going to generate hundreds of pairs externally and then score each via primers?

jcollins6 commented 1 year ago

Hey Josh,

That's right, it would be great if primers could score PCRs designed manually by scientists. Right now, we have loose guidelines when designing PCRs, and want to quantify a "good" or "bad" design.

Primer3 does this as well, but there is maximum oligo length (35 bp) and it doesn't go around a plasmid origin.

Thanks, Joe

jcollins6 commented 1 year ago

This would also help us understand our historical PCR data, and potentially change the penalty score settings based on our actual lab data

jjti commented 1 year ago

Ok I'm thinking about adding something like below, based on the existing scoring function: https://github.com/Lattice-Automation/primers#scoring

-s corresponds to the SEQ that the primers are meant to amplify, -t is the entire parent sequence to check for off-targets

primers score ACGATCGATCAGCTAG CCCGGCTTTACAACTA -s ACGATCGATCAGCTAGTAGTTGTAAAGCCGGG -t GACAGCATCACATCTACGATCAGCTACGAGC
[{
  "tm": 56.0,
  "tmTotal": 65.0,
  "tmDiff": 3.0,
  "dg": -6.0,
  "len": 16,
  "offtargets" 0,
  "penalty": 5.6
},
{
...
}]

I think that may be more useful that just the score because you'll be able to take the tm, tmDiff, dg, len, and offtargets and do some modelling to correlate each with the success of PCR?

jcollins6 commented 1 year ago

Ya having all of those is helpful, thanks! One thing that I would change is to make the sequence that is meant to be amplified (-s) optional. Is that possible?

Joe

jjti commented 1 year ago

Hey @jcollins6, I added a sub-command in the direction of what we'd talked about in a v0.5.0:

https://github.com/Lattice-Automation/primers#scoring-existing-primers

I hope you'll check it out and let me know what you think.

One thing that I would change is to make the sequence that is meant to be amplified (-s) optional

That had been my plan. I wound up not even supporting that for a first-pass as this feature. The reason is I'll have to search the sequence to find where it was (supposed) to bind first. And then use that to calculate tm vs tm_total. I still think I should do that, but didn't as a first-pass. -t (sequence to check for off-targets) is supported

primers score --json GGTCTCAATGAGACAATAGCACACAC GAAGACTTTCGTATGCTGACCTAG | jq
[
  {
    "seq": "GGTCTCAATGAGACAATAGCACACAC",
    "len": 26,
    "tm": 67,
    "tm_total": 67,
    "gc": 0.5,
    "dg": -1.86,
    "fwd": true,
    "off_target_count": 0,
    "scoring": {
      "penalty": 13.23,
      "penalty_tm": 5,
      "penalty_tm_diff": 2.5,
      "penalty_gc": 0,
      "penalty_len": 2,
      "penalty_dg": 3.73,
      "penalty_off_target": 0
    }
  },
  {
    "seq": "GAAGACTTTCGTATGCTGACCTAG",
    "len": 24,
    "tm": 64.5,
    "tm_total": 64.5,
    "gc": 0.5,
    "dg": 0,
    "fwd": false,
    "off_target_count": 0,
    "scoring": {
      "penalty": 6,
      "penalty_tm": 2.5,
      "penalty_tm_diff": 2.5,
      "penalty_gc": 0,
      "penalty_len": 1,
      "penalty_dg": 0,
      "penalty_off_target": 0
    }
  }
]
jjti commented 1 year ago

As an aside, my hope would be that you might publish or share some of the data from your analysis. It would help to get experimental support (or otherwise) for these parameters: https://github.com/Lattice-Automation/primers#scoring

IIRC they came from looking at other PCR tools. But lab data would of course be preferable.

jcollins6 commented 1 year ago

Hi Josh, this looks great so far thanks! One thing I am noticing is that the Tm doesn't seem right for primers where I am putting on extra bps (like for Gibson/GG) on the end of the annealing sequence. I checked some primers in Benchling, and I would guess you are calculating the Tm based only on the primer, not how it is annealing to the template. Is that right?

Best, Joe

jjti commented 1 year ago

@jcollins6 that's correct, the first pass isn't smart enough to figure out the tm vs tm_total. That's what I was (obliquely) referring to below:

That had been my plan. I wound up not even supporting that for a first-pass as this feature. The reason is I'll have to search the sequence to find where it was (supposed) to bind first. And then use that to calculate tm vs tm_total. I still think I should do that, but didn't as a first-pass. -t (sequence to check for off-targets) is supported

It's doable, and I'll let you know after I add it, I just didn't implement in the first pass

jjti commented 1 year ago

@jcollins6 primers score now returns a different tm and tm_total based on added sequence: https://github.com/Lattice-Automation/primers/releases/tag/v0.5.3

The approach is:

  1. find >= 10 bp binding sites between the primers and the template (3' end)
  2. expand those binding sites until the first mismatch
  3. the remainder of the primer, unbounded, is the "added" sequence that factors into tm_total but not tm
jcollins6 commented 1 year ago

hey Josh, thanks for the quick update! This looks pretty good so far. A couple things I've noticed so far:

  1. I get an error if the PCR goes around the origin of a plasmid. Runs successfully once I re-index the plasmid.
  2. The penalty scores seem very high for primers with extra bp added at the end (for Gibson/GG). I will try and test different float values.

Best, Joe

jjti commented 1 year ago

I get an error if the PCR goes around the origin of a plasmid. Runs successfully once I re-index the plasmid.

Do you mind sharing a minimal test case so I can reproduce? I'm not surprised it errors out (there's little handling of plasmid zero-indexes right now), but I can use it as a regression test with a patch.

The penalty scores seem very high for primers with extra bp added at the end (for Gibson/GG). I will try and test different float values.

Side note: tm_penalty is only for the annealing portion (right now): https://github.com/Lattice-Automation/primers/blob/master/primers/primers.py#L143. So if the primers were designed with the total tm in mind, primers will still penalize them heavily for a small tm for the annealing portion. The README has an example of that: https://github.com/Lattice-Automation/primers#scoring-existing-primers

I don't know that that's why the primers are scored poorly, but it's my guess. Please let me know if there are other penalty factors that seem unexpected.

jcollins6 commented 1 year ago

here is an example using the pUC19 plasmid:

% primers score cccgtagaaaagatcaaaggatcttc gtaaaaaggccgcgttgctg -s gagatacctacagcgtgagctatgagaaagcgccacgcttcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcctggccttttgctggccttttgctcacatgttctttcctgcgttatcccctgattctgtggataaccgtattaccgcctttgagtgagctgataccgctcgccgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaagagcgcccaatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacacaggaaacagctatgaccatgattacgccaagcttgcatgcctgcaggtcgactctagaggatccccgggtaccgagctcgaattcactggccgtcgttttacaacgtcgtgactgggaaaaccctggcgttacccaacttaatcgccttgcagcacatccccctttcgccagctggcgtaatagcgaagaggcccgcaccgatcgcccttcccaacagttgcgcagcctgaatggcgaatggcgcctgatgcggtattttctccttacgcatctgtgcggtatttcacaccgcatatggtgcactctcagtacaatctgctctgatgccgcatagttaagccagccccgacacccgccaacacccgctgacgcgccctgacgggcttgtctgctcccggcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtcagaggttttcaccgtcatcaccgaaacgcgcgagacgaaagggcctcgtgatacgcctatttttataggttaatgtcatgataataatggtttcttagacgtcaggtggcacttttcggggaaatgtgcgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcctgtttttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacgagtgggttacatcgaactggatctcaacagcggtaagatccttgagagttttcgccccgaagaacgttttccaatgatgagcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatgacttggttgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaagagaattatgcagtgctgccataaccatgagtgataacactgcggccaacttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgatcgttgggaaccggagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaatggcaacaacgttgcgcaaactattaactggcgaactacttactctagcttcccggcaacaattaatagactggatggaggcggataaagttgcaggaccacttctgcgctcggcccttccggctggctggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcattgcagcactggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcaggcaactatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcattggtaactgtcagaccaagtttactcatatatactttagattgatttaaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgataatctcatgaccaaaatcccttaacgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcggtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagcagagcgcagataccaaatactgttcttctagtgtagccgtagttaggccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcgcagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaact Traceback (most recent call last): File "/opt/anaconda3/lib/python3.9/site-packages/primers/primers.py", line 329, in _binding_seq end_index_seq = seq.index(rev[:end_index_rev]) ValueError: substring not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/opt/anaconda3/bin/primers", line 8, in sys.exit(run()) File "/opt/anaconda3/lib/python3.9/site-packages/primers/main.py", line 17, in run args.run(args) File "/opt/anaconda3/lib/python3.9/site-packages/primers/main.py", line 136, in run_score fwd, rev = score(seq_fwd, rev=seq_rev, seq=args.s, offtarget_check=args.t) File "/opt/anaconda3/lib/python3.9/site-packages/primers/primers.py", line 255, in score addfwd, , add_rev = _binding_seq(fwd, rev=rev, seq=seq) File "/opt/anaconda3/lib/python3.9/site-packages/primers/primers.py", line 340, in _binding_seq raise ValueError("failed to find rev binding site in seq: %s", err) ValueError: ('failed to find rev binding site in seq: %s', ValueError('substring not found'))

jjti commented 1 year ago

Hey @jcollins6 I'm going to try to get this tonight (better handling of zero-index). Has been busy at work (HCP) lately

jjti commented 1 year ago

I get an error if the PCR goes around the origin of a plasmid. Runs successfully once I re-index the plasmid.

@jcollins6 Just pushed another patch in 0.5.4 that addresses this, unit test here: https://github.com/Lattice-Automation/primers/blob/master/tests/primers_test.py#L182

$ primers score cccgtagaaaagatcaaaggatcttc gtaaaaaggccgcgttgctg -s gagatacctacagcgtgagctatgagaaagcgccacgcttcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcctggccttttgctggccttttgctcacatgttctttcctgcgttatcccctgattctgtggataaccgtattaccgcctttgagtgagctgataccgctcgccgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaagagcgcccaatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacacaggaaacagctatgaccatgattacgccaagcttgcatgcctgcaggtcgactctagaggatccccgggtaccgagctcgaattcactggccgtcgttttacaacgtcgtgactgggaaaaccctggcgttacccaacttaatcgccttgcagcacatccccctttcgccagctggcgtaatagcgaagaggcccgcaccgatcgcccttcccaacagttgcgcagcctgaatggcgaatggcgcctgatgcggtattttctccttacgcatctgtgcggtatttcacaccgcatatggtgcactctcagtacaatctgctctgatgccgcatagttaagccagccccgacacccgccaacacccgctgacgcgccctgacgggcttgtctgctcccggcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtcagaggttttcaccgtcatcaccgaaacgcgcgagacgaaagggcctcgtgatacgcctatttttataggttaatgtcatgataataatggtttcttagacgtcaggtggcacttttcggggaaatgtgcgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcctgtttttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacgagtgggttacatcgaactggatctcaacagcggtaagatccttgagagttttcgccccgaagaacgttttccaatgatgagcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatgacttggttgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaagagaattatgcagtgctgccataaccatgagtgataacactgcggccaacttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgatcgttgggaaccggagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaatggcaacaacgttgcgcaaactattaactggcgaactacttactctagcttcccggcaacaattaatagactggatggaggcggataaagttgcaggaccacttctgcgctcggcccttccggctggctggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcattgcagcactggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcaggcaactatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcattggtaactgtcagaccaagtttactcatatatactttagattgatttaaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgataatctcatgaccaaaatcccttaacgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcggtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagcagagcgcagataccaaatactgttcttctagtgtagccgtagttaggccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcgcagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaact --json | jq
[
  {
    "seq": "CCCGTAGAAAAGATCAAAGGATCTTC",
    "len": 26,
    "tm": 64.9,
    "tm_total": 64.9,
    "gc": 0.5,
    "dg": -3.2,
    "fwd": true,
    "off_target_count": 2,
    "scoring": {
      "penalty": 53.9,
      "penalty_tm": 2.9,
      "penalty_tm_diff": 2.6,
      "penalty_gc": 0,
      "penalty_len": 2,
      "penalty_dg": 6.4,
      "penalty_off_target": 40
    }
  },
  {
    "seq": "GTAAAAAGGCCGCGTTGCTG",
    "len": 20,
    "tm": 67.5,
    "tm_total": 67.5,
    "gc": 0.6,
    "dg": 0,
    "fwd": false,
    "off_target_count": 0,
    "scoring": {
      "penalty": 11.1,
      "penalty_tm": 5.5,
      "penalty_tm_diff": 2.6,
      "penalty_gc": 2,
      "penalty_len": 1,
      "penalty_dg": 0,
      "penalty_off_target": 0
    }
  }
]
jcollins6 commented 1 year ago

This worked on my end as well, thanks!