bthuronyi / CloneCoordinate

CloneCoordinate issue tracking

Analytical PCR "recommended oligo" code needs to be made generic or removed from v1.0 CC #133

Open bthuronyi opened 1 month ago

bthuronyi commented 1 month ago

Analytical PCR tab currently has "recommended oligo" code (for 2 primer pairs) that works ok for our specific Golden Gate construct set but doesn't scale to other GG sets or handle GGs that don't have 8 parts of typical sizes.

The current algorithm hard-codes a response for 8-part GGs and it selects:

We currently store F and R primers for many (but possibly not all) GG parts. Those fields are optional and probably many CC users won't want to always provide an analytical PCR primer for every part, so we should account for situations where some parts don't have primers available.

A general approach will need to use information in Registry_part_long_short for each rID in the assembly and these considerations.

bthuronyi commented 1 month ago

Thinking through possible implementation considerations:

Testing this algorithm by hand:

SSLSLL (n=6) >
1F 4R verifies 1,3,4
2F 6R verifies 2,3,5,6

Whoops - even though we verified all 6 parts, we didn't confirm that 1 connects to 6. That is, what we actually need to verify is the part-to-part junctions and their order, not the parts themselves. It's tricky to reframe the "verified" terminology this way -- we don't actually verify the junctions between short parts, even by using primers on each one, because there could be some short part in-between and we won't see it -- so instead just do a final check for some PCR going across the 6-1 junction.
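A minimal Python sketch of the hand bookkeeping above, for illustration only (the function names are hypothetical and this is not CC code; the spreadsheet formulas would look different). It assumes the rule the SSLSLL example implies: a PCR verifies the two parts carrying its primers plus any long parts between them, while short parts inside the amplicon are not resolved. It also adds the separate check for whether a PCR crosses the n-to-1 junction.

```python
def amplicon_span(fwd, rev, n):
    """Parts covered going forward from part `fwd` to part `rev`.

    Parts are numbered 1..n and the plasmid is circular, so the walk
    wraps from part n back to part 1.
    """
    span = [fwd]
    while span[-1] != rev:
        span.append(span[-1] % n + 1)
    return span


def verified_parts(fwd, rev, sizes):
    """Assumed rule: both primer parts, plus long ('L') parts in between."""
    span = amplicon_span(fwd, rev, len(sizes))
    inner_long = [p for p in span[1:-1] if sizes[p - 1] == "L"]
    return sorted({fwd, rev, *inner_long})


def crosses_wraparound(fwd, rev, n):
    """True if the amplicon runs across the junction from part n to part 1."""
    span = amplicon_span(fwd, rev, n)
    return any(a == n and b == 1 for a, b in zip(span, span[1:]))


sizes = "SSLSLL"
print(verified_parts(1, 4, sizes))   # [1, 3, 4]
print(verified_parts(2, 6, sizes))   # [2, 3, 5, 6]
print(crosses_wraparound(2, 6, 6))   # False: the 6-1 junction is still unchecked
```

With these two PCRs every part is marked verified, yet neither amplicon crosses the 6-1 junction, which is the gap described above.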

Add to PCRs 2+ logic, IN ADDITION TO checking whether all parts are already verified:

Test for SSSSS (n=5) >
1F 5R verifies 1,5
2F 4R verifies 2,4
3F 1R verifies 3,1
wraparound criterion met -- done

For PCR 1 we can't use 1R because it might be a reverse complement of 1F, and 2R would give a very short band plus one that wraps around so it's not ideal. In fact we need to make sure we don't ever use the F and R primers from the same part. Actually using F and R from the same part is ok for long parts, but it also seems unimportant to allow it.

Here, we were saved from doing a too-short PCR by the odd number of total parts. What about SSSS (n=4)?
1F 4R verifies 1,4
2F 3R verifies 2,3 but is likely to be way too short (the primers are supposed to be at the edges of the parts, so these might be only a few bases apart).

We need a new criterion, which is that F and R primers from adjacently numbered parts aren't used together. This is bad even for long parts. So we would do:

SSSS (n=4) >
1F 4R verifies 1,4
2F 1R verifies 2,1 (wraparound)
3F 2R verifies 3,2 (wraparound)

Looks good!

LLLL (n=4) >
1F 3R verifies 1,2,3
4F 2R verifies 4,2 wraparound

Good!

LLLLSSSS (n=8) >
1F 3R verifies 1,2,3
4F 8R verifies 4,8
5F 7R verifies 5,7 -- is this ok? maybe too short... Should our criterion be about having >1 short part "inside" each PCR?
6F 1R -- I think, depending on how it's coded -- might also be 6F 2R, but either way 6 gets verified and we wrap around.

Open question: reject PCRs that have only 1 short part inside? Might not be too hard to do. We already need to check that the part numbers are not adjacent or the same, and in the case that the part numbers are separated by 1 part, we can check whether that part is short -- but need to handle wrapping.
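A rough Python sketch of the pair checks discussed so far, as one way they could be combined (names are hypothetical; this is not CC code). It assumes "adjacent" means the R primer sits on the part immediately after the F primer's part in the forward direction, which matches the hand examples where 2F 3R is rejected but 2F 1R (wrapping the long way around) is accepted. The single-short-part rejection is left as a switch since it is still an open question.

```python
def inner_parts(fwd, rev, n):
    """Parts strictly between `fwd` and `rev`, walking forward with wrapping."""
    inner = []
    p = fwd % n + 1
    while p != rev:
        inner.append(p)
        p = p % n + 1
    return inner


def pair_is_acceptable(fwd, rev, sizes, reject_single_short=True):
    """Apply the same-part, adjacent-part, and optional single-short checks."""
    n = len(sizes)
    if fwd == rev:
        return False              # never pair F and R from the same part
    inner = inner_parts(fwd, rev, n)
    if not inner:
        return False              # R part directly follows F part: band too short
    if reject_single_short and len(inner) == 1 and sizes[inner[0] - 1] == "S":
        return False              # only one short part inside the amplicon
    return True


print(pair_is_acceptable(2, 3, "SSSS"))       # False (adjacent)
print(pair_is_acceptable(2, 1, "SSSS"))       # True  (wraps through 3 and 4)
print(pair_is_acceptable(5, 7, "LLLLSSSS"))   # False (only short part 6 inside)
print(pair_is_acceptable(1, 3, "LLLL"))       # True  (long part 2 inside)
```

Walking forward with modular arithmetic handles the wrapping case automatically, so 4F 1R in a 4-part assembly is treated as adjacent just like 2F 3R.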

bthuronyi commented 1 month ago

The above code seems doable so I think we should try to implement this general approach for v1.0.

bthuronyi commented 1 month ago

A trickier case is if some of the selected primers don't exist. We could just punt on that one: if we would be recommending a given primer but find that the Registry entry for it is empty, then just replace it with "Forward primer for part r### - not listed". If the user wants to use our algorithmic analytical PCR recommendation they can queue an appropriate primer and register it, and if not, they can do it by hand for that case.
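The "punt" fallback could look something like the Python sketch below. The registry layout (a dict keyed by part rID with optional "F" and "R" primer entries) and the example IDs are assumptions made up for illustration; the real data lives in the Registry sheet and would be read with formulas.

```python
def primer_label(part_id, direction, registry):
    """Return the registered primer for a part, or a 'not listed' placeholder.

    `direction` is "F" or "R". `registry` is a hypothetical dict such as
    {"r101": {"F": "o55", "R": ""}}; missing or empty entries count as absent.
    """
    primer = registry.get(part_id, {}).get(direction, "")
    if primer:
        return primer
    word = "Forward" if direction == "F" else "Reverse"
    return f"{word} primer for part {part_id} - not listed"


registry = {"r101": {"F": "o55", "R": ""}}    # made-up example entries
print(primer_label("r101", "F", registry))    # o55
print(primer_label("r101", "R", registry))    # Reverse primer for part r101 - not listed
```

The recommendation logic stays unchanged with this approach; only the final lookup degrades to a message the user can act on.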

bthuronyi commented 1 month ago

A more sophisticated approach if some primer(s) aren't available is to skip over those parts and keep going to the next choice you would make, but then you need to define a failure criterion where you can't design that PCR... and I'm afraid our overall algorithm as I designed it is too fragile to deal with that.

bthuronyi commented 1 month ago

"Parts verified" column would be automatically populated by formulas for each suggested PCR. If we don't implement automatic primer recommendations, we should make that column optional (blue header), give users write access, and make it a Named Range.

shen2333333 commented 1 month ago

Sorry it took so long for me to get to this; here are some of my thoughts.

Thinking about the big picture a little bit, and also refreshing myself on the progress I made in my senior year: analytical PCR is the part of the "build" that acts as a QC step for verifying the construction of the plasmid. The best way is obviously to sequence, especially whole-plasmid sequencing using a nanopore method (e.g. Plasmidsaurus) for large plasmids like Golden Gates with multiple parts. It's getting relatively cheap, $15 per plasmid, but it's still a good idea to check whether the purified plasmid from a miniprep is at least likely to be the construct we want, rather than just spending $60 on all 4 minipreps, for example. That's where analytical PCR comes in.

Although we typically do 8-part Golden Gates because of the Marburg system we are working with (correct me if I'm wrong), NEB says Golden Gate itself is possible with many more parts (up to 50+), as you already know. I don't think the goal of analytical PCR is necessarily to verify every single part before we send constructs for sequencing; that would be overkill. My feeling is that diminishing returns will kick in really fast (see if you agree with me). As a hypothetical example, for an 8-part construct, doing 1 PCR and having it verified by gel might give an 80% chance that the entire plasmid is built correctly, doing a second one might increase the chance to 90%, and a third one to 99%. So we need to balance the # of PCRs to be done against being confident enough that it's a correct construct.

The central question is: doing at least 1 PCR is probably a good idea for constructs with lots of parts, but how many PCRs do we need to do to feel confident enough before we send a construct for sequencing?

Bunch of approaches I envisioned

  1. No work on our side. Remove the recommended oligo code. We let the user do everything: select and/or design primers on parts of interest to increase confidence before sending off for sequencing (if deemed necessary).
  2. Some work on our side. Suggest one (or two) PCRs to do, then let the user decide if that's good enough to move on to sequencing; if not, the user selects/designs further primer pairs to keep testing.
  3. Lots of work on our side. The approach you suggested, which provides a general way to determine the minimum set of PCRs needed to verify all the parts. I'm not sure I have the bandwidth on my side, but it's certainly an interesting coding project for students who are interested. Even better, since we have some past data, we can get an idea of the # of PCRs that were done and whether the constructs were verified by sequencing at the end, and maybe show a confidence metric, something like "we are now 90% confident that the plasmid is correctly put together after the 2 primer pairs we tested were verified by gel".

My current feeling is leaning toward 1 or 2, but I don't mind 3 (I just think it's a lot of work and I might not be able to flesh it out in a short period of time). It's a very interesting project though.

bthuronyi commented 1 month ago

> I don't think the goal of analytical PCR is necessarily to verify every single part before we send constructs for sequencing; that would be overkill. My feeling is that diminishing returns will kick in really fast (see if you agree with me). As a hypothetical example, for an 8-part construct, doing 1 PCR and having it verified by gel might give an 80% chance that the entire plasmid is built correctly, doing a second one might increase the chance to 90%, and a third one to 99%. So we need to balance the # of PCRs to be done against being confident enough that it's a correct construct.

Yeah, this is correct if analytical PCR is primarily used to increase the success rate of sequencing. One other thing people might use it for is to bypass sequencing and instead adopt an mID based on analytical PCR confirmation only. This could be helpful if there's a large number of constructs to make; PCR is quite easily scaled and gels can be somewhat easy to scale as well. It can also be faster than sequencing (though this is changing with nanopore availability) in terms of overall turnaround time.

bthuronyi commented 1 month ago

That said, with where v1.0 is at the moment, I think (2) is a good compromise for that release and if it seems like a lot of work we could go to (1). I would love to include approach (3) in v2.0!