Open BjornFJohansson opened 4 months ago
@BjornFJohansson @hiyama341 , it would be great if you could provide:
Nearest Neighbor Method explained: https://www.youtube.com/watch?v=jy5h9NwBEgk
Minimal example extracted from teemi. The results are identical to the web interface.
import json
import requests
# NEB example primers:
primers = """\
AGCGGATAACAATTTCACACAGGA
GTAAAACGACGGCCAGT
AGCGGATAACAATTTCAC
AGCGGATAAGGGCAATTTCAC
GTAAAACGACGGCCA""".splitlines()
conc=0.5
prodcode="q5-0"
url = "https://tmapi.neb.com/tm/batch"
for primer in primers:
seqpairs = [[primers]]
input_ = {"seqpairs": seqpairs, "conc": conc, "prodcode": prodcode}
headers = {"content-type": "application/json"}
res = requests.post(url, data=json.dumps(input_), headers=headers)
r = json.loads(res.content)
if r["success"]:
for row in r["data"]:
print(row["tm1"])
else:
print("request failed")
print(r["error"][0])
# Result
# 66
# 62
# 57
# 65
# 59
I had settings for Biopython's MeltingTemp to simulate Q5 and Phusion Tm calculations on the NEB website some time ago (at that time, calculations for Q5 and Phusion gave different results), but obviously they changed some settings in their calculator.
Yes, I get wildly different results right now. I would like to mimic the behaviour of the NEB calculator, the source code is not easy to follow though.
I had a look at the js code and it's compiled by babel, so I don't think it is possible to extract the source code to reverse-engineer without spending a lot of time. Some possible solutions to consider @BjornFJohansson @hiyama341:
They would both take some thinking, but they seem feasable, so it's a matter of how much is this slowing you down.
@hiyama341 I went ahead and made a first version using the mentioned approximation (first with the default pydna, then with the one from neb, and it's faster):
https://github.com/BjornFJohansson/pydna/compare/dev_bjorn...issue_237_optimisation
The file example.py compares the two approaches for finding a primer and you can see that the approximation is faster and uses fewer requests. This is what it prints. Let me know what you think.
Also, I could not use the teemi functions to make the request with "normal" headers. I was getting 403 response from the server. I had to mimmic a browser request to be able to use the API. Not sure if you encounter these problems. I could not even run their curl example. See the headers here.
making a request
making a request
making a request
making a request
making a request
result atgtcgtATGaaaccgttatcgatcatatgtGcgaaatgtcgcgcgtcatctacgtatcatcgatctactTAAacgtgta
--- 1.4502029418945312 seconds ---
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
result atgtcgtATGaaaccgttatcgatcatatgtGcgaaatgtcgcgcgtcatctacgtatcatcgatctactTAAacgtgta
--- 6.762638092041016 seconds ---
@manulera and @BjornFJohansson great work!
Yeah, two days ago I started getting the 403 response from the server. They must have changed something cause it was working just fine before. And thanks I'm gonna try to add the headers to teemi and see if it works.
Regarding the workload, maybe we can discuss on the 5th pydna meeting whether it is worth to reverse-engineer the primer calc function. Personally, I think it would be useful to have something fast and stable when we are calculating/making hundreds of primers but let's discuss next week.
I agree with @hiyama341 on this one. Localized Tm code would be very useful and I suspect would convince potential users.
I also agree that ideally we would want the code in the library and not rely on an external API that may cut access like they just did.
The issue is that it's not as simple as "translating the javascript code into python". The javascript code we can access from their website is bundled code. This is generated by javascript libraries like Babel, which basically make a single javascript file with all the code for the page, without dependencies on external files or libraries, and maximizing browser compatibility. This means that the source code that people in NEB wrote looks very different (and much more understandable) that what is in that file which is not meant to be read by humans, so reverse-engineering would take a lot of time and knowledge of javascript.
Why don't you use Biopython's MeltingTemp and find the correct settings to simulate NEB's TM calculator? Isn't Biopython a library that is likely to be used together with PyDNA?
Hi @MarkusPiotrowski NEB's calculator gives different values depending on the polymerase that will be used, and overall does a more complex calculation from what the others said. When I was doing PCR I was never too concerned with exact Tm calculations but it seems some people trust NEB's calculator over anything else and would exclude tools that use other Tm calculations.
@manulera I'm the author of Biopython's MeltingTemp. There are several options that are usually applied for calculating Tm's with the nearest neighbor thermodynamic method:
I would wonder if NEB uses a "more sophisticated" method and if their results can't be simulated with Biopython's MeltingTemp. I guess it's a question of finding the right parameters, and there are numerous possibilities for them. And of course they may have different parameters for different enzymes.
Their own help page gives some details (if they stick to them and is still up-to-date): https://tmcalculator.neb.com/#!/help E.g. they write that they use different salt correction method for Phusion.
@MarkusPiotrowski @manulera @hiyama341 Please have a look at this comparison Biopython vs NEB Seems like there is a ~7 degree difference for a large list of primers. I replicated the behaviour as best I could given the sometimes incomplete information given by NEB.
In this repo they somehow run the NEB Tm Calculator from Python.
@BjornFJohansson Thanks for the pointers tho these pages.
For Taq polymerase I now managed to simulate NEB's Tm calculator with Biopython's MeltingTemp. In NEB's calculator, they obviously do not correct for Mg ions. So the Mg2+ concentrations and dNTPs concentrations are irrelevant for their calculation. Thus, we need to remove them from the call to Tm_NN (when Mg is 0 [default], dNTPs are not considered). Then it fits nicely:
import Bio
from Bio.SeqUtils import MeltingTemp as mt
primers = (
'AGCGGATAACAATTTCACACAGGA',
'GTAAAACGACGGCCAGT',
'AGCGGATAAGGGCAATTTCAC',
'GTAAAACGACGGCCA',
)
NEBtms = (58, 54, 57, 50) # from NEB Tm calculator
for temp, primer in zip(NEBtms, primers):
tm = mt.Tm_NN(
primer,
nn_table=mt.DNA_NN3,
Na=0, # mM
K=50, # mM
Tris=10, # mM
# Mg=1.5,
dnac1=200, # nM
dnac2=0, # nM
dNTPs=0.8, # mM
saltcorr=6,
)
print(
f'Primer: {primer}, Tm: NEB: {temp}, Biopython: {round(tm)}'
f' delta: {round(tm - temp, 2)}'
)
Result:
Primer: AGCGGATAACAATTTCACACAGGA, Tm: NEB: 58, Biopython: 58 delta: -0.39
Primer: GTAAAACGACGGCCAGT, Tm: NEB: 54, Biopython: 54 delta: -0.04
Primer: AGCGGATAAGGGCAATTTCAC, Tm: NEB: 57, Biopython: 57 delta: -0.05
Primer: GTAAAACGACGGCCA, Tm: NEB: 50, Biopython: 50 delta: 0.36
The deltas are of course because NEB's calculator already gives rounded values.
As I said, one need to find the 'correct' settings. I'll go on to check for Phusion and Q5.
I can't find the composition of Phusions HF buffer, however, looking on the assay conditions and some assumption, I would suggest the following settings:
# For Phusion, use this settings
tm = mt.Tm_NN(
primer,
nn_table=mt.DNA_NN3,
Na=0, # mM
K=50, # mM
Tris=25, # mM
Mg=1.5,
dnac1=200, # nM
dnac2=0, # nM
dNTPs=0.8, # mM
saltcorr=1,
)
These give values that derive from the rounded NEB data by about 0.2 °C.
As written in their Help page, for Phusion they use a different salt correction method and, obviously, there they include Mg and dNTPs concentrations in their calculation.
These settings fit for Q5:
tm = mt.Tm_NN(
primer,
nn_table=mt.DNA_NN3,
Na=0, # mM
K=50, # mM
Tris=10, # mM
Mg=1.5,
dnac1=200, # nM
dnac2=0, # nM
dNTPs=0.8, # mM
saltcorr=6,
)
Thank you all for your contributions. Ill look into it asap.
Let me add a little bit to the discussion:
While I could provide settings to simulate quite well the Tm values of NEB's Tm calculator (see above, fine for Taq and Q5, some small deviation with Phusion sometimes), I wonder if this is a goal you want to go for. Why is NEB the source you want to trust? As already mentioned in the introduction by @BjornFJohansson , FisherScientifics' Tm calculator gives largely (3-4 °C) different results for Phusion than NEB's calculator. These calculators may also change (well, they changed in the past), you never know if they do the calculation correct (if you just want to simulate you may want to ignore this), and it's hard to find the correct parameters. And these may also change in the future.
When I rewrote Biopython's MeltingTemp code some years ago, I also compared the results to other calculators (mostly to check if I do the math correctly), but I focused on established academic tools, at that time this was MELTING and Primer3 or Primer3Plus. Online calculators from companies were hard to compare with because except some general statements ("using nearest neighbor.... according to...."), detailed information about the exact usage of the algorithm (which thermodynamic tables do they use, how do they calculate the initialization, which salt correction algorithm do they use) were sparse.
BTW, this source https://github.com/kalekundert/ligrna (mentioned by @BjornFJohansson ) seems to contain the pre-2016 code? (it mentions Breslauer, that is not mentioned in the recent pages).
Also, I browsed through the Javascript and while I can find the calculations and published constants, I don't find the actual buffer salt concentrations.
Hi @MarkusPiotrowski thanks for testing, it is very appreciated.
Elaborating a bit more on why we were trying to match NEB's results: As far as I understand it, the reason for wanting to reproduce the NEB's calculations is not so much to achieve the most "thermodynamically correct" results. Instead, from what @hiyama341 was saying, some researchers trust NEB over other calculators because they have been using it for a long time, so it's more an issue of "habit" rather than accuracy. Because of this, they would not trust/use a tool that uses other calculators and this could be an issue for some of the tools built on top of pydna that we are working on, namely teemi and ShareYourCloning.
@manulera Thank you for the explanation. You know your users better than I, so you can probably be convinced that they prefer NEB's calculator. Some years ago we had a Promega (now Thermo) freezer in the neighboring department, so we got our enzymes usually from them, and at that time I would have preferred to use their calculator.
What I would sum up from my testing and also by taking a look at the Javascript code from NEB's calculator: I don't see any magic in the JS code and it should be possible to use Biopython's MeltingTemp to simulate their results. Interestingly, the code contains much more thermodynamic tables and formulas than they claim to use (in fact, I wouldn't wonder if they use Biopython's MeltingTemp as template).
Above you have the settings that work well. Maybe some hints:
So I would suggest, to use a small wrapper around Biopython's Meltingtemp and supply your users with different predefined sets of settings, e.g., NEB_Taq, NEB_Phusion, NEB_Q5 etc.
@BjornFJohansson There is an error in https://github.com/BjornFJohansson/tm/blob/master/NEB%20Phusion/NEB%20Phusion.ipynb . You gave the dNTPs in µmolar instead of millimolar. Thus free Mg becomes 0 and ist not longer taken into consideration for calculations.
Thanks, Ill look into it. I am compiling results for the NEB Phusion which worked perfectly.
By the way @MarkusPiotrowski, we will meet this Thursday 25th of July at 14:00 UK time to discuss some issues of pydna, including this one. If you want to join the discussion, your insight would be appreciated! Below a Teams link for that:
Dear all, Some updated comparisons between neb calculator and the different combinations of biopython options. Specifically, I made a sum of least squares minimization as suggested using Scipy.minimize for the Q5 polymerase.
I get close, but not as close as with some of the others (thanks, @MarkusPiotrowski for the help).
There seems to be some creative mixtures of thermodynamic data in the neb tm calculator source code. The ones I could find are extracted to Python in the NEB_website folder.
Salt correction seems to be the same as biopython 7 (Owczarzy 2008) and not Owczarzy 2004 as documented.
Hello, I was working on #250 and I noticed that sometimes longer primers give lower Tms, is this normal?
from pydna.tm import tm_default
print(tm_default("agatcgactatctacttatgcatctta")) # 58.90561619071127
print(tm_default("agatcgactatctacttatgcatctt")) # 59.1327553388262
Hi all, I didn't mention this but I tried recreating this with brute force but couldn't get closer than Björn. Good idea to put it on the Hackathon list @manulera :)
As per the 4th Pydna meeting, it was generally agreed that Pydna and tools that depend on it would benefit from being able to simulate modern commercial Tm calculators offline.
I have compiled data around this here
The NEB.ipynb notebook contain attempts to replicate the NEB Tm calculator simply by selecting parameters in the biopython Tm module. This was successful for the standard Taq polymerase with the buffer composition stated by NEB.
I was not able to guess the result for the Q5 polymerase. A contributing factor is the secrecy of the buffer composition.
The NEB logics seems contained in the file
NEB/NEB_website/NEB Tm Calculator_files/main-3d92a74abb.js
I formatted the code with an online js formatter in the file
main-3d92a74abb_formatted.js
.The variables below seem important, but I was not able to follow the logics.
r.TmCalc.prototype.nnBr r.TmCalc.prototype.nnSa r.TmCalc.prototype.nnde r.TmCalc.prototype.nndhds r.TmCalc.prototype.nnloop r.TmCalc.prototype.nnbulge r.TmCalc.prototype.nntmm r.TmCalc.prototype.R=1.987, r.TmCalc.prototype.dSBr r.TmCalc.prototype.dSSa r.TmCalc.prototype.dHBr r.TmCalc.prototype.dHSa r.TmCalc.prototype.init r.TmCalc.prototype.saltCorrect r.TmCalc.prototype.setCt r.TmCalc.prototype.setMonosalt r.TmCalc.prototype.setDisalt r.TmCalc.prototype.setDMSO r.TmCalc.prototype.buildPairMap
Some other people have worked on similar issues here and here
I think this is worth spending some time, since interestingly, the Thermofisher Tm calculator here gives similar, but not identical results and claims to use similar but not identical thermodynamic data.