analysis TI with alchemical_analysis.py errors

fglaser commented 8 years ago

Dear Prof. Mobley,

I run a TI analysis with AMBER and I installed and I am trying to use your alchemical_analysis.py script for the example that comes with the with it, from the sampes/amber directory as it is. The scripts runs, but I get an error I don’t understand, probably my python ignorance, here it is the ouptut, it seems it starts to run, but it gives a mbar_lambdas error.

/Users/admin/Data/TOOLS/alchemical-analysis-master/samples/amber 03:24 PM Sun Dec 13$ ../../alchemical_analysis/alchemical_analysis.py -a AMBER -d data/* -m TI -p ti00* -q out

Started on Sun Dec 13 15:25:08 2015

Command line was: ../../alchemical_analysis/alchemical_analysis.py -a AMBER -d data/0.0 data/0.1 data/0.2 data/0.3 data/0.4 data/0.5 data/0.6 data/0.7 data/0.8 data/0.9 data/1.0 -m TI -p ti00* -q out

Loading in data from data/0.0/ti002.out... 50 data points, 10 DV/DL averages Loading in data from data/0.0/ti003.out... 50 data points, 10 DV/DL averages

Sorting input data by starting time and lambda Traceback (most recent call last): File "../../alchemical_analysis/alchemical_analysis.py", line 1224, in main() File "../../alchemical_analysis/alchemical_analysis.py", line 1172, in main nsnapshots, lv, dhdlt, u_klt = parser_amber.readDataAmber(P) File "/Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/parser_amber.py", line 582, in readDataAmber if len(dvdl_all) != len(mbar_lambdas): UnboundLocalError: local variable 'mbar_lambdas' referenced before assignment

If I use your script run.sh I get a different error:

./run.sh Traceback (most recent call last): File "../../alchemical_analysis//alchemical_analysis.py", line 1224, in main() File "../../alchemical_analysis//alchemical_analysis.py", line 1200, in main Deltaf_ij, dDeltaf_ij = estimatewithMBAR(u_kln, N_k, P.relative_tolerance, regular_estimate=True) File "../../alchemical_analysis//alchemical_analysis.py", line 290, in estimatewithMBAR O = MBAR.computeOverlap()[2] IndexError: invalid index to scalar variable. /Users/admin/Data/TOOLS/alchemical-analysis-master/samples/amber 03:41 PM Sun Dec 13$ more run.sh ../../alchemical_analysis//alchemical_analysis.py \ -a AMBER \ -d data \ -p '[01].*/ti00[2-9]' \ -q out \ -o . \ -r 5 \ -u kcal \ -s 0 \ -g \ -w \

run.log /Users/admin/Data/TOOLS/alchemical-analysis-master/samples/amber

Could you please help me understand what I am doing wrong?

Thanks a lot in advance,

Fabian

Dr. Fabian Glaser Head of the Structural Bioinformatics section

Bioinformatics Knowledge Unit - BKU The Lorry I. Lokey Interdisciplinary Center for Life Sciences and Engineering Technion - Israel Institute of Technology, Haifa 32000, ISRAEL

fglaser at technion dot ac dot il Tel: +972 4 8293701 http://bku.technion.ac.il

fglaser commented 8 years ago

I forgot to mention that I am running this on my Mac El capitan with python 2.7.10

Thanks,

Fabian

halx commented 8 years ago

Have you run pmemd with ifmbar=0 (or not set this at all so it defaults to 0)? This is a bug in the AMBER parser. Please checkout the version from my branch: https://github.com/halx/alchemical-analysis.

fglaser commented 8 years ago

I didn't use ifmbar... should I run production again?

Or use the version you suggest?

Thanks!!

Fabian Technion, Israel

halx commented 8 years ago

Use my branch.

You may want to use ifmbar>0 as the computational cost is rather smalle. But this is only useful with equally spaced lambdas because AMBER currently does not allow you to use arbitrary lambdas.

fglaser commented 8 years ago

Thanks a lot for your kind help,

I did succeed with your branch, and it worked only for TI, now I don’t understand why the results are so different from the Kaus paper…. the output is below. And several questions…. sorry for so many….

First, only TI calculation worked, why?

Second there are two values of TI, -19.525 and -82.289 what is their difference?

Third, in any case those values are far far away from the published in the paper, with a similar procedure (although they did 5 ns and I did 3 ns production, and they did NVT production and I did NPT as you suggested).

They published

Model System Method DF0:0􀀀>1:0 DF0:1􀀀>0:9 DDFend
Solvation Free Energy TI -66.74 -49.33 -19.29
MBAR -65.62 -47.15 -18.47

And as you can see below my calculation yielded -82 …… The preparation of the system is identical I followed their procedure… so I really don’t know where the culprit is. I used the amber AM1-BCC method for charges and they used RESP, since I don’t have Gaussian, can this be the source of such a difference in TI?

Any help will be most welcomed.

Here is my output:

$ alchemical_analysis.py -a AMBER -d data/* -m all -p prod* -q out

Started on Mon Dec 14 12:01:15 2015

Command line was: /Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemical_analysis.py -a AMBER -d data/0 data/0.1 data/0.2 data/0.3 data/0.4 data/0.5 data/0.6 data/0.7 data/0.8 data/0.9 data/1 -m all -p prod* -q out

Loading in data from data/0/prod0.out... 100 data points, 1 DV/DL averages

Sorting input data by starting time and lambda

Note: BAR/MBAR results are not computed.

Skipping first 0 steps (= 0.000000 ps)

Note: extrapolating missing gradient for lambda = 1.0: -19.524776

The gradients from the correlated samples (kcal/mol): 0.00000 -19.525 1.00000 -19.525 TI = -19.525

The correlated DV/DL components from _everysingle step (kcal/mol): Lambda BOND ANGLE DIHED 1-4 NB 1-4 EEL VDWAALS EELEC RESTRAINT 0.00000 0.000 0.000 0.000 0.000 0.000 -0.245 -19.279 0.000 TI = 0.000 0.000 0.000 0.000 0.000 -0.245 -19.279 0.000

Number of correlated and uncorrelated samples:

State N N_k N/N_k

 0          100           62         1.61
 1          100          100         1.00

Estimating the free energy change with TI, TI-CUBIC, DEXP, IEXP, GINS, GDEL, UBAR, RBAR...

States TI (kJ/mol) TI-CUBIC (kJ/mol) DEXP (kJ/mol) IEXP (kJ/mol) GINS (kJ/mol) GDEL (kJ/mol) UBAR (kJ/mol) RBAR (kJ/mol)

0 -- 1 -82.289 +- 3.867 -82.289 +- 3.867 -0.000 +- 0.000 0.000 +- 0.000 -0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000

TOTAL:      -82.289  +-  3.867    -82.289  +-  3.867      0.000  +-  0.000      0.000  +-  0.000      0.000  +-  0.000      0.000  +-  0.000      0.000  +-  0.000      0.000  +-  0.000

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII The above table has been stored in IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII data/0/results.txt IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII while the full-precision data IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII (along with the simulation profile) in IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII data/0/results.pickle IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Time spent: 0 hours, 0 minutes, and 0.25 seconds. Finished on Mon Dec 14 12:01:16 2015 /Users/admin/Data/TUTORIALS/AMBER/A9/1HNN 12:01 PM Mon Dec 14$

THANKSS!!

Dr. Fabian Glaser Head of the Structural Bioinformatics section

Bioinformatics Knowledge Unit - BKU The Lorry I. Lokey Interdisciplinary Center for Life Sciences and Engineering Technion - Israel Institute of Technology, Haifa 32000, ISRAEL

fglaser at technion dot ac dot il Tel: +972 4 8293701 http://bku.technion.ac.il

fglaser commented 8 years ago

All the results and input files and everything is here

https://www.dropbox.com/sh/nzs2t2zs1nh6mq1/AADK1vyHEVG3ECuAOoQFjFO3a?dl=0

If you want to take a look....

Thansk!!!

Fabian

halx commented 8 years ago

You will not get any results for the FEP family of analyses unless you run your simulations with ifmbar > 1 (and ensure that the lambdas generated with bar_l_min, bar_l_max and bar_l_incr match all your chosen clambdaS)

The first TI result is based on the correlated data set. The last one from the uncorrelated one and thus is the relevant one. In practice thought both should be fairly close to each other.

The analysis tool is a bit odd in how you specifiy how it should find the input files. I do not know what the directory layout of your files are (I suspect it is the same as in tool's sample directory) but what you probably want is something like -d data -p '[01]./prod' -q out (other variants are possible too). None of these command-line options takes more than one argument. If you want to read in more than one files then you will need to familiaries yourself with glob patterns. This is done as glob(datafile_directory/prefix*suffix) where datafile_directory is -d, prefix is -p and suffix is -q.

You have effectively only read in one single file. "Loading in data" tells you what file the parser is reading.

On 14 December 2015 at 10:21, fglaser notifications@github.com wrote:

Thanks a lot for your kind help,

I did succeed with your branch, and it worked only for TI, now I don’t understand why the results are so different from the Kaus paper…. the output is below. And several questions…. sorry for so many….

First, only TI calculation worked, why?

Second there are two values of TI, -19.525 and -82.289 what is their difference?

Third, in any case those values are far far away from the published in the paper, with a similar procedure (although they did 5 ns and I did 3 ns production, and they did NVT production and I did NPT as you suggested).

They published

Model System Method DF0:0􀀀>1:0 DF0:1􀀀>0:9 DDFend

Solvation Free Energy TI -66.74 -49.33 -19.29

MBAR -65.62 -47.15 -18.47

And as you can see below my calculation yielded -82 …… The preparation of the system is identical I followed their procedure… so I really don’t know where the culprit is. I used the amber AM1-BCC method for charges and they used RESP, since I don’t have Gaussian, can this be the source of such a difference in TI?

Any help will be most welcomed.

Here is my output:

$ alchemical_analysis.py -a AMBER -d data/* -m all -p prod* -q out

Started on Mon Dec 14 12:01:15 2015

Command line was: /Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemical_analysis.py -a AMBER -d data/0 data/0.1 data/0.2 data/0.3 data/0.4 data/0.5 data/0.6 data/0.7 data/0.8 data/0.9 data/1 -m all -p prod* -q out

Loading in data from data/0/prod0.out... 100 data points, 1 DV/DL averages

Sorting input data by starting time and lambda

Note: BAR/MBAR results are not computed.

Skipping first 0 steps (= 0.000000 ps)

Note: extrapolating missing gradient for lambda = 1.0: -19.524776

The gradients from the correlated samples (kcal/mol): 0.00000 -19.525 1.00000 -19.525 TI = -19.525

The correlated DV/DL components from _everysingle step (kcal/mol): Lambda BOND ANGLE DIHED 1-4 NB 1-4 EEL VDWAALS EELEC RESTRAINT 0.00000 0.000 0.000 0.000 0.000 0.000 -0.245 -19.279 0.000 TI = 0.000 0.000 0.000 0.000 0.000 -0.245 -19.279 0.000

Number of correlated and uncorrelated samples:

State N N_k N/N_k

0 100 62 1.61 1 100 100 1.00

Estimating the free energy change with TI, TI-CUBIC, DEXP, IEXP, GINS,

GDEL, UBAR, RBAR...

States TI (kJ/mol) TI-CUBIC (kJ/mol) DEXP (kJ/mol) IEXP (kJ/mol) GINS

(kJ/mol) GDEL (kJ/mol) UBAR (kJ/mol) RBAR (kJ/mol)

0 -- 1 -82.289 +- 3.867 -82.289 +- 3.867 -0.000 +- 0.000 0.000 +- 0.000

-0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000

TOTAL: -82.289 +- 3.867 -82.289 +- 3.867 0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII The above table has been stored in IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII data/0/results.txt IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII while the full-precision data IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII (along with the simulation profile) in IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII data/0/results.pickle

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Time spent: 0 hours, 0 minutes, and 0.25 seconds. Finished on Mon Dec 14 12:01:16 2015 /Users/admin/Data/TUTORIALS/AMBER/A9/1HNN 12:01 PM Mon Dec 14$

THANKSS!!

Dr. Fabian Glaser Head of the Structural Bioinformatics section

Bioinformatics Knowledge Unit - BKU The Lorry I. Lokey Interdisciplinary Center for Life Sciences and Engineering Technion - Israel Institute of Technology, Haifa 32000, ISRAEL

fglaser at technion dot ac dot il Tel: +972 4 8293701 http://bku.technion.ac.il

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/45#issuecomment-164401082 .

fglaser commented 8 years ago

Hi,

Oh I see, this is much better now, thanks! Yes the structure of the dirs are as in the tutorial. Now the correlated IT value is almost perfect -67.201 vs -66.74 as in the paper, but the uncorrelated one (is this the second value ? TOTAL: -281.312 +- 2.176 ???) is completely different....

Any idea why they are so different?? And how can I correct my mistake? This is only a control run for me, once I get the right uncorrelated values I will work on my molecule.

Here is the full output, THanks!!

$$alchemicalanalysis.py -a AMBER -d data -m TI -p '[01]./prod_' -q out Started on Mon Dec 14 14:59:38 2015 Command line was: /Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemicalanalysis.py -a AMBER -d data -m TI -p [01]./prod_ -q out

Loading in data from data/0.1/prod0.1.out... 100 data points, 1 DV/DL averages Loading in data from data/0.2/prod0.2.out... 100 data points, 1 DV/DL averages Loading in data from data/0.3/prod0.3.out... 100 data points, 1 DV/DL averages Loading in data from data/0.4/prod0.4.out... 100 data points, 1 DV/DL averages Loading in data from data/0.5/prod0.5.out... 100 data points, 1 DV/DL averages Loading in data from data/0.6/prod0.6.out... 100 data points, 1 DV/DL averages Loading in data from data/0.7/prod0.7.out... 100 data points, 1 DV/DL averages Loading in data from data/0.8/prod0.8.out... 100 data points, 1 DV/DL averages Loading in data from data/0.9/prod0.9.out... 100 data points, 1 DV/DL averages

Sorting input data by starting time and lambda

Note: BAR/MBAR results are not computed.

Skipping first 0 steps (= 0.000000 ps)

Note: extrapolating missing gradient for lambda = 0.0: -15.816537 Note: extrapolating missing gradient for lambda = 1.0: -125.537551

The gradients from the correlated samples (kcal/mol): 0.00000 -15.817 0.10000 -24.593 0.20000 -32.826 0.30000 -44.704 0.40000 -56.149 0.50000 -61.926 0.60000 -65.991 0.70000 -77.821 0.80000 -103.103 0.90000 -134.217 1.00000 -125.538 TI = -67.201

The correlated DV/DL components from _everysingle step (kcal/mol): Lambda BOND ANGLE DIHED 1-4 NB 1-4 EEL VDWAALS EELEC RESTRAINT 0.10000 0.000 0.000 0.000 0.000 0.000 21.143 -45.736 0.000 0.20000 0.000 0.000 0.000 0.000 0.000 43.638 -76.464 0.000 0.30000 0.000 0.000 0.000 0.000 0.000 61.012 -105.717 0.000 0.40000 0.000 0.000 0.000 0.000 0.000 82.468 -138.618 0.000 0.50000 0.000 0.000 0.000 0.000 0.000 16.425 -78.352 0.000 0.60000 0.000 0.000 0.000 0.000 0.000 22.832 -88.823 0.000 0.70000 0.000 0.000 0.000 0.000 0.000 14.154 -91.975 0.000 0.80000 0.000 0.000 0.000 0.000 0.000 -1.906 -101.198 0.000 0.90000 0.000 0.000 0.000 0.000 0.000 2.224 -136.441 0.000 TI = 0.000 0.000 0.000 0.000 0.000 54.055 -121.256 0.000

Number of correlated and uncorrelated samples:

State N N_k N/N_k

 0          100          100         1.00
 1          100          100         1.00
 2          100           85         1.18
 3          100           57         1.76
 4          100          100         1.00
 5          100          100         1.00
 6          100           97         1.03
 7          100           66         1.52
 8          100          100         1.00
 9          100          100         1.00
10          100          100         1.00

Estimating the free energy change with TI...

States TI (kJ/mol)

0 -- 1 -8.454 +- 0.333 1 -- 2 -12.068 +- 0.506 2 -- 3 -16.160 +- 0.597 3 -- 4 -20.983 +- 0.569 4 -- 5 -24.701 +- 0.445 5 -- 6 -26.775 +- 0.432 6 -- 7 -30.216 +- 0.564 7 -- 8 -37.965 +- 0.551 8 -- 9 -49.648 +- 0.448 9 -- 10 -54.341 +- 0.337

TOTAL:     -281.312  +-  2.176

The above table has been stored in
data/results.txt
while the full-precision data
(along with the simulation profile) in data/results.pickle

Time spent: 0 hours, 0 minutes, and 1.02 seconds. Finished on Mon Dec 14 14:59:39 2015

fglaser commented 8 years ago

By the way I don't find any indication as if the value they published in their paper is correlated or uncorrelated data.....

Will this procedure work also for a larger molecule much more flexible like darunavir reasonably well?

Improving the Efficiency of Free Energy Calculations in the Amber Molecular Dynamics Package Joseph W. Kaus,*,† Levi T. Pierce,†,‡ Ross C. Walker,†,‡ and J. Andrew McCammon†,§,∥,⊥ †Department

fglaser commented 8 years ago

Oh, I saw the uncorrelated flag -n UNCORR, but using it does not produce any new output.... only the correlated lambda value....

Here it is

alchemicalanalysis.py --uncorr=UNCORR -a AMBER -d data -m TI -p '[01]./prod_' -q out Started on Mon Dec 14 15:45:44 2015 Command line was: /Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemicalanalysis.py --uncorr=UNCORR -a AMBER -d data -m TI -p [01]./prod_ -q out

Loading in data from data/0.1/prod0.1.out... 100 data points, 1 DV/DL averages Loading in data from data/0.2/prod0.2.out... 100 data points, 1 DV/DL averages Loading in data from data/0.3/prod0.3.out... 100 data points, 1 DV/DL averages Loading in data from data/0.4/prod0.4.out... 100 data points, 1 DV/DL averages Loading in data from data/0.5/prod0.5.out... 100 data points, 1 DV/DL averages Loading in data from data/0.6/prod0.6.out... 100 data points, 1 DV/DL averages Loading in data from data/0.7/prod0.7.out... 100 data points, 1 DV/DL averages Loading in data from data/0.8/prod0.8.out... 100 data points, 1 DV/DL averages Loading in data from data/0.9/prod0.9.out... 100 data points, 1 DV/DL averages

Sorting input data by starting time and lambda

Note: BAR/MBAR results are not computed.

Skipping first 0 steps (= 0.000000 ps)

Note: extrapolating missing gradient for lambda = 0.0: -15.816537 Note: extrapolating missing gradient for lambda = 1.0: -125.537551

The gradients from the correlated samples (kcal/mol): 0.00000 -15.817 0.10000 -24.593 0.20000 -32.826 0.30000 -44.704 0.40000 -56.149 0.50000 -61.926 0.60000 -65.991 0.70000 -77.821 0.80000 -103.103 0.90000 -134.217 1.00000 -125.538 TI = -67.201

The correlated DV/DL components from _everysingle step (kcal/mol): Lambda BOND ANGLE DIHED 1-4 NB 1-4 EEL VDWAALS EELEC RESTRAINT 0.10000 0.000 0.000 0.000 0.000 0.000 21.143 -45.736 0.000 0.20000 0.000 0.000 0.000 0.000 0.000 43.638 -76.464 0.000 0.30000 0.000 0.000 0.000 0.000 0.000 61.012 -105.717 0.000 0.40000 0.000 0.000 0.000 0.000 0.000 82.468 -138.618 0.000 0.50000 0.000 0.000 0.000 0.000 0.000 16.425 -78.352 0.000 0.60000 0.000 0.000 0.000 0.000 0.000 22.832 -88.823 0.000 0.70000 0.000 0.000 0.000 0.000 0.000 14.154 -91.975 0.000 0.80000 0.000 0.000 0.000 0.000 0.000 -1.906 -101.198 0.000 0.90000 0.000 0.000 0.000 0.000 0.000 2.224 -136.441 0.000 TI = 0.000 0.000 0.000 0.000 0.000 54.055 -121.256 0.000

Number of correlated and uncorrelated samples:

State N N_k N/N_k

 0          100          100         1.00
 1          100          100         1.00
 2          100           85         1.18
 3          100           57         1.76
 4          100          100         1.00
 5          100          100         1.00
 6          100           97         1.03
 7          100           66         1.52
 8          100          100         1.00
 9          100          100         1.00
10          100           99         1.01

Estimating the free energy change with TI...

States TI (kJ/mol)

0 -- 1 -8.454 +- 0.333 1 -- 2 -12.068 +- 0.506 2 -- 3 -16.160 +- 0.597 3 -- 4 -20.983 +- 0.569 4 -- 5 -24.701 +- 0.445 5 -- 6 -26.775 +- 0.432 6 -- 7 -30.216 +- 0.564 7 -- 8 -37.965 +- 0.551 8 -- 9 -49.648 +- 0.448 9 -- 10 -54.341 +- 0.337

TOTAL:     -281.312  +-  2.176

The above table has been stored in
data/results.txt
while the full-precision data
(along with the simulation profile) in data/results.pickle

Time spent: 0 hours, 0 minutes, and 0.43 seconds. Finished on Mon Dec 14 15:45:44 2015 /Users/admin/Data/TUTORIALS/AMBER/A9/1HNN

fglaser commented 8 years ago

Lastly, a few theoretical questions for my real example:

For a molecule like this one https://en.wikipedia.org/wiki/Ritonavir much larger and flexible that the one of the example I am using now, will this method provide a relatively accurate result? How many lambda jumps should I use 10, 20? How many ns of production? I used 3 ns per lambda in this example. Will 5 or 10 be enough for such a large molecule? Will this method work for a mixture of acetone and water?

Thanks a lot in advance,

Fabian

halx commented 8 years ago

Why don't you sample the end-points at lambda=0 and 1? Also, include TI-cubic in the analysis and look at the gradient plots (-g flag). Your data does not seem to very correlated so you can reduce ntpr. 100 data points only doesn't look like very much. Maybe you continue your simulation with this in mind and also compute the MBAR energies?

As for the molecules you are actually interested in, I guess this will be very challenging. Simulation time will depend on convergence and correlation. Lambda spacing depends on the steepness of the gradient (TI) or energy overlap (FEP). What are you after: relative or absolute free energies?

I think the -n flag does not apply to TI and you can safely ignore it.

On 14 December 2015 at 13:17, fglaser notifications@github.com wrote:

Hi,

Oh I see, this is much better now, thanks! Yes the structure of the dirs are as in the tutorial. Now the correlated IT value is almost perfect -67.201 vs -66.74 as in the paper, but the uncorrelated one (is this the second value ? TOTAL: -281.312 +- 2.176 ???) is completely different....

Any idea why they are so different?? And how can I correct my mistake? This is only a control run for me, once I get the right uncorrelated values I will work on my molecule.

Here is the full output, THanks!!

$$alchemicalanalysis.py -a AMBER -d data -m TI -p '[01]./prod_' -q out Started on Mon Dec 14 14:59:38 2015 Command line was: /Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemicalanalysis.py -a AMBER -d data -m TI -p [01]./prod_ -q out

Loading in data from data/0.1/prod0.1.out... 100 data points, 1 DV/DL averages Loading in data from data/0.2/prod0.2.out... 100 data points, 1 DV/DL averages Loading in data from data/0.3/prod0.3.out... 100 data points, 1 DV/DL averages Loading in data from data/0.4/prod0.4.out... 100 data points, 1 DV/DL averages Loading in data from data/0.5/prod0.5.out... 100 data points, 1 DV/DL averages Loading in data from data/0.6/prod0.6.out... 100 data points, 1 DV/DL averages Loading in data from data/0.7/prod0.7.out... 100 data points, 1 DV/DL averages Loading in data from data/0.8/prod0.8.out... 100 data points, 1 DV/DL averages Loading in data from data/0.9/prod0.9.out... 100 data points, 1 DV/DL averages

Sorting input data by starting time and lambda

Note: BAR/MBAR results are not computed.

Skipping first 0 steps (= 0.000000 ps)

Note: extrapolating missing gradient for lambda = 0.0: -15.816537 Note: extrapolating missing gradient for lambda = 1.0: -125.537551

The gradients from the correlated samples (kcal/mol): 0.00000 -15.817 0.10000 -24.593 0.20000 -32.826 0.30000 -44.704 0.40000 -56.149 0.50000 -61.926 0.60000 -65.991 0.70000 -77.821 0.80000 -103.103 0.90000 -134.217 1.00000 -125.538 TI = -67.201

The correlated DV/DL components from _everysingle step (kcal/mol): Lambda BOND ANGLE DIHED 1-4 NB 1-4 EEL VDWAALS EELEC RESTRAINT 0.10000 0.000 0.000 0.000 0.000 0.000 21.143 -45.736 0.000 0.20000 0.000 0.000 0.000 0.000 0.000 43.638 -76.464 0.000 0.30000 0.000 0.000 0.000 0.000 0.000 61.012 -105.717 0.000 0.40000 0.000 0.000 0.000 0.000 0.000 82.468 -138.618 0.000 0.50000 0.000 0.000 0.000 0.000 0.000 16.425 -78.352 0.000 0.60000 0.000 0.000 0.000 0.000 0.000 22.832 -88.823 0.000 0.70000 0.000 0.000 0.000 0.000 0.000 14.154 -91.975 0.000 0.80000 0.000 0.000 0.000 0.000 0.000 -1.906 -101.198 0.000 0.90000 0.000 0.000 0.000 0.000 0.000 2.224 -136.441 0.000 TI = 0.000 0.000 0.000 0.000 0.000 54.055 -121.256 0.000

Number of correlated and uncorrelated samples:

State N N_k N/N_k

0 100 100 1.00 1 100 100 1.00 2 100 85 1.18 3 100 57 1.76 4 100 100 1.00 5 100 100 1.00 6 100 97 1.03 7 100 66 1.52 8 100 100 1.00 9 100 100 1.00 10 100 100 1.00

Estimating the free energy change with TI...

States TI (kJ/mol)

0 -- 1 -8.454 +- 0.333 1 -- 2 -12.068 +- 0.506 2 -- 3 -16.160 +- 0.597 3 -- 4 -20.983 +- 0.569 4 -- 5 -24.701 +- 0.445 5 -- 6 -26.775 +- 0.432 6 -- 7 -30.216 +- 0.564 7 -- 8 -37.965 +- 0.551 8 -- 9 -49.648 +- 0.448

9 -- 10 -54.341 +- 0.337

TOTAL: -281.312 +- 2.176

The above table has been stored in

data/results.txt

while the full-precision data

(along with the simulation profile) in

data/results.pickle

Time spent: 0 hours, 0 minutes, and 1.02 seconds. Finished on Mon Dec 14 14:59:39 2015

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/45#issuecomment-164433562 .

fglaser commented 8 years ago

I tried, the TI is less similar to the paper with 0 and 1.... TI = -72.664 why is that? :-( and I get a new error, the figures are great tahnsk!!

The objective is to learn about the differences of solubility, any measure that will teach me what is more soluble and maybe a hint about why will be great about this two guys which are given together for HIV:

https://en.wikipedia.org/wiki/Ritonavir https://en.wikipedia.org/wiki/Darunavir

So I thought to calculate TI of hydration of each one and compare them, I have an idea of what should be the result.

Here is the result with 0 and 1:

alchemicalanalysis.py -a AMBER -d data -m TI-cubic -p '/prod_' -q out -g Started on Mon Dec 14 16:42:31 2015 Command line was: /Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemicalanalysis.py -a AMBER -d data -m TI-cubic -p /prod_ -q out -g

Loading in data from data/0/prod0.out... 100 data points, 1 DV/DL averages Loading in data from data/0.1/prod0.1.out... 100 data points, 1 DV/DL averages Loading in data from data/0.2/prod0.2.out... 100 data points, 1 DV/DL averages Loading in data from data/0.3/prod0.3.out... 100 data points, 1 DV/DL averages Loading in data from data/0.4/prod0.4.out... 100 data points, 1 DV/DL averages Loading in data from data/0.5/prod0.5.out... 100 data points, 1 DV/DL averages Loading in data from data/0.6/prod0.6.out... 100 data points, 1 DV/DL averages Loading in data from data/0.7/prod0.7.out... 100 data points, 1 DV/DL averages Loading in data from data/0.8/prod0.8.out... 100 data points, 1 DV/DL averages Loading in data from data/0.9/prod0.9.out... 100 data points, 1 DV/DL averages Loading in data from data/1/prod1.0.out... 100 data points, 1 DV/DL averages

Sorting input data by starting time and lambda

Note: BAR/MBAR results are not computed.

Skipping first 0 steps (= 0.000000 ps)

The gradients from the correlated samples (kcal/mol): 0.00000 -19.525 0.10000 -24.593 0.20000 -32.826 0.30000 -44.704 0.40000 -56.149 0.50000 -61.926 0.60000 -65.991 0.70000 -77.821 0.80000 -103.103 0.90000 -134.217 1.00000 -231.084 TI = -72.664

The correlated DV/DL components from _everysingle step (kcal/mol): Lambda BOND ANGLE DIHED 1-4 NB 1-4 EEL VDWAALS EELEC RESTRAINT 0.00000 0.000 0.000 0.000 0.000 0.000 -0.245 -19.279 0.000 0.10000 0.000 0.000 0.000 0.000 0.000 21.143 -45.736 0.000 0.20000 0.000 0.000 0.000 0.000 0.000 43.638 -76.464 0.000 0.30000 0.000 0.000 0.000 0.000 0.000 61.012 -105.717 0.000 0.40000 0.000 0.000 0.000 0.000 0.000 82.468 -138.618 0.000 0.50000 0.000 0.000 0.000 0.000 0.000 16.425 -78.352 0.000 0.60000 0.000 0.000 0.000 0.000 0.000 22.832 -88.823 0.000 0.70000 0.000 0.000 0.000 0.000 0.000 14.154 -91.975 0.000 0.80000 0.000 0.000 0.000 0.000 0.000 -1.906 -101.198 0.000 0.90000 0.000 0.000 0.000 0.000 0.000 2.224 -136.441 0.000 1.00000 0.000 0.000 0.000 0.000 0.000 24.306 -255.390 0.000 TI = 0.000 0.000 0.000 0.000 0.000 27.402 -100.066 0.000

Number of correlated and uncorrelated samples:

State N N_k N/N_k

 0          100           62         1.61
 1          100          100         1.00
 2          100           85         1.18
 3          100           57         1.76
 4          100          100         1.00
 5          100          100         1.00
 6          100           97         1.03
 7          100           66         1.52
 8          100          100         1.00
 9          100          100         1.00
10          100           90         1.11

Estimating the free energy change with TI-CUBIC...

States TI-CUBIC (kJ/mol)

0 -- 1 -9.199 +- 0.555 1 -- 2 -11.949 +- 0.575 2 -- 3 -16.071 +- 0.681 3 -- 4 -21.094 +- 0.651 4 -- 5 -24.903 +- 0.514 5 -- 6 -26.683 +- 0.500 6 -- 7 -29.701 +- 0.644 7 -- 8 -37.963 +- 0.630 8 -- 9 -48.203 +- 0.510 9 -- 10 -74.725 +- 0.720

TOTAL:     -300.490  +-  2.306

The above table has been stored in
data/results.txt
while the full-precision data
(along with the simulation profile) in data/results.pickle

Plotting the free energy breakdown figure... Plotting the TI figure... /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.py:4747: UserWarning: No labeled objects found. Use label='...' kwarg on individual plots. warnings.warn("No labeled objects found. " Traceback (most recent call last): File "/Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemical_analysis.py", line 1224, in main() File "/Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemical_analysis.py", line 1216, in main plotdFvsLambda() File "/Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemical_analysis.py", line 1010, in plotdFvsLambda plotTI() File "/Users/admin/Data/TOOLS/alchemical-analysis-master/alchemical_analysis/alchemical_analysis.py", line 999, in plotTI for l in lege.legendHandles: AttributeError: 'NoneType' object has no attribute 'legendHandles' /Users/admin/Data/TUTORIALS/AMBER/A9/1HNN

halx commented 8 years ago

As far as I can tell from the paper they have found a solvation free energy around -66 kcal/mol so you do not seem to be far off. What I find more worrisome is that the correlated and uncorrelated result is so different. Have you plotted your results and compared this with the output plots from the tool?

The error message may be some bug in the main code but it looks like it is only one of the plotting routines.

davidlmobley commented 8 years ago

Sorry for the delay getting involved in this. I'm tied up here grading finals but can start dealing with issues on this code in a day or two.

On Tue, Dec 15, 2015 at 7:37 AM, Hannes Loeffler notifications@github.com wrote:

As far as I can tell from the paper they have found a solvation free energy around -66 kcal/mol so you do not seem to be far off. What I find more worrisome is that the correlated and uncorrelated result is so different. Have you plotted your results and compared this with the output plots from the tool?

The error message may be some bug in the main code but it looks like it is only one of the plotting routines.

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/45#issuecomment-164800996 .

David Mobley Associate Professor Department of Pharmaceutical Sciences Department of Chemistry 3134B Natural Sciences I University of California, Irvine Irvine, CA 92697 dmobley@uci.edu work (949) 824-6383 cell (949) 385-2436

davidlmobley commented 8 years ago

OK, @fglaser , to respond to some of this which still seems to be unresolved:

Now the correlated IT value is almost perfect -67.201 vs -66.74 as in the paper, but the uncorrelated one (is this the second value ? TOTAL: -281.312 +- 2.176 ???) is completely different....

I think the major issue you're noticing here is a units one. Specifically, it appears @halx 's AMBER parser is printing out "correlated" values using kcal/mol but the total is in kJ/mol. I've created an issue to fix this so these are consistent. But you'll notice that 67.201*4.184 = 281.17, so you're in rather good shape here.

By the way I don't find any indication as if the value they published in their paper is correlated or uncorrelated data.....

Often people do NOT subsample at intervals of the autocorrelation time, which means they're using correlated data. This doesn't mean that it's correct to do so.

Will this procedure work also for a larger molecule much more flexible like darunavir reasonably well?

The key questions here are: a) What are the relevant timescales in your system, and b) How important are they for the property you are computing

i.e. if there are flexible rotatable bonds that impact the conformation which is important for the solvation free energy and are only adequately sampled on timescales of hundreds of nanoseconds, then you will either need to run simulations which are hundreds of nanonseconds long, or be tricky somehow. So whether it will work on darunavir in particular is something I can't really answer without knowing more about the molecule (i.e. knowing timescale for bond rotation, etc.).

For a molecule like this one https://en.wikipedia.org/wiki/Ritonavir much larger and flexible that the one of the example I am using now, will this method provide a relatively accurate result?

See above - depends on timescales. TI gives correct results in the limit of infinite sampling; the key question for any system is how fast you approach that limit. The more flexibility/the slower the timescales, the slower you approach that limit. Again I can't tell you anything in specific without knowing details of the timescales of your system. You might want to look at bond rotation timescales in your simulations for example, to get an idea of that.

How many lambda jumps should I use 10, 20? How many ns of production? I used 3 ns per lambda in this example.

Lambda spacing is not going to be as sensitive to the molecule, other than that for relatively nonpolar molecules or very small molecules one could get by with fewer lambda values. But in general if you have a lambda schedule which works for a reasonably-sized polar molecule it ought to work fairly well for most other cases you would want to look at, provided you sample enough.

Will 5 or 10 be enough for such a large molecule?

I expect not.

Will this method work for a mixture of acetone and water?

We've done some work recently treating solvents other than pure water, and yes, there's no methodological reason that won't work (it's worked well for us). You might want to see github.com/mobleylab/solvationtoolkit.

The objective is to learn about the differences of solubility, any measure that will teach me what is more >soluble and maybe a hint about why will be great about this two guys which are given together for HIV:

https://en.wikipedia.org/wiki/Ritonavir https://en.wikipedia.org/wiki/Darunavir

So I thought to calculate TI of hydration of each one and compare them, I have an idea of what should >be the result.

Do keep in mind that solubility is determined by other factors aside from the solvation free energy. In fact, ritonavir is famous for solubility problems that originated specifically from issues OTHER than the solvation free energy (several different crystal polymorphs of ritonavir are possible and these have dramatically different solubilities and dramatic effects on oral bioavailability; this actually resulted in ritonavir being withdrawn from the market at one point, i.e. see http://www.nature.com/nrd/journal/v3/n1/box/nrd1280_BX2.html).

I'm marking this issue as closed for now, as I don't see any actual problems with the tool that aren't already solved. But please re-open if you have further issues.

fglaser commented 8 years ago

Dear Prof. Mobley,

Thanks a lot for your informative answer, trully I don't have the details of rotation time scale of my system, but I think I'll give it a go. I know bioavailability is a complicated issue. I just want to learn about the process of the precipitation or desolvation or both, and since not much is known for this two guys and specially their combination, I think I maybe able to help my colleague. In any case I'll bug you from time to time with some questions. The main problem as I understand with those molecules is running time.

Now if I need to run 500 ns to get some approximate value is a problem, I guess I can try to use accelerated MD, which I never used... if you happen to know some additional bibliography yours or other's on this issue of studying the precipitation / desolvation of drugs into water or water / acetone mixtures, I'll be more than thankfully to get links.

Or any other suggestion that can help to understand the complicated process of nano precipitation of drugs in general with simulation.

At least I know know with yours and Hannes great help to run IT... It's a beginning.

Best wishes and happy new year,

Fabian Glaser PhD, Technion Israel

davidlmobley commented 8 years ago

In terms of references on solubility - you'll want to look at Mike Schnieders' (Iowa) work on this with the AMOEBA force field, and probably the work of Sanz and Vega. We're working on some actual solubility calculations ourselves but the work is not complete yet. We're also doing a lot on relative solubility and solvation in mixtures, though not in water/acetone at present.

On Mon, Dec 21, 2015 at 5:57 AM, fglaser notifications@github.com wrote:

Dear Prof. Mobley,

Thanks a lot for your informative answer, trully I don't have the details of rotation time scale of my system, but I think I'll give it a go. I know bioavailability is a complicated issue. I just want to learn about the process of the precipitation or desolvation or both, and since not much is known for this two guys and specially their combination, I think I maybe able to help my colleague. In any case I'll bug you from time to time with some questions. The main problem as I understand with those molecules is running time.

Now if I need to run 500 ns to get some approximate value is a problem, I guess I can try to use accelerated MD, which I never used... if you happen to know some additional bibliography yours or other's on this issue of studying the precipitation / desolvation of drugs into water or water / acetone mixtures, I'll be more than thankfully to get links.

Or any other suggestion that can help to understand the complicated process of nano precipitation of drugs in general with simulation.

At least I know know with yours and Hannes great help to run IT... It's a beginning.

Best wishes and happy new year,

Fabian Glaser PhD, Technion Israel

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/45#issuecomment-166310783 .

David Mobley dmobley@gmail.com 949-385-2436

MobleyLab / alchemical-analysis

analysis TI with alchemical_analysis.py errors #45

GDEL, UBAR, RBAR...

(kJ/mol) GDEL (kJ/mol) UBAR (kJ/mol) RBAR (kJ/mol)

-0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000 0.000 +- 0.000

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Estimating the free energy change with TI...

States TI (kJ/mol)

9 -- 10 -54.341 +- 0.337

data/results.pickle