Closed andresilvapimentel closed 2 years ago
I was trying to build a notebook using my multiple xvg files, however I did not get to convert the multiple xvg files (from each lambda) to a dataset similar to benzene_load How can I perform this task?
In my notebook, It works fine if I use the benzene_load dataset, but I did not get to convert my personal multiple xvg files (from each lambda) into a dataset to work with my notebook.
@andresilvapimentel Hi, thank you for your interest in alchemlyb we have an experimental workflow that should do the calculation automatically and it is in the branch of https://github.com/xiki-tempula/alchemlyb/tree/workf With an API like
>>> import os
>>> from alchemtest.gmx import load_ABFE
>>> from alchemlyb.workflows import ABFE
>>> # Obtain the path of the data
>>> dir = os.path.dirname(load_ABFE()['data']['complex'][0])
>>> print(dir)
'alchemtest/gmx/ABFE/complex'
>>> workflow = ABFE(units='kcal/mol', software='Gromacs', dir=dir,
>>> prefix='dhdl', suffix='xvg', T=298, out='./')
>>> workflow.run(skiptime=10, uncorr='dhdl', threshold=50,
>>> methods=('mbar', 'bar', 'ti'), overlap='O_MBAR.pdf',
>>> breakdown=True, forwrev=10)
Do you mind have a try and give me some feedback?
I got this error message:
ModuleNotFoundError Traceback (most recent call last)
ModuleNotFoundError: No module named 'alchemlyb.workflows'
I am not sure if you understood what I would like to do it. I just want to convert my personal multiple xvg files (from each lambda) into a dataset (like load_benzene) to work with my notebook. How can I perform this task?
Hi @andresilvapimentel, can you share the block of code you are running to load your XVG files, along with the exception/error you get? It's impossible for us to tell what the issue may be without this information.
HI @dotsdl , You asked exactly what I am asking for. I do not know how to load my xvg files. I the alchemlyb tutorial, the developers used:
from alchemtest.gmx import load_benzene from alchemlyb.parsing.gmx import extract_dHdl dataset = load_benzene() dataset
My question is how to change the line: dataset = load_benzene()
to upload my personal xvg files. So, my question is: How to convert my personal xvg files to a dataset?
Ah, now I see what you mean. I'll be the first to admit that we could use some clear examples in the documentation, but I believe you are referring to the example given in this section of the docs:
from alchemtest.gmx import load_benzene
from alchemlyb.parsing.gmx import extract_dHdl
dataset = load_benzene()
dhdl = extract_dHdl(dataset['data']['Coulomb'][0], 310)
dhdl.attrs['temperature']
dhdl.attrs['energy_unit']
If you run the dataset = load_benzene()
line and have a look at the dataset
object, you'll see that it's basically a dictionary, and dataset['data']
is a dictionary with two keys, 'Coulomb'
, and 'VDW'
. The values for those keys are a list of paths to the example XVG files for benzene for each lambda window.
In the extract_dHdl
line we parse the first such XVG file by passing its path in. So, to use your own XVG file, you need only pass your XVG file path to extract_dHdl
, along with the temperature at which your simulation was sampled.
Does this help?
Yes. It helped. Thanks.
I upload the xvg files with: from google.colab import files uploaded = files.upload()
Then, I alocated uploaded as dataset:
dataset = {'data':{'Coulomb':[],'VDW':[]}} for (k,v) in uploaded.items(): dataset['data']['Coulomb'].append(k) dataset['data']['VDW'].append(k)
Now, the issue is in the df line:
data_list = [extract_u_nk(xvg, T=310) for xvg in dataset['Coulomb']] df = forward_backward_convergence(data_list, 'mbar') ax = plot_convergence(df) ax.figure.savefig('dF_t.pdf')
The error message is:
ParameterError Traceback (most recent call last)
358 raise ParameterError( 359 'Warning: Should have \sum_n W_nk = 1. Actual column sum for state %d was %f. %d other columns have similar problems' % --> 360 (firstbad, column_sums[firstbad], np.sum(badcolumns))) 361 362 row_sums = np.sum(W * N_k, axis=1)
ParameterError: Warning: Should have \sum_n W_nk = 1. Actual column sum for state 0 was 18.391322. 41 other columns have similar problems
This error also shows up in the following lines:
mbar_coul.fit(u_nk_coul)
The other error message shows up when I run the command block:
dHdl_coul = alchemlyb.concat([extract_dHdl(xvg, T=310) for xvg in dataset['Coulomb']]) ti_coul = TI().fit(dHdl_coul) dHdl_vdw = alchemlyb.concat([extract_dHdl(xvg, T=310) for xvg in dataset['VDW']]) ti_vdw = TI().fit(dHdl_vdw)
from alchemlyb.visualisation import plot_ti_dhdl ax = plot_ti_dhdl([ti_coul, ti_vdw], labels=['Coul', 'VDW'], colors=['r', 'g']) ax.figure.savefig('dhdl_TI.pdf')
The error message is:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/alchemlyb/visualisation/ti_dhdl.py in plot_ti_dhdl(dhdl_data, labels, colors, units, ax) 95 raise ValueError( 96 'Length of labels ({}) should be the same as the number of data ({})'.format( ---> 97 len(labels), len(dhdl_list))) 98 99 if colors is None:
ValueError: Length of labels (2) should be the same as the number of data (4)
Do you know what is going on?
From what you shared, it looks like:
dataset = {'data':{'Coulomb':[],'VDW':[]}}
for (k,v) in uploaded.items():
dataset['data']['Coulomb'].append(k)
dataset['data']['VDW'].append(k)
data_list = [extract_u_nk(xvg, T=310) for xvg in dataset['Coulomb']]
should have instead for the last line:
data_list = [extract_u_nk(xvg, T=310) for xvg in dataset['data']['Coulomb']]
Does this fix your issue?
Hi @dotsdl Thank you for your suggestion.
No. I did not fix it. It is even worse because it gives the error message in the previous line:
KeyError Traceback (most recent call last)
KeyError: 'data'
Yesterday, I found a couple additional issues with this uploading process. I will try a couple more things before sharing with you.
Yes. It helped. Thanks.
I upload the xvg files with: from google.colab import files uploaded = files.upload()
Then, I alocated uploaded as dataset:
dataset = {'data':{'Coulomb':[],'VDW':[]}} for (k,v) in uploaded.items(): dataset['data']['Coulomb'].append(k) dataset['data']['VDW'].append(k)
Now, the issue is in the df line:
data_list = [extract_u_nk(xvg, T=310) for xvg in dataset['Coulomb']] df = forward_backward_convergence(data_list, 'mbar') ax = plot_convergence(df) ax.figure.savefig('dF_t.pdf')
The error message is: ParameterError Traceback (most recent call last) in () 5 #bz = dataset['data'] 6 data_list = [extract_u_nk(xvg, T=310) for xvg in dataset['Coulomb']] ----> 7 df = forward_backward_convergence(data_list, 'mbar') 8 ax = plot_convergence(df) 9 ax.figure.savefig('dF_t.pdf')
358 raise ParameterError( 359 'Warning: Should have \sum_n W_nk = 1. Actual column sum for state %d was %f. %d other columns have similar problems' % --> 360 (firstbad, column_sums[firstbad], np.sum(badcolumns))) 361 362 row_sums = np.sum(W * N_k, axis=1)
ParameterError: Warning: Should have \sum_n W_nk = 1. Actual column sum for state 0 was 18.391322. 41 other columns have similar problems
This error also shows up in the following lines:
mbar_coul.fit(u_nk_coul)
I think this could be solved by using autoMBAR instead of MBAR from alchemlyb.estimators import AutoMBAR as MBAR
The other error message shows up when I run the command block:
dHdl_coul = alchemlyb.concat([extract_dHdl(xvg, T=310) for xvg in dataset['Coulomb']]) ti_coul = TI().fit(dHdl_coul) dHdl_vdw = alchemlyb.concat([extract_dHdl(xvg, T=310) for xvg in dataset['VDW']]) ti_vdw = TI().fit(dHdl_vdw)
from alchemlyb.visualisation import plot_ti_dhdl ax = plot_ti_dhdl([ti_coul, ti_vdw], labels=['Coul', 'VDW'], colors=['r', 'g']) ax.figure.savefig('dhdl_TI.pdf')
The error message is:
ValueError Traceback (most recent call last) in () 10 11 from alchemlyb.visualisation import plot_ti_dhdl ---> 12 ax = plot_ti_dhdl([ti_coul, ti_vdw], labels=['Coul', 'VDW'], colors=['r', 'g']) 13 ax.figure.savefig('dhdl_TI.pdf')
/usr/local/lib/python3.7/dist-packages/alchemlyb/visualisation/ti_dhdl.py in plot_ti_dhdl(dhdl_data, labels, colors, units, ax) 95 raise ValueError( 96 'Length of labels ({}) should be the same as the number of data ({})'.format( ---> 97 len(labels), len(dhdl_list))) 98 99 if colors is None:
ValueError: Length of labels (2) should be the same as the number of data (4)
Do you know what is going on?
I think this is a tricky issue, the plot_ti_dhdl assumes a certain form of input data, I might need your file to see what is the best way forward.
Hi @xiki-tempula Thank you for your comment. I also think it is a trick issue. It seems it is related to the indexes of my dataset (and also the shapes of dhdl, dhdl_coul, dhdl_vdw, u_nk_coul and u_nk_vdw).
I also found that the data_list generated in my notebook is in a format different for what is required for the code works well. The command line: df = forward_backward_convergence(data_list, 'mbar') also gives the same convergence error message. So, the use of from alchemlyb.estimators import AutoMBAR as MBAR is not going to help.
I would like to suggest something. My recomendation is to work in a module to read the xvg files and generate a dataset with the right for of input data to avoid this issue.
I will work a little bit more. If I do not solve the issue, I will send my notebook and dataset to get further help. Which file do you need? The xvg files are big.
@andresilvapimentel So the alchemlyb is intended to function as a library instead of offering an end-to-end solution, so technically, all the components to do a convergence analysis are available. We can obviously giving advice on how to use them.
I tried to use from alchemlyb.estimators import AutoMBAR as MBAR but It did not work as well giving the same error message about convergence.
@andresilvapimentel If the AutoMBAR cannot provide a solution, I'm afraid that it is out of the scoop of this repo. Our AutoMBAR is just a wrapper which tries different MBAR solvers to get it to work. In most cases, the AutoMBAR can find the right solver to solve the MBAR but ultimately, it is the pymbar that is doing the solving. Maybe @dotsdl could consult with the pymbar people to see what is the best way forward.
@xiki-tempula How can I send the xvg files in order to help me with solution? What is the best way? Maybe, I am generating the xvg files different than you. Do you aggree with me?
@andresilvapimentel It is proprietary? If it is not, you could just put it in the comment section. Like this untitled folder.zip
@xiki-tempula It is my own xvg files: dhdl.zip
I would be very thankful if you can solve the issue.
@andresilvapimentel Ok, I guess you need the devel version, follow the guide in https://alchemlyb.readthedocs.io/en/latest/install.html#installing-from-source
I could get the result with the devel version by
import glob
from alchemlyb.parsing.gmx import extract_u_nk
from alchemlyb.convergence import forward_backward_convergence
u_nk_list = [extract_u_nk(file, 300) for file in glob.glob('*.xvg')]
df = forward_backward_convergence(u_nk_list, 'autombar')
Note that your case seems to be very difficult to solve, so the calculation will take a long time.
The u_nk_list used by you has the same form of the data_list = [extract_u_nk(xvg, T=310) for xvg in dataset['Coulomb']] that I used, so it gave the same error message using the mbar.
I did not use the autombar because I need to install it. I did not try yet.
It did not converge after 1 h when the devel version was intalled.
WARNING: Did not converge to within specified tolerance. max_delta = 6.668503e-09, tol = 1.000000e-12, maximum_iterations = 10000, iterations completed = 9999
As I did not solve the issue, I run the tutorial of the mdpow program using the benzene example to get the FEP data (from the xvg files). Then, I uploaded these xvg files into the alchemlyp notebook to try running this benzene example there. I got the following errors: 1) with the command df = forward_backward_convergence(data_list, 'mbar')
LinAlgError Traceback (most recent call last)
The benzene example for MDPOW is almost certainly not converged, the simulations are too short.
I would suggest you start out with a simple system where you know that you can get converged data. This can be benzene but you should run it for long enough (probably a few ns per window — but ultimately you should do some convergence analysis). Or use the GROMACS FEP tutorial. In any case, you should have a test case that you trust.
Manually calculating the free energy difference over the whole data set as you did in (2) and (3) is a good start. Try different estimators. Adjust convergence parameters. Then do the same manually including only half the data. See if anything breaks in between.
Ultimately, alchemlyb is not a plug-and-play solution. It provides building blocks and you need to develop an understanding of what the building blocks can do. I suggest reading recent papers on FEP calculations (Justin Lemkuhl's GROMACS tutorial (1) and the best practices paper (2).
@orbeckst @xiki-tempula , I've started working on moving testing different solutions within MBAR at https://github.com/choderalab/pymbar/pull/442. Would be good to get any comments on what you would suggest.
As a note: Google Colab is Python version 3.7.13 (default, Apr 24 2022, 01:04:09) (as of this writing). alchemlyb will likely drop Python 3.7 support soon (in line with NEP29, see e.g. PR #214 ) to keep up with dependencies such as Pandas (supports 3.8 – 3.10 in their latest release).
Therefore, unless Google updates Colab, it seems pointless to create tutorials in Colab if we cannot use the latest version of alchemlyb.
Minimum requirement for alchemlyb is 3.8 so it won't run in Colab until Google changes the version of Python.
Please re-open once Colab has been modernized.
It would be nice if the developers write an alchemlyb tutorial in google colab using a simple ligand/protein complex example.