alchemistry / alchemlyb

the simple alchemistry library
https://alchemlyb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
189 stars 49 forks source link

AMBER dVdl convert to pandas error #272

Closed fr-0zt closed 1 year ago

fr-0zt commented 1 year ago

Hello all,

I'm having trouble forming my pandas data frame using the Amber parser in alchemlyb. I'm still new to python programming and I'm not sure where I might have done wrong in the python script. I really appreciate guidance on handling this error. I have also attached the code I'm using below.

Thanks.

from pathlib import Path

import pandas as pd

from alchemtest import Bunch
import alchemlyb
from alchemlyb.parsing import amber
from alchemlyb.estimators import TI

from alchemlyb.parsing.amber import extract_dHdl

#extract data

def load_ligandsolvated():
    """Load the Amber solvated dataset.

    Returns
    -------
    data : Bunch
        Dictionary-like object, the interesting attributes are:

        - 'data' : the data files by alchemical leg

    """

    module_path = Path(__file__).parent
    data = {'charge': list(map(str, module_path.glob('decharge/*/md7.out'))),
            'vdw': list(map(str, module_path.glob('vdw_bonded/*/md7.out')))}

    return Bunch(data=data)

dataset = load_ligandsolvated()

#print(dataset)

dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300)
                      for filename in dataset['data']['charge']])

Traceback (most recent call last): File "/home/srg/senal/Documents/Project-GQuad/MD/130mM/1-2-1/1-2-1Intercalated_Bottom/ReRun-2/TI/cuda/TI-3/ligands/ti_est.py", line 37, in dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300) File "/home/srg/senal/Documents/Project-GQuad/MD/130mM/1-2-1/1-2-1Intercalated_Bottom/ReRun-2/TI/cuda/TI-3/ligands/ti_est.py", line 37, in dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300) File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/init.py", line 11, in wrapper dataframe = func(outfile, T, *args, *kwargs) File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/amber.py", line 390, in extract_dHdl df = convert_to_pandas(file_datum) File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/amber.py", line 36, in convert_to_pandas frame_time = file_datum.t0 + (frame_index + 1) file_datum.dt * 1000 TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'

xiki-tempula commented 1 year ago

Hi, Thanks for reporting this issue. Do you mind provide the input file so we could have a look? Thanks.

fr-0zt commented 1 year ago

md7.zip

I have attached one of my Amber MD output files above. I further did some more debugging on the "amber.py" code and I feel like extract section for DVDL and appending to filedatum.gradients might have some inconsistency with my amber output file. I may be wrong because this my first time debugging a complete python code. So, please let me know what you think.

Thank you!

xiki-tempula commented 1 year ago

@fr-0zt Thanks. I can reproduce the error. I wonder if you mind give a brief description of how you generate the file? Especially, how you set the frequency that amber reports the dhdl values, please. Thank you. @DrDomenicoMarson Many thanks for the AMBER contributions, do you mind have a check to see if there is a fix for this? Sorry, I'm not very familiar with the AMBER parser so felt that you could give us some help on this. Thanks.

fr-0zt commented 1 year ago

@xiki-tempula Thank you for looking in to this. I am performing a 10 ns simulation, saving the energies and dvdl values every 10 ps. However, according to Amber18 (and newer) there are two TI regions specified at the beginning of the simulation. So, dvdl values are saved for both the regions simultaneously at the said frequency. I have attached the first part of the energies output using this simulation.


  1. RESULTS

| TI region 1

NSTEP = 1000 TIME(PS) = 209920.999 TEMP(K) = 297.64 PRESS = 0.0 Etot = -20862.5455 EKtot = 7675.9463 EPtot = -28538.4918 BOND = 5407.7111 ANGLE = 90.6074 DIHED = 53.5559 1-4 NB = 18.9023 1-4 EEL = 0.0504 VDWAALS = 5812.5937 EELEC = -39921.9126 EHBOND = 0.0000 RESTRAINT = 0.0000 EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 84214.2071 Density = 1.0347 DV/DL = -208.5158

| TI region 2

NSTEP = 1000 TIME(PS) = 209920.999 TEMP(K) = 297.66 PRESS = 0.0 Etot = -20861.9237 EKtot = 7676.5681 EPtot = -28538.4918 BOND = 5407.7111 ANGLE = 90.6074 DIHED = 53.5559 1-4 NB = 18.9023 1-4 EEL = 0.0504 VDWAALS = 5812.5937 EELEC = -39921.9126 EHBOND = 0.0000 RESTRAINT = 0.0000 EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 84214.2071 Density = 1.0347 DV/DL = -208.5158

DrDomenicoMarson commented 1 year ago

Hi, I'll have a look at it right now! Just a quick comment about the "two TI regions specified at the beginning of the simulation".

It's true indeed that amber saves the two TI regions (and it did so also before amber18, as far as I remember), but if you look at the file, especially at the end of the file, where the averages over the last N time-steps are reported, DV/DL values (the only value read by alchemlyb) and their RMS fluctuations are perfectly equal between TI region 1 and 2.

If you search in the AMBER mailing list you can find some better explanation on why values are reported like that, but I can't remember where precisely.

PS: from your output file it seems you are printing values in the output file every 1 ps, and not 10 ps as you previously stated, as you have dt=1 fs and print every ntpr=1000 time-steps (it's not a problem for us, I just wanted to point it out just so you can be sure of what you are doing).

DrDomenicoMarson commented 1 year ago

@fr-0zt a minor issue: in the script you provided, you used T=300.0 when calling extract_dHdl, while your simulations were run at 298.0 K, and the parser gave a ValueError to me about it, didn't it happen to you as well?

DrDomenicoMarson commented 1 year ago

Hi I found the culprit in a "too strict" regex pattern, I addressed the issue in PR #273. Testing the updated version with your input file seems to run fine, extracting the right time and DV/DL values. Just wait for the PR to be merged in the master branch and everything should be fine!

xiki-tempula commented 1 year ago

The PR is merged to the master. Doing pip install git+ssh://git@github.com:alchemistry/alchemlyb.git should give you the alchemlyb with the fix. Please feel free to reopen this issue if there are other problems.