OpenBioSim / biosimspace

An interoperable Python framework for biomolecular simulation.
https://biosimspace.openbiosim.org
GNU General Public License v3.0
74 stars 11 forks source link

[BUG] process.getRecords not getting the same number of values for different columns #52

Closed xiki-tempula closed 1 year ago

xiki-tempula commented 1 year ago

Describe the bug When I run energy_dict = process.getRecords(), I expect that for every key, there should be the same number of values.

To Reproduce

mol_1 = BSS.Parameters.openff_unconstrained_2_0_0("C").getMolecule()
mol_2 = BSS.Parameters.openff_unconstrained_2_0_0("CO").getMolecule()
merged = BSS.Align.merge(mol_1, mol_2)
solvated = BSS.Solvent.tip3p(merged, box=3 * [4 * BSS.Units.Length.nanometer])
protocol = BSS.Protocol.FreeEnergyMinimisation()
process = BSS.Process.Amber(solvated, protocol, exe=amber_gpu)
protocol.start()
process.wait()
new = process.getSystem()
protocol = BSS.Protocol.FreeEnergyEquilibration()
process = BSS.Process.Amber(new, protocol, exe=amber_gpu)
process.start()
process.wait()
energy_dict = process.getRecords()
for key in energy_dict:
    print(len(energy_dict[key]))

Gives

1002
1002
1503
1002
1503
1503
1503
1503
1503
1503
1002
1002
1002
1002
1002
1002
1002
501
501
501
501
501
501
501
1002
501

Expected behavior The number of values one get from each key should be the same.

lohedges commented 1 year ago

Could you possibly upload the stdout_file used by this process, so that I don't have to re-run the simulation, which I can't do locally? Alternatively, just call process.getOutput() to get after the process finishes to get a zip file with the output, which you can upload. I'm assuming that the output formatting is different for a free energy simulation, so the logic used for the parsing isn't working correctly.

xiki-tempula commented 1 year ago

amber_output.zip

lohedges commented 1 year ago

Many thanks. I'll try to take a look this afternoon.

lohedges commented 1 year ago

Okay, this is because the results are output at each time step for the different TI regions, e.g.:

| TI region  1

 NSTEP =        0   TIME(PS) =       0.000  TEMP(K) =   452.95  PRESS =     0.0
 Etot   =    -22038.2878  EKtot   =      5840.3473  EPtot      =    -27878.6351
 BOND   =      2220.2471  ANGLE   =         3.3530  DIHED      =         0.0000
 1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =      5504.2303
 EELEC  =    -35606.4655  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 DV/DL  =        34.5464
 ------------------------------------------------------------------------------

| TI region  2

 NSTEP =        0   TIME(PS) =       0.000  TEMP(K) =   452.86  PRESS =     0.0
 Etot   =    -22037.6791  EKtot   =      5840.9560  EPtot      =    -27878.6351
 BOND   =      2220.2471  ANGLE   =         3.3530  DIHED      =         0.0000
 1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =      5504.2303
 EELEC  =    -35606.4655  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 DV/DL  =        34.5464
 ------------------------------------------------------------------------------

  Softcore part of the system:       1 atoms,       TEMP(K)    =        5097.79
 SC_Etot=        15.1955  SC_EKtot=        15.1954  SC_EPtot   =         0.0000
 SC_BOND=         0.0000  SC_ANGLE=         0.0000  SC_DIHED   =         0.0000
 SC_14NB=         0.0000  SC_14EEL=         0.0000  SC_VDW     =         0.0000
 SC_EEL =         0.0000
 SC_RES_DIST=     0.0000  SC_RES_ANG=       0.0000  SC_RES_TORS=         0.0000
 SC_EEL_DER=      0.0000  SC_VDW_DER=       0.0000  SC_DERIV   =         0.0000
 ------------------------------------------------------------------------------

I'll need to add some logic to parse these separately. This might be tricky, though, since I'd also need to figure out how to get back the data, e.g. when calling getTotalEnergy do we need to add an option to specify the TI region, perhaps defaulting to 1 if nothing is set.

Since I'm not an AMBER expert, do you have any sense for how the output will change if the FEP parameters are changed. I'm assuming there will always be 2 TI regions in this situation.

xiki-tempula commented 1 year ago

I think there will always be two TI regions for free energy claculations.

xiki-tempula commented 1 year ago

i think you might want this parameter at the getRecord level

lohedges commented 1 year ago

Thanks. I think I'll add some special handling for free energy protocols where I store two dictionaries, one for each TI region. If needed, I would also need to parse and store the information for the soft-core part and add methods to get back the records, since none currently exist.

lohedges commented 1 year ago

I think I now have this working locally. Could you possibly share output for a FreeEnergyMinimisation protocol? The AMBER output formatting is different for the regular Minimisation protocol and I have special handling to parse the records. I want to make sure that it still works for FreeEnergyMinimisation, and modify things if necessary.

xiki-tempula commented 1 year ago

em.zip