OpenBioSim / biosimspace

An interoperable Python framework for biomolecular simulation.
https://biosimspace.openbiosim.org
GNU General Public License v3.0
87 stars 15 forks source link

[BUG] AMBER output parser gives wrong number of values in dof=2 #78

Closed xiki-tempula closed 1 year ago

xiki-tempula commented 1 year ago

Describe the bug Problem with parsing the amber output file. For the state where there is softcore region in both TI region 1 and TI region 2. The current parser gives wrong number of temperature in dof=2

To Reproduce The example output would be

| TI region  1

 NSTEP =      600   TIME(PS) =       1.200  TEMP(K) =   299.61  PRESS =     0.0
 Etot   =    -17683.1326  EKtot   =      3854.5497  EPtot      =    -21537.6823
 BOND   =         7.4462  ANGLE   =       101.3144  DIHED      =         3.5441
 1-4 NB =         7.4910  1-4 EEL =       -33.9679  VDWAALS    =      3115.5266
 EELEC  =    -24743.2707  EHBOND  =         0.0000  RESTRAINT  =         4.2340
 EAMBER (non-restraint)  =    -21541.9164
 EKCMT  =         0.0000  VIRIAL  =         0.0000  VOLUME     =     65483.6402
                                                    Density    =         0.9930
 TEMP0  =       298.0000  REPNUM  =              1  EXCHANGE#  =              1
 ------------------------------------------------------------------------------

  Softcore part of the system:       3 atoms,       TEMP(K)    =           0.00
 SC_Etot=         0.6102  SC_EKtot=         0.0000  SC_EPtot   =         0.6102
 SC_BOND=         0.0000  SC_ANGLE=         0.6090  SC_DIHED   =         0.0012
 SC_14NB=         0.0000  SC_14EEL=         0.0000  SC_VDW     =         0.0000
 SC_EEL =         0.0000
 SC_RES_DIST=     0.0000  SC_RES_ANG=       0.0000  SC_RES_TORS=         0.0000
 SC_EEL_DER=      0.0000  SC_VDW_DER=       0.0000  SC_DERIV   =         0.0000
 ------------------------------------------------------------------------------

| TI region  2

 NSTEP =      600   TIME(PS) =       1.200  TEMP(K) =   299.61  PRESS =     0.0
 Etot   =    -17683.1326  EKtot   =      3854.5497  EPtot      =    -21537.6823
 BOND   =         7.4462  ANGLE   =       101.3144  DIHED      =         3.5441
 1-4 NB =         7.4910  1-4 EEL =       -33.9679  VDWAALS    =      3115.5266
 EELEC  =    -24743.2707  EHBOND  =         0.0000  RESTRAINT  =         4.2340
 EAMBER (non-restraint)  =    -21541.9164
 EKCMT  =         0.0000  VIRIAL  =         0.0000  VOLUME     =     65483.6402
                                                    Density    =         0.9930
 TEMP0  =       298.0000  REPNUM  =              1  EXCHANGE#  =              1
 ------------------------------------------------------------------------------

  Softcore part of the system:       3 atoms,       TEMP(K)    =           0.00
 SC_Etot=         0.6102  SC_EKtot=         0.0000  SC_EPtot   =         0.6102
 SC_BOND=         0.0000  SC_ANGLE=         0.6090  SC_DIHED   =         0.0012
 SC_14NB=         0.0000  SC_14EEL=         0.0000  SC_VDW     =         0.0000
 SC_EEL =         0.0000
 SC_RES_DIST=     0.0000  SC_RES_ANG=       0.0000  SC_RES_TORS=         0.0000
 SC_EEL_DER=      0.0000  SC_VDW_DER=       0.0000  SC_DERIV   =         0.0000
 ------------------------------------------------------------------------------

If you replace https://github.com/OpenBioSim/biosimspace/blob/devel/tests/Sandpit/Exscientia/output/amber_fep.out with amber_fep.out.zip. The test will fail.

Expected behavior One might need two dof for the softcore region (please complete the following information):

Additional context Add any other context about the problem here.

lohedges commented 1 year ago

Thanks for reporting. I'll have a think since the current approach doesn't generalise well, so probably needs a complete rethink. I did question this during the previous PR since I was concerned that this would be a short-term fix. Since this only affects your sandpit, it's also possible for you to fix at your end and raise a PR. I'd rather not keep having to modify this region of the code without full context of the output formatting and type of simulations that are run.

lohedges commented 1 year ago

Looking at the output file I don't think the DoF approach makes sense. It is probably better to have the functions take a ti_region parameter (which is 0 for non-FEP simulations) then another flag that tells you whether you want a soft-core record from that region. The main issue is that there isn't full overlap between the normal records and those for the soft-core part, so it might be easier (although painful) to just create a whole bunch of new methods for the soft-core terms.

lohedges commented 1 year ago

I think I have thought of a simple fix. Will give it a quick try and report back.

lohedges commented 1 year ago

Okay, it's working, but this new output file that you have shared contains a lot of additional information that wasn't present in the others which is causing issues for the parse, e.g.:

...
 ------------------------------------------------------------------------
TI 1 EE14-SC           0.00000     1.00000     0.00000    -0.00000
TI 2 EE14-SC         -47.72422     0.00000     0.00000     0.00000
lambda = 0.000 : EE14-SC  H=       0.0000 dU/dL: L=    0.0000 NL=    0.0000 Tot=    0.00000
 ------------------------------------------------------------------------
TI 1 Self-Rec         -0.02504     1.00000     0.00000    -0.00000
TI 2 Self-Rec         -0.00000     0.00000     0.00000     0.00000
lambda = 0.000 : Self-Rec H=      -0.0250 dU/dL: L=    0.0000 NL=    0.0000 Tot=    0.00000
 ------------------------------------------------------------------------
TI 1 Self-SC          -0.00000     1.00000     0.00000    -0.00000
TI 2 Self-SC         -41.56967     0.00000     0.00000     0.00000
lambda = 0.000 : Self-SC  H=       0.0000 dU/dL: L=    0.0000 NL=    0.0000 Tot=    0.00000
 ------------------------------------------------------------------------
TI 1 vDW-Corr          0.00000     1.00000     0.00000    -0.00000
TI 2 vDW-Corr         -0.01409     0.00000     0.00000     0.00000
lambda = 0.000 : vDW-Corr H=       0.0000 dU/dL: L=    0.0000 NL=    0.0000 Tot=    0.00000
 ------------------------------------------------------------------------
lambda = 0.000 : Total dU/dl:    0.000000  L:    0.00000  NL:    0.00000  PI:    0.00000
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

MBAR Energy analysis:
Energy at 0.0000 =    -21439.240358
Energy at 0.0667 =    -21436.569044
Energy at 0.1333 =    -21420.047126
Energy at 0.2000 =    -21381.348536
Energy at 0.2667 =    -21317.294906
Energy at 0.3333 =    -21228.898916
Energy at 0.4000 =    -21120.365162
Energy at 0.4667 =    -20998.249282
Energy at 0.5333 =    -20870.529218
Energy at 0.6000 =    -20745.673330
...

Just working out a way to ignore this section of the output file, although I worry what other records might be present in future depending on what options are set and what type of simulation is run. This was the original reason why I used watchdog to automatically parse the mdout file, since the formatting was consistent.

xiki-tempula commented 1 year ago

Maybe only save the data that has a method for it? Instead of trying to save everything and something breaks down.