SSAGESproject / SSAGES

Software Suite for Advanced General Ensemble Simulations
GNU General Public License v3.0
81 stars 28 forks source link

PBC issue in particle separation #22

Open AMPsUSC opened 3 years ago

AMPsUSC commented 3 years ago

I have been doing several tests with ABF&ANN + Particle separation as CV, with different number of walkers distributed in different regions of the expected PMF landscape (I have previously got the PMF from umbrella sampling). The representation of the CV as a function of time for each walker shows a lot of sharp jumps as typically appear when PBCs are not corrected (see image below). I guess the bias sampling in your implementation of these algorithms does not use the CV values written in the node-x files (default name of the CV per walker) but in that case the corrected values should be written in the files so that the user can effectively check the evolution of the CVs. Am I doing anything wrong? nodes

mquevill commented 3 years ago

Can you give some more context for your system? What is a basic explanation of your system? What CVs are you using? (Looks like you have 5 quantities plotted here - are you sampling on all of them?) Is your box size commensurate with these large jumps you are seeing (~2)?

SSAGES does follow the PBCs set in whatever engine you're using (GROMACS, LAMMPS, etc.). However, SSAGES does not have a concept of "molecules" and relies on the order of atoms passed to the CV to determine when to unwrap. There are certain cases where manually setting the PBCs for the CV calculation have solved some issues, but may not work for your specific case.

AMPsUSC commented 3 years ago

Sorry, I should have provided more details. In my system I have two relatively large molecules (~100-150 atoms each and 1-2 nm large in the largest dimension). I have a dodecahedron box with 9 nm in each dimension of the unit cell and about ~16000 water molecules. The box is large enough to separate the COM between the two molecules ~4 nm without seeing each other. I am using GROMACS 2018.3

As I said in my previous message, I am using 5 walkers for the sampling... so I guess each curve (the node-x files) corresponds to the value of the CV for each walker along the trajectories, isn't it? I checked the trajectories with a molecular viewer and everything looks right... except the jumps in the node-x files, that I believe come from a wrong correction of the PBCs

Since I am using the distance between the COM of both molecules as a CV for ABF or ANN, I would expect that the algorithm consider the correction through the PBCs to determine CV value. This should be easy just comparing the distance between COM to the box size... If the sampling is based on wrong distance values then all the results would be definitely wrong...

mquevill commented 3 years ago

Hmmm, I don't think that SSAGES is able to handle a dedecahedral box properly. Our current implementation unwraps PBCs assuming you have a rectangular box. There could definitely be issues in the disconnect between the dimensions in a rectangular box vs dodecahedral box, which you are seeing as these large jumps in CV values.

I would be curious to see if it works correctly with your system in a rectangular box. Since SSAGES cannot handle the dodecahedral geometry, it should detect this and throw a warning or an error. I plan to look more into this to see exactly which cases are or are not handled by SSAGES unwrapping scheme. Thank you for raising this issue!

AMPsUSC commented 3 years ago

Hi I have just seen the following line in ParticleSeparationCV.h Vector3 rij = snapshot.ApplyMinimumImage(com1 - com2).cwiseProduct(dim_.cast<double>()); so the correction is implemented... but as you say it works just for rectangular boxes. I usually work with dodecahedron boxes to save solvent... and I am new to SSAGES... so I just assumed it would work... but I see the problem. I will run this system with rectangular boxes... although it is a bit large...

mquevill commented 3 years ago

Ideally, SSAGES would have printed a warning or error message for you, instead of having to figure it out on your own. I'm definitely adding this to the developers' to-do list. Since we support various engines, it may be difficult to encompass every geometry, but at least we should tell the users about the current limitations of SSAGES (especially for new users).

mquevill commented 3 years ago

Actually, from diving further into the GROMACS documentation, it should convert dodecahedral and octahedral boxes into triclinic cells, internally. This would mean that SSAGES is actually unwrapping correctly, regardless of specified geometry. However, this is a place where our testing is lacking, and it would definitely help to test out some simple cases with different cell geometries in various engines.

Could you include some input files (SSAGES .json and GROMACS files) so that we can check through some of the other input parameters to make sure that parameters are chosen judiciously?

AMPsUSC commented 3 years ago

Here is one of my json files:

{
    "logger" :
    {
        "output_file" : 
                [
                                "node-0",
                                "node-1",
                                "node-2",
                                "node-3",
                                "node-4"
                ]
    },
    "walkers" : 5,
    "args" : ["-v","-deffnm","topol"],
    "CVs":
    [{
        "type" : "ParticleSeparation",
        "group1" : [5, 26, 47, 68, 89, 110, 131],
        "group2" : [148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165
, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188,
 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206]
    }],
    "methods" : [{
        "type" : "ABF",
        "cvs" : [0],
        "CV_lower_bounds" : [0.0005],
        "CV_upper_bounds" : [2.9],
        "CV_bins" : [400],
        "CV_restraint_minimums" : [0.0],
        "CV_restraint_maximums" : [3.0],
        "CV_restraint_spring_constants" : [0],
        "CV_isperiodic" : [false],
        "timestep" : 0.002,
        "minimum_count" : 200,
        "output_file" : "F_out",
        "output_frequency" : 1000,
        "unit_conversion" : 1,
        "frequency" : 1
    }]
}

I cannot publicly share my real systems but I could build a small sample and share it with you off the list. I could also perform some tests for you if that helps.

AMPsUSC commented 3 years ago

and this is one of the input files that I have been using with ANN for my tests:


{
    "logger" :
    {
        "output_file" : 
                [
                                "node-0",
                                "node-1",
                                "node-2",
                                "node-3",
                                "node-4"
                ]
    },
    "walkers" : 5,
    "args" : ["-v","-deffnm","topol"],
    "CVs":
    [{
        "type" : "ParticleSeparation",
        "group1" : [5, 26, 47, 68, 89, 110, 131],
        "group2" : [148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 
171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199
, 200, 201, 202, 203, 204, 205, 206]
    }],
    "methods" : [{
        "type" : "ANN",
                "topology" : [15],
                "nsweep" : 1000,
                "temperature" : 298.15,
                "grid" : {
                "lower" : [0.0005],
                "upper" : [2.9],
                "number_points" : [400],
                "periodic": [false]
                },
                "lower_bounds" : [0.0],
                "upper_bounds" : [3.0],
                "lower_bound_restraints" : [0],
                "upper_bound_restraints" : [0]
    }]
}
AMPsUSC commented 3 years ago

The correction of the PBCs seems to be correct at least in cubic boxes. The new plots look clean. Anyway, using the previous json files I see that most of my walkers remain stuck at almost fix positions. I guess I should modify something in the json input files to facilitate their diffusion throughout the CV... although the minimum_count in ABF and the nsweep in ANN are already quite small... any advice?

mquevill commented 3 years ago

As mentioned previously, GROMACS converts essentially all box geometry to triclinic cells to create the box matrix (H). We have not done extensive testing with triclinic cells (or dodecagons or octahedrons). But yes, the rectangular cell is very straightforward and is our typical test case.

With ABF, you could try reducing the minimum_count to maybe 100 or 50, but I would not go any lower than that. I think there have been possible other cases where users have seen some walkers getting "stuck" in different places on the free energy surface. We have not been able to track down exactly what is causing it.

I have not used the ANN method extensively. I think it is designed to be fairly agnostic to choices of parameters. I might suggest reducing the number of points in the grid somewhat, since that might be trying to get too fine of a resolution and might end up messing with the fitting of the network. In terms of the topology, usually one layer is sufficient when you're only sampling one CV, so 15 nodes in one layer should be able to sample the surface. You could try adding another layer, but there's a chance it would end up over fitting the system.

AMPsUSC commented 3 years ago

Thank you for the comments. I will follow your advices to see I am able to free my walkers...