Colvars / colvars

Collective variables library for molecular simulation and analysis programs
http://colvars.github.io/
GNU Lesser General Public License v3.0

[RFC] More band-aid fixes to Colvars #700

Open HanatoK opened 3 months ago

HanatoK commented 3 months ago

Well, I believe that no one will really like this PR. It uses a lot of hacks and workarounds in the backend to achieve reusable CVCs under SMP with as few limitations as possible (although I am not a big fan of distributing CVCs over threads), including:

  1. Build a toy AST and determine the parallelization scheme from the depth of each node. At first glance colvardeps looks like an AST, but after playing with it I feel it is not a real one: it only checks the dependencies of features, and there is no true AST. It would be better if Colvars could be redesigned with a true AST and a separate dependency checker for it; the dependency checker should not own the AST;
  2. Bypass the colvar class and take the cvc objects out to build the AST. I think that the colvar class should be completely removed;
  3. I don't know why smp_lock and smp_unlock in colvarproxy_namd are implemented by creating and destroying locks, so I have changed them;
  4. Implement the chain rule in a dirty manner (see colvar::cvc::modify_children_cvcs_atom_gradients and propagate_colvar_force). When calling calc_gradients and apply_force of a CVC consisting of sub-CVCs, it now propagates the gradients and forces to all its sub-CVCs;
  5. To avoid race conditions when propagating the atom gradients while reusing CVCs, I have to use smp_lock. However, it is very coarse-grained, so I expect an additional performance penalty. I thought there would be a lock tied to each atom group, but found none.
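
The per-atom-group lock mentioned in point 5 could look roughly like the following sketch (hypothetical names, not the actual Colvars API): each group owns its own mutex, so CVCs accumulating gradients into disjoint groups never contend, unlike with a single global smp_lock.

```cpp
#include <mutex>
#include <vector>

// Hypothetical sketch: instead of one global smp_lock() serializing all
// gradient accumulation, each atom group owns a mutex, so only CVCs
// touching the *same* group serialize. Names are illustrative.
struct atom_group_sketch {
  std::vector<double> grad;   // per-atom gradient accumulator
  std::mutex mtx;             // fine-grained lock for this group only

  void add_gradient(std::size_t i, double g) {
    std::lock_guard<std::mutex> lk(mtx);  // protects only this group
    grad[i] += g;
  }
};
```

Two threads reusing the same sub-CVC would then contend only when writing into a shared atom group, not on every gradient update library-wide.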

In summary, I think that Colvars should be fundamentally changed to achieve better support of reusable components and parallelization.

This PR tries to solve #232, extends #644 and finishes:

HanatoK commented 3 months ago

After some thought, I feel there is no way to make explicit gradients work correctly with reusable components, so I will disable them in this PR.

HanatoK commented 3 months ago

The problem of calc_gradients()

Colvars was not designed with automatic differentiation (AD) in mind. At first glance, it seems to perform forward-mode AD, because for each CVC calc_gradients() is executed just after calc_value(), but the colvar class, which is supposed to be a function of its CVCs, does not store explicit gradients with respect to the atoms. Conversely, colvar::communicate_forces computes the gradients with respect to the CVCs on the fly, which looks like a backward (reverse-mode) AD implementation. Furthermore, the colvarvalue class has no gradient field, unlike many other implementations such as torch.tensor and PLMD::Value, and the CVC class does not store the gradients either. All these factors make either forward or backward AD via calc_gradients() difficult for CVCs composed of sub-CVCs. The apply_force() code path is less broken, as it looks consistent with backward AD and with the backward propagation in PLUMED.
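
To illustrate the chain rule that the apply_force() code path effectively implements, here is a minimal reverse-mode sketch (illustrative names, not Colvars code): a force on a parent value p = sum_i w_i * s_i is propagated to each sub-value as F_i = F * dp/ds_i = F * w_i.

```cpp
#include <vector>

// Illustrative sketch of backward propagation through a parent component
// built as a linear combination of sub-components. Not the Colvars API.
struct sub_value {
  double value;
  double applied_force = 0.0;  // force accumulated from parents
};

// Propagate a force on the parent p = sum_i w_i * s_i to its children:
// F_i += F_parent * dp/ds_i = F_parent * w_i (chain rule).
void propagate_force(std::vector<sub_value>& subs,
                     const std::vector<double>& weights,
                     double parent_force) {
  for (std::size_t i = 0; i < subs.size(); ++i)
    subs[i].applied_force += parent_force * weights[i];
}
```

For a nonlinear parent the weight w_i would be replaced by the partial derivative dp/ds_i evaluated at the current sub-values, but the accumulation pattern is the same.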

The problem of colvardeps

From the perspective of a compiler or interpreter, when constructing the CVC object in colvar::cvc::cvc, Colvars is still in the stage of syntax analysis: it is parsing the config file and trying to build a tree of colvardeps, and the syntax analysis is not done until cvc::init has run. The weird part of the design is that init_dependencies is called after the CVC constructor and before cvc::init, and checks dependencies. Feature dependencies are in general a semantic matter, which means that Colvars interleaves syntax analysis with semantic analysis, which, in my opinion, is a bad design.

jhenin commented 3 months ago

> The problem of colvardeps
>
> From the perspective of a compiler or interpreter, when constructing the CVC object in colvar::cvc::cvc, Colvars is still in the stage of syntax analysis: it is parsing the config file and trying to build a tree of colvardeps, and the syntax analysis is not done until cvc::init has run. The weird part of the design is that init_dependencies is called after the CVC constructor and before cvc::init, and checks dependencies. Feature dependencies are in general a semantic matter, which means that Colvars interleaves syntax analysis with semantic analysis, which, in my opinion, is a bad design.

There seems to be a misunderstanding here. To clarify, the spirit of init_dependencies is not that it "checks dependencies"; rather, it initializes the static dependency tree between features. The proper dependency checking happens in the calls to enable(), which happen either during the semantic analysis of the input or throughout the run in the case of dynamic dependencies.

HanatoK commented 3 months ago

> The problem of colvardeps
>
> From the perspective of a compiler or interpreter, when constructing the CVC object in colvar::cvc::cvc, Colvars is still in the stage of syntax analysis: it is parsing the config file and trying to build a tree of colvardeps, and the syntax analysis is not done until cvc::init has run. The weird part of the design is that init_dependencies is called after the CVC constructor and before cvc::init, and checks dependencies. Feature dependencies are in general a semantic matter, which means that Colvars interleaves syntax analysis with semantic analysis, which, in my opinion, is a bad design.

> There seems to be a misunderstanding here. To clarify, the spirit of init_dependencies is not that it "checks dependencies"; rather, it initializes the static dependency tree between features. The proper dependency checking happens in the calls to enable(), which happen either during the semantic analysis of the input or throughout the run in the case of dynamic dependencies.

Thanks for the clarifications. You are right that init_dependencies only declares the dependencies. However, the problem remains that enable() tries to check dependencies while the child CVCs are still being initialized, so for a CVC of sub-CVCs I cannot do the checking here: https://github.com/Colvars/colvars/blob/ff35f9cffc491ee9b38762eef498a3ce5f5d836d/src/colvarcomp_combination.cpp#L58-L60 Also, I cannot use add_child in colvardeps to declare that one sub-CVC is a child of another, because add_child also performs dependency checking. I think it is a bad design to check dependencies from within the initialization function: while init is running, Colvars is still parsing options and the abstract syntax tree is not completely built, but dependencies are semantic and should be checked only after the AST is complete.

In my opinion, the following structure should be separated from colvardeps to form the AST: https://github.com/Colvars/colvars/blob/ff35f9cffc491ee9b38762eef498a3ce5f5d836d/src/colvardeps.h#L155-L162 colvardeps should only be a feature-dependency checker and should not own the AST, and adding new children to the AST should not trigger the checker. In other words, colvardeps would act somewhat like an LLVM pass.
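
A minimal sketch of that separation (hypothetical names, not a proposed Colvars API): the AST only stores structure, adding a child never triggers any check, and the checker is a separate pass that walks the finished tree, much like an LLVM pass over a finished module.

```cpp
#include <memory>
#include <string>
#include <vector>

// Sketch: the tree owns only structure; no dependency logic lives here.
struct ast_node {
  std::string name;
  std::vector<std::unique_ptr<ast_node>> children;

  // Adding a child never triggers dependency checking.
  ast_node* add_child(std::string n) {
    children.push_back(std::make_unique<ast_node>());
    children.back()->name = std::move(n);
    return children.back().get();
  }
};

// Stand-in for a dependency-checking pass: a traversal that runs only
// after the whole tree is built. Here it just counts the nodes.
int run_pass(const ast_node& n) {
  int count = 1;
  for (const auto& c : n.children) count += run_pass(*c);
  return count;
}
```

The key property is that construction (syntax) and checking (semantics) are two phases: run_pass can assume the tree is complete, which is exactly what enable() cannot assume today.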

jhenin commented 2 months ago

Thanks @HanatoK , I now understand your point better and I fully agree!

To move the AST out of colvardeps and allow same-level dependencies, we need to deal with the fact that children objects of a CVC can be CVCs or atom groups, so those should be described by different parts of the feature tree.

jhenin commented 1 month ago

@HanatoK - this is a large and disruptive change set. I think there are items in there that we can agree on. Having an AST alongside the colvardeps class is one.

Your initial point 2 was:

> Bypass the colvar class and take the cvc objects out to build the AST. I think that the colvar class should be completely removed;

The colvar class is the main workhorse at the moment, but I suppose you mean merging the cvc and colvar classes into one. I generally agree with that idea, although it will give a large class in which only a small subset of features is used by most instances.

giacomofiorin commented 2 weeks ago

@jhenin's comment is good. There are multiple small fixes here that are definitely worth integrating, but they are mixed with others that pertain to broader design issues.

I also agree that the AST could be the most immediate goal. We had discussed in the past the possibility of adding such functionality to colvardeps, but it also looked like that would amount to feature creep for that class.

If you are aware of suitable classes in the C++ standard library, can you suggest them?

HanatoK commented 1 week ago

> @jhenin's comment is good. There are multiple small fixes here that are definitely worth integrating, but they are mixed with others that pertain to broader design issues.
>
> I also agree that the AST could be the most immediate goal. We had discussed in the past the possibility of adding such functionality to colvardeps, but it also looked like that would amount to feature creep for that class.
>
> If you are aware of suitable classes in the C++ standard library, can you suggest them?

As far as I understand, the AST would be constructed in the same way as colvardeps, with pointers to child and parent nodes. colvardeps itself is actually already an AST, but the issue is that it mixes up two goals, namely (i) determining the order of calculations of components and biases (the dependencies between colvardeps-derived objects), and (ii) checking the feature-level compatibilities/dependencies. In other words, I think it is better to separate these two goals.
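
A minimal sketch of goal (i): once the graph structure is separated from feature checking, computing the evaluation order reduces to a topological sort of the dependency edges (illustrative code, with hypothetical integer CVC ids, not the Colvars API).

```cpp
#include <utility>
#include <vector>

// Sketch: derive the order of calculations from explicit dependency
// edges. deps contains pairs (a, b) meaning "a must run before b".
std::vector<int> topo_order(int n,
                            const std::vector<std::pair<int, int>>& deps) {
  std::vector<std::vector<int>> adj(n);
  std::vector<int> indeg(n, 0);
  for (const auto& [a, b] : deps) { adj[a].push_back(b); ++indeg[b]; }

  std::vector<int> order, ready;
  for (int i = 0; i < n; ++i)
    if (indeg[i] == 0) ready.push_back(i);  // leaves: no prerequisites
  while (!ready.empty()) {
    int u = ready.back(); ready.pop_back();
    order.push_back(u);
    for (int v : adj[u])
      if (--indeg[v] == 0) ready.push_back(v);
  }
  return order;  // order.size() < n would indicate a dependency cycle
}
```

Nodes that become ready at the same time are independent, so each "wave" of the sort is also the natural unit to distribute over SMP threads.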

I am marking the PR as a draft, as it seems too disruptive and might need further discussion.

jhenin commented 1 week ago

@HanatoK I've now looked at this in more detail. This has a lot of merit. What it would need to be merge-ready is at least:

  1. Fewer TODOs and self-described hacks;
  2. passing regression tests (now I see some failures in the NAMD-based library tests);
  3. documentation of the user-facing changes.

One question: can this be done without implementing the AST design you described just above? If yes, I would rather separate these issues as much as possible and merge code that works to avoid letting this branch diverge too much.

Side note: I think you should be able to set your GH account so that the CI actions run in your fork, which would give us CI for this PR.

HanatoK commented 1 week ago
> 1. Fewer TODOs and self-described hacks;

Well, actually this PR was not intended to be merged into the master branch directly. Instead, I developed the code based on #644, generalized the idea of reusable CVCs and tried my best to solve the SMP issue.

> 2. passing regression tests (now I see some failures in the NAMD-based library tests);

This PR is based on #644, so I don't want it to diverge from the reusable-cvcs branch. The NAMD tests fail because there is no devel branch in NAMD's repository. reusable-cvcs should be rebased first, and then I would rebase this PR.

> 3. documentation of the user-facing changes.

I tagged this PR with "request for comments" because I didn't think there would be a consensus about how to reuse CVCs, and I opened it mainly as a proof of concept to show how to achieve reusable CVCs while retaining the idea of "distributing the CVCs over SMP threads". I will add the documentation once we have a consensus about reusability and parallelization.

> One question: can this be done without implementing the AST design you described just above? If yes, I would rather separate these issues as much as possible and merge code that works to avoid letting this branch diverge too much.

I don't think so. It seems to me that @giacomofiorin likes the idea of moving some "special" CVCs out of the SMP loops and computing them serially. However, I think there will be more and more "special" CVCs relying on other CVCs, and biases based on other biases, in the future, so it is better to take a unified approach. As I said in https://github.com/Colvars/colvars/pull/709#issuecomment-2372735138, there could be two approaches to parallelization, namely (i) distributing CVCs and biases over SMP threads, and (ii) using SMP threads to parallelize fine-grained loops like projecting hills, rotating atoms, calculating COMs and more. To achieve general reusability with (i), an AST or something similar is needed to determine the order of calculations of the CVCs. Approach (ii) could be simpler, as it only requires that the CVCs be calculated in the order in which they are defined in the config file, which is what PLUMED does, although I suspect that building an AST could be more useful.
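
Approach (ii) can be illustrated with a fine-grained inner loop such as a center-of-mass reduction (an illustrative sketch, not Colvars code). With OpenMP enabled the pragma distributes the loop over threads; without it, the pragma is ignored and the code remains valid serial C++.

```cpp
#include <cstddef>
#include <vector>

struct vec3 { double x, y, z; };

// Sketch of approach (ii): parallelize the inner loop of one CVC rather
// than distributing whole CVCs over threads. The reduction clause makes
// the mass-weighted sums thread-safe under OpenMP.
vec3 center_of_mass(const std::vector<vec3>& pos,
                    const std::vector<double>& mass) {
  double cx = 0, cy = 0, cz = 0, m = 0;
  #pragma omp parallel for reduction(+:cx,cy,cz,m)
  for (std::ptrdiff_t i = 0; i < (std::ptrdiff_t)pos.size(); ++i) {
    cx += mass[i] * pos[i].x;
    cy += mass[i] * pos[i].y;
    cz += mass[i] * pos[i].z;
    m  += mass[i];
  }
  return {cx / m, cy / m, cz / m};
}
```

This style of loop maps directly onto GPU kernels as well, which is part of the argument for (ii) below.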

> Side note: I think you should be able to set your GH account so that the CI actions run in your fork, which would give us CI for this PR.

It does run in my fork (see https://github.com/HanatoK/colvars/actions/runs/9766071847/job/26958200729).

giacomofiorin commented 1 week ago

> I don't think so. It seems to me that @giacomofiorin likes the idea of moving some "special" CVCs out of the SMP loops and computing them serially. However, I think there will be more and more "special" CVCs relying on other CVCs, and biases based on other biases, in the future, so it is better to take a unified approach.

I totally agree with having a unified approach, which would be a win-win for users and developers alike. But that would take significant effort, during which we would need to keep master release-ready.

Although GROMACS and LAMMPS have somewhat predictable release schedules, they are also not in sync: one in the Winter, one in Summer. I guess this may be related to the times of the year when people prefer to be inside working in Stockholm vs. Albuquerque :-D And NAMD and VMD do not follow a regular schedule.

I do not see the two approaches as antithetical, either: (i) was just easier to implement than (ii), and there was also less pressure to implement (ii) in NAMD, which already provided at least centers of mass computed using domain decomposition.

HanatoK commented 1 week ago

> I do not see the two approaches as antithetical, either: (i) was just easier to implement than (ii), and there was also less pressure to implement (ii) in NAMD, which already provided at least centers of mass computed using domain decomposition.

My impression is that parallelizing both the calculations of CVs and their inner loops (COM, COG, and rotating positions) could make the code much more complicated. Approach (i) only works well on CPUs, while approach (ii) could easily be ported to GPUs or similar accelerators. Calculating two CVs simultaneously on two "GPU threads" of the same GPU device would kill performance.