ejjordan / airflowHPC

2 stars 1 forks source link

Adding necessary functionalities to enable a minimal case of REXEE in airflowHPC #4

Open wehs7661 opened 7 months ago

wehs7661 commented 7 months ago

This issue summarizes functionalities to be added to the airflowHPC implementation of the REXEE method, mostly in replex.py. Generally, the goal is to transfer functions implemented in ensemble_md necessary to enable a minimal case of REXEE simulation in a practical context. Below I will classify changes to be made for different purposes ranked in terms of importance.

1. Proposed changes

1.1. To ensure a correct REXEE implementation

1.2. To enable a more efficient REXEE simulation

The function get_swaps only implements the single exchange proposal scheme. This proposal scheme was implemented in ensemble_md only for a sanity check on the REXEE implementation. As we will report in our preprint (coming soon!), the exhaustive exchange proposal scheme should be standard for efficient REXEE simulations. This requires incorporating parts of the code in get_swapping_pattern for the case when self.proposal is set to exhaustive.

1.3. To enable weight-updating REXEE simulations and weight combination schemes

Currently, no algorithms required for weight-updating REXEE simulations and weight combination have been implemented. I am not entirely sure if the airflowHPC implementation will eventually evolve into an implementation for the asynchronous REXEE method, but if so, or if weight-updating tests in airflowHPC are desired, the following functions in ensemble_md should be considered to be incorporated into airflowHPC. (Not everything in every function is necessary for a minimal case. We can discuss this further when we get to this point.)

1.4. To avoid failures in some edge cases

To avoid failures in some edge cases of REXEE simulations performed by airflowHPC, the following functions in ensemble_md should be incorporated into airflowHPC. Note that some edge cases are trivial and can be easily avoided by well-defined MDP/REXEE parameters.

1.5. To enable the multi-topology REXEE method

To enable coordinate modifications in the multi-topology REXEE method, this section in replica_exchange_EE.py and this section in run_REXEE.py should be incorporated.

2. Current progress

In my forked repository, I have created the branch add_functions, where I plan to push the most important changes proposed above (those listed in section 1.1.). As of now (commit b80aeae), I have only made the following minor changes to tasks.py and replex.py:

Here is a more detailed comparison between the commit b80aeae of my repo and the parent repo before being forked: diff.

I will make more changes (for the ones proposed in section 1.1. at least) and open a PR soon. I am also happy to help make other changes proposed above in airflowHPC depending on our future plan and discussions.

mrshirts commented 7 months ago

@wehs7661 do you want to indicate above which functions just involve internal logic (and thus relatively easy for anyone), and which would involve airflow functionality (which Joe might need to do?)

wehs7661 commented 7 months ago

@mrshirts as far as I can tell, all the changes proposed above should be independent of Airflow, at most to the extent where a @task decorator (or other similar Airflow decorators) is needed. I would propose to first focus on changes proposed in sections 1.1 and 1.2 for now as they are needed to run practical REXEE simulations.

One question as we transfer more and more functionalities from ensemble_md to airflow is that whether we would like to combine both implementations in one place that allows the user to choose which types of the REXEE methods to run (synchronous or asynchronous), or make the two implementations as similar as possible in terms of their algorithmic designs, variable names, usage, etc. Currently, the airflowHPC implementation uses slightly different variable names and data types in some places as compared to the ensemble_md implementation. We might want to discuss how to unify these implementations if the airflowHPC implementation will eventually evolve into an implementation of the asynchronous REXEE method.

ejjordan commented 7 months ago

1. Proposed changes

1.1. To ensure a correct REXEE implementation

1.2. To enable a more efficient REXEE simulation

I agree that implementing calc_prob_acc and accept_or_reject is a good task for @wehs7661 to try if interested. It should basically be as simple as copying the python functions and either calling them in get_swaps or making them tasks by copying the functions and making them airflow tasks. I believe this covers both section 1.1 and 1.2.

My function get_swaps should correspond to get_swapping_patterns up to the point where identify_swapable_pairs has been called (as you note in the last point of section 1.3.

A word of caution on trying to implement calc_prob_acc and accept_or_reject as tasks is that in airflow you cannot rely on using classes with internal states like in a normal python programming. In this sense it is much more like traditional parallel programming than traditional python programming.

1.3. To enable weight-updating REXEE simulations and weight combination schemes

All these functions should be straightforward to implement. This sounds like it is lower priority than 1.1/1.2 so we can wait until that is done to start on this.

1.4. To avoid failures in some edge cases

I have implemented a forwards and backwards mdp to json converter and a json schema for mdp files that can handle all the issues with modifying and validating mdp files (see here). It works but I am still testing and hope to finish in the next few days. The checking of gmx executable is handled by the dependence on the gmxapi python package.

2. Current progress

You are welcome to work in your own forked repo if you prefer @wehs7661 , but I think if you plan to contribute much code it will be better to just make PRs in my repo so that we can merge things when they are ready and not let branches diverge too much. On this front, if there are specific naming choices that I have made that you think could be improved please feel free to open a PR. It is hard to come up with names and my experience is that renaming suggestions are almost always an improvement.

  • Gather all import statements at the beginning of the scripts.

This is not a best practice because of the way that Airflow processes DAGs periodically to allow dynamic scheduling.

ejjordan commented 7 months ago

@wehs7661 I would kindly ask you to take a look at the changes I made in https://github.com/ejjordan/airflowHPC/pull/5

I made the tasks more generic based on my experience implementing some other test workflows. I will merge in a few days if I don't get any feedback as I think this is a much better starting point with clearer function calls. Now you can have a gmxapi task that looks like the following.

  grompp_result = run_gmxapi.override(task_id="grompp")(
      args=["grompp"],
      input_files={"-f": input_mdp, "-c": input_gro, "-p": input_top},
      output_files={"-o": "run.tpr"},
      output_dir=output_dir,
  )