enzo-project / enzo-e

A version of Enzo designed for exascale and built on charm++.
Other
29 stars 35 forks source link

Method Initialization Improvements Discussion #418

Open mabruzzo opened 4 months ago

mabruzzo commented 4 months ago

Enzo-E's foundations and Cello's overall architecture are fantastic! It does a lot of things extremely well, has a very modular design and provides a lot of conveniences! (everything @jobordner has done is truly fantastic!) I've only come to appreciate the overall design more with time (especially now that I've spent some time working with other codes).

Enzo-E's Method objects are arguably one of its most defining features. They have a very logical structure (initialization happens in the constructor, Block-data is updated by compute, timestep any requirements are computed/reported by timestep) that is extremely approachable (and you generally know where to look). It provides extremely useful conveniences (like the refresh-machinery). Plus it's easy to switch the machinery on and off.

In my opinion, the one area that needs improvement is Method-initialization.

About Method Initialization

Currently, there is a one-to-one mapping between the Method objects initialized by Enzo-E and the contents of the parameter files. In more detail, the precise Method classes as well as their explicit execution order are specified in the parameter-file. The logic that actually initializes the methods is performed by the Problem class.

Before I go further, let me emphasize that this is a very sensible choice (I probably would have done the same thing)! Plus, it has definitely served us well! Some of the benefits from this approach include:

  1. What you see (in the parameter file) is what you get in the simulation. There are basically no surprises about the order of operations.

    For example, imagine Method:list is assigned ["order_morton", "balance", "mhd_vlct", "grackle"]. The control flow executed each cycle looks like

 --------------      ---------      ----------      ---------
|              |    |         |    |          |    |         |
| order_morton | -> | balance | -> | mhd_vlct | -> | grackle |
|              |    |         |    |          |    |         |
 --------------      ---------      ----------      ---------
  1. You are able to freely adjust the order of operations at runtime. This is extremely useful for debugging. It's very easy to insert a method to show/save state at any point.
  2. The logic in the codebase for initializing methods is extremely straight-forward.
  3. This initialization-logic is consistent with the initialization logic for other entities in the codebase. (And we happen)

Importantly, the challenges with this approach are not easy to foresee, and only really become problematic "at scale" (i.e. as more Method classes are introduced and more people designing the Method classes).

Challenges with the current approach

In recent years, the following factors/challenges have become apparent:

  1. A number of different actions/physics need to be implemented by separate a sequence of Method instances. When you have a long list of methods (8? 10?) it is easy to forget about them.
  2. Relatedly, there are all sorts of ordering requirements and conventions for different methods
    • "order_morton" (or "order_hilbert") should come before "balance"
    • "balance" can't occur at the end of a cycle. It should generally come near the start
    • "order_morton" (or "order_hilbert") should come before "check"
    • soon-to-be-added "frame_transform" should be the last thing
    • "gravity" related:
      • I think it should come after "pm_deposit" but before "pm_update"
      • should come before your hydro method. But I think "pm_update" comes after
      • I think it should come before "background_acceleration"
      • are there any ordering requirements related to "comoving_expansion"? Does it come after "pm_update"
      • when should particle-creation methods happen? (after "pm_update"?)
    • "m1_closure" should come after hydro, but before "grackle"
    • When should "feedback" occur in relation to "star_maker", hydro, "m1_closure"
    • When should "accretion" occur in relation to "sink_maker", hydro?
    • "flux_correct" must follow hydro
  3. The system was designed with the expectation that we would know all of the fields before initializing the first Method instance. While the system was modified a few years back to let you specify the required fields in constructors, the modified system doesn't work particularly well.
    • It presents a challenge for refresh lists. Since methods like "grackle" usually get initialized after the hydro-solver, the hydro-solver can't know all of the required passive-scalar fields that must be refreshed
    • A similar issue arises for the flux-correction (see issue #?)
    • A method might like to preallocate scratch-space in the constructor, but may need to wait to lazily initialize this space until the first compute or timestep invocation (from a code-design perspective, that's kinda ugly. In the abs, we want to encourage people not to mutate state after initialization)
    • The "output" method would also like to know all of the fields when it is first initialized to report errors early about desired output fields (rather than the first time it writes an output)
    • We could also facilitate some useful optimizations if we knew the fields when we initialize the "physics" objects
  4. Ensuring proper scheduling of related methods
    • ordinary scheduling (like "order_morton" and "balance")
    • subcycling sets of methods together (like "m1_closure" and "grackle")
  5. We want to inject certain functionality that only needs to be injected in certain cases. If using dual-energy formalism, should synchronize energy after a "flux_correct". To implement divergence cleaning, we would also need to inject a step after a "flux_correct".
  6. The system doesn't work well with higher-order time integration. It essentially assumes that all methods as a whole use forward-euler time integration. Individual methods are free to implement higher-order time integrate (e.g. see mhd_vlct), but that doesn't map well across methods. We could accomplish this Involves lots of extra steps
  7. Swapping out equivalent methods (e.g. ppm for mhd_vlct... )
  8. The need for coordinated/consistent parameters for different methods. We have worked around this with Physics objects, but may not always be best solution.
  9. Methods are not very composable.
    • There have been times where it would be great to call one method inside another (note: this is probably not necessary if other issues are addressed).
    • It would also be nice to group various source-term methods together

Importantly, most of the burden for points 1 and 2 falls on the end-user. And it's really easy to screw this up without getting any indication from Enzo-E. As the codebase grows, it will be impossible to comprehensively check for this!

How do we address this?

Honestly, I don't have all of the answers (I'm making this issue to solicit feedback). But, I have some ideas for the first few steps.

As a first step, I propose that we decouple Method initialization from the Problem class:

After that, I have a few ideas (to be tackled in subsequent PRs):

I very much welcome feedback. I have other ideas too.

The goal here is to pursue a solution somewhere between what Athena++ does for initializing tasks and what Enzo-E currently does. (I would prefer something a little more readable than Athena++)

What do we lose?

We lose some clarity about the order of methods?. We can work around this by passing a command line argument to Enzo-E that tells it to just print the methods.

We lose some "flexibility" of arbitrary reordering. In really, the only case where we currently have that kind of flexibility for any production sims is "output" dumps. In science runs, you probably want to standardize the locations (put @ start of cycle or end). This is only really a loss for scheduling debugging methods at runtime.

jobordner commented 2 months ago

I need to spend more time organizing and consolidating my thoughts on this, it's something that I've been concerned with for years. Here's a dump of my current thinking.

Cello implements an abstract "sequence of methods" for operating on data, and Enzo-E's operational behavior in a simulation is almost entirely defined in terms of the method list in the parameter file. Over the years, the combination of introducing non-physics operations into the method component (checkpoint, ordering blocks, load-balancing, flux-correction, etc.), and increasing software entropy, have increased the difficulty in specifying the method list.

I think many of the issues you mention above can be effectively dealt with by simplifying the organization of Enzo-E methods to address the technical debt. Things like merging "pm_deposit" into "gravity", renaming "pm_update" as "move_particles", having "check" and "balance" call "order" themselves as-needed, etc. In general making Enzo-E methods more "physics"- or "function"- oriented and less "algorithm"-centric.

It will necessarily break backward-compatability, so it would be a good time to introduce and use versioning parameters, and keep careful track of method modifications and the associated version.

After cleaning up Enzo-E methods, then we can shift attention to the Cello layer to address any remaining issues.