enzo-project / enzo-e

A version of Enzo designed for exascale and built on charm++.

Plan to write extra metadata for ``yt``: Any requests/suggestions? #352

Open · mabruzzo opened this issue 1 year ago

mabruzzo commented 1 year ago

Overview

I would like to improve the yt-frontend for Enzo-E. For example, it would be really nice to automatically:

  1. detect which fields are passively advected scalars
  2. detect/define species and thermodynamic fields

Basically, I'm thinking that we could introduce an HDF5 group called "description" (or "metadata" or something else?) in all output files (even those written as checkpoints) and then store extra metadata as attributes within that group. The idea is that this info would be write-only (it would not be used when restarting a simulation).

Within this group, we might record the field-groups; that would help a lot with items 1 and 2 above.

We could also write some physics-dependent metadata.
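
To make this concrete, here is a minimal sketch (in Python with h5py) of what writing such a group could look like. The group name "description", the attribute names, and the field names are all placeholders rather than a proposed schema:

```python
import h5py
import numpy as np

# Hypothetical sketch: attach a "description" group with metadata attributes
# to an existing Enzo-E output file. All names below are placeholders.
with h5py.File("output.h5", "a") as f:
    grp = f.require_group("description")

    # which fields belong to which field-groups
    # (e.g. which fields are passively advected scalars)
    grp.attrs["group_passive_scalars"] = np.array(["metal_density"], dtype="S")
    grp.attrs["group_species"] = np.array(["HI_density", "HII_density"], dtype="S")

    # some physics-dependent metadata
    grp.attrs["gamma"] = 5.0 / 3.0
```

On the yt side, the frontend could then read these attributes once per dataset instead of inferring field roles from field names.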

Purpose of this issue

Basically I'm opening this issue for 2 reasons:

  1. to solicit feedback from other people (especially people familiar with yt) about what other information we should record within this group.

  2. to discuss the optimal approach for implementing this in the codebase in a manner that's portable across output approaches (since I'm not very familiar with the I/O infrastructure). I was thinking about maybe writing an IoMetaData class that we could subclass/customize in the enzo-layer... @jobordner, do you think this seems viable? (Maybe I could do something simpler and just register a callback somewhere.)

matthewturk commented 1 year ago

Hi @mabruzzo, there are a few things that I think would improve the QOL for enzo-e in yt. We have a frontend, whose development @BolunThompson has led and which is still being integrated, that treats the data as a block-structured index rather than a patch-based one. In general, I think more metadata is good, and I'm eager to work with you on enumerating that list.

The other item, which I think is much more invasive but would also likely improve the performance of the yt frontend considerably, is to change the way the patches are stored. Two specific changes would be very helpful:

  1. Add a system that allows us to more easily identify the positions of the patches (preferably without string parsing).
  2. Store them as a single large dataset within each output file, so that rather than N datasets of size (P, Q, R) each, they are stored as one dataset for each field of shape either (N, P, Q, R) or (P, Q, R, N). (N here is the number of patches within that individual output file.)
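
To illustrate what point 2 might look like on disk (a sketch only; the file name, field names, and block count are made up), assuming the (N, P, Q, R) ordering:

```python
import h5py

N, P, Q, R = 8, 16, 16, 16  # e.g. 8 blocks of 16^3 cells in this output file

# Hypothetical consolidated layout: one (N, P, Q, R) dataset per field,
# instead of N separate (P, Q, R) datasets (one per block).
with h5py.File("consolidated_output.h5", "w") as f:
    for field in ("density", "velocity_x"):
        f.create_dataset(field, shape=(N, P, Q, R), dtype="f8")

    # yt could then read every block's data for a field in one call ...
    all_density = f["density"][...]   # shape (N, P, Q, R)
    # ... or slice out a single block without touching the others.
    one_block = f["density"][3]       # shape (P, Q, R)
```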

I recognize that point 2 is likely intractable, but I wanted to put it out there anyway. For point 1, it would also be very helpful to have a binary index to which we could apply a uniform (or even tightly-looped) operation that translates it into, say, a Z-order index (with the bits pre-swizzled).
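
For reference, a small sketch of the kind of "pre-swizzled" Z-order key described here: the bits of the three per-axis block indices are interleaved into a single integer (assuming, for the moment, the same number of bits along every axis):

```python
def morton3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of (ix, iy, iz) into a Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)        # x bit -> position 3b
        key |= ((iy >> b) & 1) << (3 * b + 1)    # y bit -> position 3b + 1
        key |= ((iz >> b) & 1) << (3 * b + 2)    # z bit -> position 3b + 2
    return key

# morton3d(1, 0, 0) == 1, morton3d(0, 1, 0) == 2, morton3d(1, 1, 1) == 7
```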

mabruzzo commented 1 year ago

Thanks for responding @matthewturk. Any help would definitely be useful! Let me just quickly respond to your suggestions:

  1. Add a system that allows us to more easily identify the positions of the patches (preferably without string parsing).

This is definitely doable! Internally, the location of each block is tracked with a 96-bit index: it tracks the block's level, the position of the current block (or its ancestor) on the root grid, and the location relative to the parent within the root block (I'm a little fuzzy on how the internal representation maps to root levels). The string name that we assign to each block is just a translation of this index.

We could easily store the position of each block in a more accessible manner. There's even a method we use for load-balancing that orders blocks in terms of their Morton index, so we can probably just extract the logic from there.
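
As a purely hypothetical example of what "a more accessible manner" could mean, the per-block positions could be written as a small integer dataset alongside the field data (the dataset name and column layout here are made up):

```python
import h5py
import numpy as np

# Hypothetical "block_index" dataset: one row per block in this output file,
# storing (level, ix, iy, iz), where (ix, iy, iz) is the block's integer
# position within its refinement level.
block_index = np.array(
    [
        [0, 0, 0, 0],   # a root-level block at (0, 0, 0)
        [1, 3, 1, 0],   # a level-1 block at (3, 1, 0)
    ],
    dtype=np.int32,
)

with h5py.File("output.h5", "a") as f:
    f.create_dataset("block_index", data=block_index)
```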

  2. Store them as a single large dataset within each output file, so that rather than N datasets of size (P, Q, R) each, they are stored as one dataset for each field of shape either (N, P, Q, R) or (P, Q, R, N). (N here is the number of patches within that individual output file.)

I think this is something we can definitely work towards (but it would involve some larger refactoring).

mabruzzo commented 1 year ago

@matthewturk and @BolunThompson - After thinking about this a little more, there is a small wrinkle that I would appreciate some clarification on.

All discussions I've seen of swizzled Z-order indices assume that the domain is divided into the same (power-of-two) number of sub-cells along each axis. However, non-cosmological Enzo-E simulations commonly don't satisfy this assumption.

For example, we could have a simulation with 16 root-blocks[^1] along the x-axis, 4 root-blocks along the y-axis, and 2 root-blocks along the z-axis. In this scenario, a root-block (or its descendants) always requires a different number of bits to encode its position along each axis: the position along the x-axis requires 2 more bits than the position along the y-axis and 3 more bits than the position along the z-axis.
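
To spell out the bookkeeping behind those numbers (nothing here is Enzo-E specific):

```python
# Bits needed to encode a root-block position along each axis of a
# 16 x 4 x 2 root grid (each count is assumed to be a power of two).
root_blocks = (16, 4, 2)
root_bits = [n.bit_length() - 1 for n in root_blocks]   # [4, 2, 1]

# Each refinement level adds one bit per axis, so the x-axis always needs
# 2 more bits than the y-axis and 3 more than the z-axis.
level = 3
bits_at_level = [b + level for b in root_bits]          # [7, 5, 4]
```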

From the perspective of the new frontend, what would be the most useful thing for us to save in such a scenario?

[^1]: As you're probably aware, Enzo-E divides the domain into an array-of-octrees. In other words, we have a root grid where the number of blocks along each dimension is a power of two, and each of these root blocks can be refined as an octree.

matthewturk commented 1 year ago

That's an interesting point, and one I had not thought of. I suppose I would reframe my suggestion as follows: either apply the swizzling within the individual octrees (rather than across the array-of-octrees), or just include the individual 32-bit axial indices as 32-bit numbers rather than the string representation. In the former case, two keys would be required: the index into the forest, and then the Z-order key within that octree.
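
A rough sketch of that two-key option (all names and the worked example are illustrative, not drawn from Enzo-E or the yt frontend):

```python
def octree_two_key(root_pos, local_pos, level, root_blocks=(16, 4, 2)):
    """Return (forest_key, zorder_key) for a block.

    root_pos  : (ix, iy, iz) of the block's root ancestor on the root grid
    local_pos : (ix, iy, iz) of the block within that octree at `level`
    level     : refinement level (0 = the root block itself)
    """
    nx, ny, nz = root_blocks
    # Key 1: flat index of the root block within the array-of-octrees.
    forest_key = root_pos[0] + nx * (root_pos[1] + ny * root_pos[2])

    # Key 2: Z-order key within the octree; all axes use `level` bits.
    zorder_key = 0
    for b in range(level):
        zorder_key |= ((local_pos[0] >> b) & 1) << (3 * b)
        zorder_key |= ((local_pos[1] >> b) & 1) << (3 * b + 1)
        zorder_key |= ((local_pos[2] >> b) & 1) << (3 * b + 2)
    return forest_key, zorder_key

# e.g. a level-2 block in the octree rooted at (5, 1, 0):
# octree_two_key((5, 1, 0), (2, 3, 1), 2) -> (21, 30)
```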

However, I will also just note that this is not necessarily the problematic area, and I didn't mean to derail the discussion about fields etc., which are likely much easier to modify to improve QOL.