E3SM-Project / scream

Fork of E3SM used to develop exascale global atmosphere model written in C++
https://e3sm-project.github.io/scream/

Need to rethink (or extend) how scream does IO on dyn grid #1298

Closed bartgol closed 2 years ago

bartgol commented 2 years ago

Restart files require a Discontinuous Galerkin (DG) version of the dyn grid data (meaning that the corresponding edge dofs on bordering elements need to be saved separately), while right now our IO on the dyn grid assumes a Continuous Galerkin (CG) field (meaning that, of the corresponding edge dofs on bordering elements, only one is saved). E.g., for output, we do a dyn_grid->phys_gll remap, and then write the phys_gll field (where the "remap" is simply "picking" one of the occurrences of each dof in the GLL grid).
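
For concreteness, here is a rough back-of-the-envelope sketch (not scream code; names and numbers are illustrative) of how the dof counts differ between the DG and CG views of the dyn grid, on a cubed-sphere mesh with ne elements per cube edge and np x np GLL points per element:

```cpp
#include <iostream>

int main () {
  // Hypothetical resolution parameters (roughly an ne30np4-like grid).
  const long ne    = 30;        // elements per cube edge
  const long np    = 4;         // GLL points per element edge
  const long nelem = 6*ne*ne;   // total elements on the cubed sphere

  // DG view: every element stores its own copy of all np*np GLL dofs, so dofs
  // shared across element edges/corners are saved once per bordering element.
  const long num_dg_dofs = nelem*np*np;

  // CG view: each shared edge/corner dof is counted only once; on a cubed
  // sphere the unique GLL points come out to nelem*(np-1)^2 + 2.
  const long num_cg_dofs = nelem*(np-1)*(np-1) + 2;

  std::cout << "DG dofs: " << num_dg_dofs << "\n"   // 86400 for ne=30, np=4
            << "CG dofs: " << num_cg_dofs << "\n";  // 48602 for ne=30, np=4
}
```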

Some thoughts below.

bartgol commented 2 years ago

@AaronDonahue @tcclevenger @jeff-cohere Tagging you all because I'd like to hear any input you might have, given your involvement in the AD design and your expertise in software design.

My approach would be to

1) make SEGrid store nelem*np*np "unique" dof gids
2) add a switch to the IO classes' params to select whether one wants to save a DG or CG field
3) hack IO as much as needed for the DG case.

The hard part of step 3 is that for vector fields like (nelem,2,np,np,nlev), the 2 vector components are strided across columns in memory. If the layout was (nelem,np,np,2,nlev), then it would be "less hacky" to collapse the first 3 extents, and I could just reinterpret_cast my way out. And before you ask "can we just use (nelem,np,np,2,nlev) for dyn grid fields?", the answer is "no", since that's not the layout that Homme uses (though I wondered in the past whether Homme could benefit from that switch, but that's a story for another day). So to do step 3 I'd likely need some temporaries (which is not terrible, but that's where the "hack" part would come in).
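
For concreteness, a rough sketch (plain C++ with made-up names, not the actual IO code) of the kind of temporary repack step 3 would need, turning the Homme-style (nelem,2,np,np,nlev) layout into a buffer whose leading extents can be collapsed into a single column index for IO:

```cpp
#include <cstddef>
#include <vector>

// Copy a Homme-layout vector field (nelem,2,np,np,nlev), stored flat and
// row-major, into a temporary buffer laid out as (nelem*np*np,2,nlev), so
// that the first three extents are collapsed into a single column index.
// All names here are illustrative, not scream's actual interfaces.
std::vector<double>
repack_for_io (const std::vector<double>& dyn,
               const int nelem, const int np, const int nlev)
{
  constexpr int ncomp = 2;
  std::vector<double> io (static_cast<std::size_t>(nelem)*np*np*ncomp*nlev);
  for (int ie=0; ie<nelem; ++ie)
    for (int cmp=0; cmp<ncomp; ++cmp)
      for (int i=0; i<np; ++i)
        for (int j=0; j<np; ++j)
          for (int k=0; k<nlev; ++k) {
            // (ie,cmp,i,j,k) in the dyn layout ...
            const std::size_t src =
              (((static_cast<std::size_t>(ie)*ncomp + cmp)*np + i)*np + j)*nlev + k;
            // ... goes to (icol,cmp,k) in the IO layout, icol = ie*np*np + i*np + j
            const std::size_t icol = static_cast<std::size_t>(ie)*np*np + i*np + j;
            const std::size_t dst  = (icol*ncomp + cmp)*nlev + k;
            io[dst] = dyn[src];
          }
  return io;
}
```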

PeterCaldwell commented 2 years ago

Would it be reasonable to say we'll do your approach for now but to add "investigate whether switching homme to (nelem,np,np,2,nlev) would be better" as a long-term goal? I don't really see a better short-term solution than what you've proposed.

bartgol commented 2 years ago

The "switching homme approac" is probably never gonna be doable. It would require quite a few changes, the biggest (and most painful) of which would be in the boundary exchange framework. I highly doubt we will ever do that, especially since we're not sure how much is there to gain (maybe a bit of caching on CPU, but likely nothing on GPU).

I think we'll go the route of "make temps to switch layout".

AaronDonahue commented 2 years ago

per

> the 2 vector components are strided across columns in memory.

Does this mean that in homme subsequent elements are not contiguous in memory for vector components?

jeff-cohere commented 2 years ago

To @AaronDonahue :

Only one dimension of a multidimensional array can be contiguous in memory, and in our case, it's vertical levels. I hope I understood your question.

To @bartgol:

I agree with your approach. Storing degrees of freedom in the DG way within SEGrid handles both the DG and CG cases, and as long as we make clear what we are doing in the I/O subsystem to treat the more general DG-style grid as a CG-style grid (e.g. make sure the vector components on interfaces agree for the CG case), I think we can mitigate confusion.

bartgol commented 2 years ago

Re: homme layout. With a layout for, say, 2d velocity as (nelem,2,np,np,nlev), for each element you get all np*np*nlev values for the 1st component, then all np*np*nlev values for the 2nd component. That means that you cannot contiguously subview [u,v] at a particular GLL point.
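
In flat-index terms (illustrative only, with a made-up helper), with the (nelem,2,np,np,nlev) layout the u and v values at the same element/GLL point/level sit np*np*nlev entries apart:

```cpp
// Row-major flat index into a (nelem,2,np,np,nlev) array (illustrative only).
inline long dyn_index (long ie, long cmp, long i, long j, long k,
                       long np, long nlev)
{
  return (((ie*2 + cmp)*np + i)*np + j)*nlev + k;
}

// For a fixed (ie,i,j,k), the u (cmp=0) and v (cmp=1) entries are separated by
//   dyn_index(ie,1,i,j,k,np,nlev) - dyn_index(ie,0,i,j,k,np,nlev) == np*np*nlev
// entries. With a (nelem,np,np,2,nlev) layout the separation would be nlev,
// and the first three extents could be collapsed into a single column index.
```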

tcclevenger commented 2 years ago

Yeah I agree with the approach of storing as a DG field and having SEGrid interpret both CG and DG. It does seem a bit annoying that we can't use (nelem, np, np, dim, nlev) but the amount of temporary data shouldn't be too significant and I'll take you at your word that it's more work than it's worth.

So if we are switching from the DG to the CG case, how are we choosing which value to use at the interface? Is it arbitrary (like "take the value from the element with the smallest gid") or something like an average?

bartgol commented 2 years ago

Re: CG. Currently, we rely on Homme's internal logic. In particular, for a shared dof, Homme decides which element "owns" it, essentially by taking the 2d element with the smallest gid.

On the math level, it does not matter which one you take, since the fields are "continuous", but at the bfb level, the halo exchange does not yield bfb-equal values at element corners across bordering elements. That is, if dof X is on an element corner, and shared by 3+ elements (on cubed sphere meshes, it's either 3 or 4), then the 3+ copies of the field dof will have slightly different values, due to how the 3+ contributions are summed during the halo exchange (the 3+ elements will do the sum in different orders).
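
As a toy illustration (hypothetical types and function names, not Homme's actual ownership code) of the "pick one copy" CG rule described above:

```cpp
#include <algorithm>
#include <vector>

// A shared GLL dof appears in 2, 3, or 4 elements. A simple CG "pick" rule,
// mirroring the smallest-element-gid convention described above, keeps the
// copy coming from the sharing element with the smallest gid. Names are
// illustrative, not Homme/scream code.
struct DofCopy {
  long   elem_gid;  // gid of the element holding this copy
  double value;     // value of the field at this dof in that element
};

double pick_cg_value (const std::vector<DofCopy>& copies) {
  // The copies differ only at the bfb level (different summation order in the
  // halo exchange), so mathematically any of them would do; picking the copy
  // from the element with the smallest gid gives a deterministic answer.
  auto it = std::min_element(copies.begin(), copies.end(),
                             [](const DofCopy& a, const DofCopy& b) {
                               return a.elem_gid < b.elem_gid;
                             });
  return it->value;
}
```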

bartgol commented 2 years ago

I think #1251 took care of this. FYI, by default IO is now performed on the same grid as the fields, with dyn dofs being written in a DG fashion. If one wants to perform IO on a different grid (e.g., write dyn fields on the physics GLL grid), one needs to specify the grid on which to do IO; the IO classes will then perform a remap before/after writing/reading, to transfer data between the original grid and the IO grid:

Fields:
   Foo:
      IO Grid Name: Baz
      Fields Names: A, B
   Bar:
      Fields Names: C, D

would write fields C and D directly from grid "Bar", while fields A and B would first be remapped to grid "Baz", and then written to file. This is the Output case; for Input the process goes in the opposite direction. Also, in order to automatically remap to/from the IO grid, a GridsManager needs to be provided, which must be able to create a remapper between the fields' native grid and the IO grid.