ammarhakim / gkyl

This is the main source repo for the Gkeyll 2.0 code. Please see gkeyll.rtfd.io for details.
https://gkeyll.readthedocs.io/en/latest/
56 stars 15 forks source link

Replace adios I/O with adios2 #158

Closed manauref closed 10 months ago

manauref commented 10 months ago

Wrap the C API or ADIOS2, and replace previous I/O infrastructure to use this instead. Changes to both users and developers will be somewhat unnoticeable (e.g. postgkyl reads and plots adios2 files already). The major noticeable change is that now .bp files are actually folders

There are 3 ADIOS objects build on top of the ADIOS2 wrap:

This PR essentially replaced adios with adios2 "as is", meaning we didn't change the overall logic to our I/O system. However, in the future we should consider the following improvements:

  1. Instead of opening, writing and closing a file (now a folder) for every frame, we should just have one file that we open at the beginning of the sim, leave open for all frames, append to it in time, and close it at the end of the sim. We could separately create tools (though I think pgkyl can already do this) that extract a single frame and save it to another file in case a single frame is desired (e.g. to share with a colleague).
  2. Right now we are creating an adios io object for every file. Instead we could create a few io objects that get reused throughout the whole system (e.g. one io for distribution functions, one for gridDiagnostics and one for dynVectors).
  3. ADIOS2 C API can't or doesn't make it possible to know the length of a string attribute before reading it, so we had allocate a buffer big enough to hold them. This is particularly problematic for the inputfile copy embedded as an attribute. It'd be good to improve this, but doing it cleanly make require ADIOS2 folk to make some changes.
  4. Although gkyl and postgkyl don't have a problem with this, when we bpls one of our DynVector files that have multiple datasets in them, the datasets after the first don't show their dimension. Not sure how serious this is, but it'd be nice to fix it.

Regarding item 1, we could test this in a simple Unit test on thousands of processes before making any changes to the code itself.