Implement adios2

manauref commented 1 year ago

This PR requires some careful review & discussion. It replaces our previous use of ADIOS with a wrap of ADIOS2's C API.

The implementation of ADIOS2 is done as close to our previous implementation of ADIOS as possible. That is, if you run a simulation with the version of the code that used adios1 and with this new version, you produce the same files and the changes required throughout the app are relatively minimal. Some notable differences are:

- ADIOS2 files appear to be folders.
- Previously ADIOS was initialized in gkyl.cxx. We now remove that and initialize adios2 in PlasmaOnCartGrid (or in each unit test). Likewise, we explicitly terminate adios2 in PlasmaOnCartGrid and in the unit tests.
- It seems that to read string attributes I have to preallocate a sufficiently big string. Since I don't know how big to make this to read inputfile, I have disabled writing of input files as an attribute. This primarily affects regression tests and people who used this feature, but I think very few people do. Maybe @ammarhakim has some suggestions on how to handle this (I tried passing a void * as we did with ADIOS, but it didn't work).
- Because we only initialize one adios2 object (in PlasmaOnCartGrid), we need to pass that object to all CartFields and DynVectors one wishes to write. The minimal intervention strategy for CartFields is to give it to the grids, and CartFields can retrieve it from there. But for DynVectors we have to pass the adios2 object to every DynVector we wish to write.
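The last point can be pictured with a minimal sketch. This is not the gkyl code (which is Lua wrapping C); `IoContext`, `Grid`, `CartField` and `DynVector` below are hypothetical stand-ins that only show how one shared object reaches fields through their grid, while each DynVector receives it explicitly.

```python
class IoContext:
    """Hypothetical stand-in for the single adios2 object created by the app."""

    def __init__(self, label):
        self.label = label


class Grid:
    """A grid stores the shared I/O context once."""

    def __init__(self, io_context):
        self.io_context = io_context


class CartField:
    """A field has no I/O argument of its own; it looks the context up on its grid."""

    def __init__(self, grid):
        self.grid = grid

    def io_context(self):
        return self.grid.io_context


class DynVector:
    """A DynVector has no grid, so the context must be passed in explicitly."""

    def __init__(self, io_context):
        self.io_context = io_context


ctx = IoContext("app")                 # created once, e.g. by the app
grid = Grid(ctx)
field_a, field_b = CartField(grid), CartField(grid)
dyn = DynVector(ctx)                   # every DynVector needs this extra argument
```

Under these assumptions, adding a field costs nothing extra, whereas every new DynVector call site has to be touched.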

Despite these differences, the following all work as before:

- gkyl runregression run create
- gkyl runregression run check
- gkyl comparefiles
- postgkyl

Some possible future improvements to be discussed:

- Presently each CartField and DynVector that performs I/O creates a buffer of the same size, effectively doubling our memory requirement and requiring a copy to or from the buffer when doing I/O. As long as we open and close files every time we wish to do I/O, this seems unnecessary. It could easily be eliminated in AdiosCartFieldIo. It is not so easy to eliminate in AdiosDynVectorIo, because DynVectors are padded in order to make them work with base-1 indexing. We could get rid of this (i.e. remove the pad and make DynVectors base-0 indexed), but it may require lots of changes throughout DynVector and elsewhere (not sure).
- Looking at ADIOS2, and communicating with its developers, I have the impression that rather than writing a file for every frame it would be best to open a single file, leave it open, append to it in every frame, and close the file at the end of the sim. This is particularly important for ADIOS2 because there's more metadata, so we are currently spending some unfortunate amount of time creating metadata for every file. Also, now that files are actually folders, it seems best to put all frames in those files/folders. If we make this change we could easily make the corresponding changes in pgkyl, and we could easily create a tool that extracts a single frame and creates a new file with it.
- ADIOS2 has a GPU interface that allows you to pass a pointer to GPU memory directly. This would remove an explicit deviceToHost copy currently in AdiosCartFieldIo.
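To make the first point concrete, here is a small sketch of a base-1 padded vector and two ways of handing its payload to an I/O layer: copying it into a second buffer, or exposing a view that starts one element past the pad. It uses only the Python standard library; the function names are hypothetical, and nothing here establishes that the ADIOS2 calls used in gkyl accept such an offset view.

```python
from array import array


def make_padded(values):
    """Store values behind one unused pad element, so index 1 is the first real entry."""
    return array("d", [0.0]) + array("d", values)


def copy_payload(padded):
    """Buffered approach: slicing an array allocates a second array of the same size."""
    return array("d", padded[1:])


def view_payload(padded):
    """Copy-free approach: a memoryview slice shares storage with the padded array."""
    return memoryview(padded)[1:]


padded = make_padded([1.0, 2.0, 3.0])
copied = copy_payload(padded)
view = view_payload(padded)

padded[1] = 9.0  # visible through the view, not through the copy
```

In C terms the view corresponds to passing the base pointer plus one element; the copy is what the current buffering amounts to.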

ammarhakim commented 1 year ago

I am closing this PR pending further review. It is best to keep it closed until we understand the issues better. In particular:

1. Why does one need to init ADIOS in PlasmaOnCartGrid? This was not needed and should not be needed AFAICT.
2. I do not understand the issue with string attributes. There must be some way to find the size of the attribute, and then we can allocate that much data.
3. I am not sure why the buffering is needed for arrays or DynVectors. Can't we take the DynVector, do a simple pointer +1 on it, and pass that to the ADIOS methods?
4. I do not support 1 big file for all the IO. This is a very bad idea. First, the file will be huge and unmanageable. Say you write a lot of files and only want to copy a few (even 1) to your local machine. How would you do it? Further, how can I tell someone "here, take this file and restart from it"? I think this is a serious problem with the crappy way in which ADIOS is doing things. TBH the design seems seriously flawed. I know they moved to writing data in directories. This is also a bad idea, but I can live with it. We can't go to the single file model, though. It is senseless for a code like Gkeyll that can potentially write HUGE amounts of distribution functions in 6D.
5. Repeatedly writing small amounts of metadata is fine. I think the problem is that ADIOS is writing HUGE amounts of metadata (again due to a design flaw). In particular, they need to store which data belongs to what part of the bigger array, and this could be a lot of data. However, it is still a pretty small amount compared to the array data. I do not think we need to optimize for this at present. We are not IO bound and are seriously compute bound. Let's not optimize for this.
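The approach in point 2 (ask for the size first, then allocate exactly that much) can be sketched as follows. `FakeAttributeStore` and its methods are hypothetical and are not the ADIOS2 API; the sketch only shows the two-call pattern and why a fixed-size guess is fragile.

```python
class FakeAttributeStore:
    """Hypothetical stand-in for an I/O library that holds string attributes."""

    def __init__(self):
        self._attrs = {}

    def put(self, name, text):
        self._attrs[name] = text.encode("utf-8")

    def byte_length(self, name):
        """First call: report how many bytes the attribute needs."""
        return len(self._attrs[name])

    def read_into(self, name, buffer):
        """Second call: fill a caller-provided buffer, refusing to overflow it."""
        data = self._attrs[name]
        if len(buffer) < len(data):
            raise ValueError("buffer too small for attribute " + name)
        buffer[: len(data)] = data
        return len(data)


def read_string_attribute(store, name):
    """Query the size, allocate exactly that much, then read."""
    size = store.byte_length(name)
    buffer = bytearray(size)
    written = store.read_into(name, buffer)
    return bytes(buffer[:written]).decode("utf-8")


store = FakeAttributeStore()
store.put("inputfile", "-- lua input\n" * 400)  # far longer than any small fixed guess
text = read_string_attribute(store, "inputfile")
```

With this pattern the buffer can never be undersized, provided the library offers a reliable size query for strings.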

This work is very good but let's close the PR till we figure the above out.

liangwang0734 commented 1 year ago

The default behavior of ADIOS2 is folder-based, and the actual number of heavy files under the folder can be flexible (say, equal to the number of nodes). This makes the ADIOS2 interface more flexible and retains the benefits of a single file, such as changing the number of processors when restarting.

On Sun, Oct 22, 2023, 11:16 AM Ammar @.***> wrote:

Closed #150 https://github.com/ammarhakim/gkyl/pull/150.


ammarhakim commented 1 year ago

Changing the number of files for restarting was always possible, even in ADIOS1. As I said above, I am OK with a directory instead of a file. That is not the problem here. We will meet to discuss the issues next week.

manauref commented 1 year ago

Responses to @ammarhakim's points:

1. You could initialize it in Gkyl.h as we were doing before, but you'd have to pass the returned adios object to PlasmaOnCartGrid. I think I prefer this new way as it is more transparent, and although it is wrapping C, our manipulations are just at the Lua level, instead of having to mix C (Gkyl.h) with Lua (PlasmaOnCartGrid).
2. Yes, you are right. But for some reason when I query the size of a string (e.g. inputfile) it just returns 1. So I've been allocating a string with 100 characters for now.
3. Buffering is something that was implemented in g2 when using ADIOS1. Not sure why. It's certainly not needed for CartGridFields. For DynVectors, there might be an issue with strides or something, so "buffering" might also just be concatenating the data so it's aligned. Not totally sure, but when I tried to remove the buffer it didn't work, and I think it was because of the pad. We could check again.
4. I'm not suggesting 1 big file, but one file for each set of diagnostics for all frames. So for example, an electron-ion simulation would have at least 7 files: ion_gridDiagnostics, ion_intDiagnostics, elc_gridDiagnostics, elc_intDiagnostics, field_gridDiagnostics, field_intDiagnostics, and the restart file. If you later want to give someone a single snapshot, we could create a tool that extracts a snapshot from these files (easy). In fact, postgkyl can already do that: you tell it to load a file, select an index, and use save to write it to a bp or hdf5 file.
5. It may be true that we are not presently I/O bound. @JunoRavin quotes I/O taking as much as 15% in some sims, but that is not a game-changing gain if we optimize it out. I do think, however, that these changes could make I/O cleaner and simpler, but maybe that's a matter of style and preference.

By the way, these ideas of consolidating I/O into fewer files, and whether they gain anything or not, could be tested in a unit test before changing the App.
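As a rough illustration of what such a test could look like, the sketch below times two write patterns on plain binary files: one file per frame versus one file that all frames are appended to. It uses only the Python standard library and does not involve ADIOS2, so it says nothing about ADIOS2's metadata cost; the function names, file names and frame sizes are all made up for the example.

```python
import os
import tempfile
import time


def write_per_frame(directory, frames):
    """Open, write and close a separate file for every frame."""
    for i, frame in enumerate(frames):
        with open(os.path.join(directory, f"frame_{i}.bin"), "wb") as f:
            f.write(frame)


def write_appended(directory, frames):
    """Open one file, append every frame, close once at the end."""
    with open(os.path.join(directory, "all_frames.bin"), "wb") as f:
        for frame in frames:
            f.write(frame)


def timed(write_fn, frames):
    """Return (elapsed seconds, total bytes on disk) for one write pattern."""
    with tempfile.TemporaryDirectory() as directory:
        start = time.perf_counter()
        write_fn(directory, frames)
        elapsed = time.perf_counter() - start
        total = sum(
            os.path.getsize(os.path.join(directory, name))
            for name in os.listdir(directory)
        )
    return elapsed, total


frames = [bytes(1024) for _ in range(50)]  # 50 dummy frames of 1 KiB each
t_many, bytes_many = timed(write_per_frame, frames)
t_one, bytes_one = timed(write_appended, frames)
```

A real comparison would need the actual ADIOS2 calls, realistic array sizes and the parallel file system in question; the point here is only the shape of the measurement.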

ammarhakim commented 1 year ago

We do not need to pass anything to anybody. We should create the object in the main gkyl.cxx and then set a global Lua variable which holds this. Then this can be queried and used if it is set properly. If it is not set then the write should fail.
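A minimal sketch of this alternative, again in Python rather than the C++/Lua that gkyl uses: a handle is set once in a well-known place, writers look it up, and a write fails loudly if it was never set. All names here (`set_io_handle`, `get_io_handle`, `write_field`, `RecordingHandle`) are hypothetical.

```python
_io_handle = None  # set once by the program entry point


class RecordingHandle:
    """Hypothetical stand-in for the I/O object; it just records what was written."""

    def __init__(self):
        self.written = []


def set_io_handle(handle):
    """Called once at startup to publish the handle globally."""
    global _io_handle
    _io_handle = handle


def get_io_handle():
    """Return the global handle, failing if startup never set it."""
    if _io_handle is None:
        raise RuntimeError("I/O handle was not initialized")
    return _io_handle


def write_field(name, data):
    """A writer needs no handle argument; it queries the global one."""
    get_io_handle().written.append((name, list(data)))
```

Compared with passing the object through grids and DynVectors, this keeps call sites unchanged at the cost of hidden global state.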

Allocating strings of 100 bytes is a very bad idea. We need to see why this is failing in ADIOS and fix it. Obviously we can't take the risk of things failing randomly and wasting hours of effort figuring out why they crashed.

We are not IO bound. 10%-15% is OK for large file IO. No one has yet shown that this will be significantly reduced by writing 1 monster file. Is there any data to back up that this is actually faster? I do not want assurances from ADIOS people that it is so, but proper numbers with our IO pattern to show the gains. A 5-10% gain, or even 50%, will not help us at all.

JunoRavin commented 1 year ago

I just want to chime in that I am not opposed to writing out a single file for sets of diagnostics. But I do not think it is feasible to combine distribution functions and other grid diagnostics into a single ADIOS2 output. We might consider experimenting with a file that has all the distribution functions and a file that has all the other grid diagnostics (so 9 files in Mana's example). Combining distribution function outputs and other grid diagnostics would, in my opinion, lead to huge headaches for data sharing and transfer. It is often easiest and quickest to analyze the moment data, and if we start lugging around 1 TB files because the distribution functions and the density are packaged together, I think we will regret it.
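The file counts quoted in the thread can be reproduced with a short sketch. It assumes one reading of the proposal: each species and the field get a gridDiagnostics and an intDiagnostics file, there is one restart file, and distribution functions optionally go into one extra file per species. The `_distf` suffix and the `restart` label are made-up names for illustration.

```python
def diagnostic_files(species, separate_distf=False):
    """List one output name per diagnostic set, covering all frames."""
    files = []
    for name in species + ["field"]:
        files.append(f"{name}_gridDiagnostics")
        files.append(f"{name}_intDiagnostics")
    files.append("restart")
    if separate_distf:
        # Assumption: one distribution-function file per species.
        files.extend(f"{name}_distf" for name in species)
    return files


grouped = diagnostic_files(["ion", "elc"])
with_distf = diagnostic_files(["ion", "elc"], separate_distf=True)
```

For an electron-ion run this gives 7 names without separate distribution-function files and 9 with them, matching the numbers mentioned above.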

In terms of I/O vs. compute bound, I think we should be more precise about the problem: for most mid-scale simulations I/O is 10-15 percent of the compute time. However, because of how we do I/O with ADIOS1, we literally cannot run simulations on 10k+ MPI processes. We have only ever done performance analysis at scale, not science, because our ADIOS1 implementation gets us kicked off cluster resources for crashing the file system.

Whatever the ADIOS folks recommend for 1000+ node simulations is what we should be doing, or at least supporting, if we want to use leadership computing clusters. I don't want to completely revamp my workflows for these folder-based outputs, and I certainly don't like the idea of all the distribution functions being in a single file, but if we have to do this to run on 1000+ nodes then we should support the option to run the way ADIOS recommends.

manauref commented 1 year ago

Thanks for the extra insight @JunoRavin

Yes, I forgot about distfs. We won't group them with gridDiagnostics; they'll have their own separate files.

ammarhakim / gkyl

Implement adios2 #150