NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilites for working with the WMO BUFR format.
Other
44 stars 19 forks source link

Transition to DA build only #77

Closed edwardhartnett closed 3 years ago

edwardhartnett commented 3 years ago

Currently twe build a static and a dynamic allocation memory model.

Let's eliminate the static build and make the dynamic build the only build.

edwardhartnett commented 3 years ago

@mathomp4 you mention in #81 that you use the static build. Is there a reason why you prefer the static instead of the dynamic allocation build?

mathomp4 commented 3 years ago

@mathomp4 you mention in #81 that you use the static build. Is there a reason why you prefer the static instead of the dynamic allocation build?

I prefer it only because that is how it was built in our CVS GNUmake build that we are moving to Git. And my mandate was to make sure all the flags were exactly the same (so we could get the same answer).

My guess is the new BUFRLIB was brought in and "things" were done until it worked (this was after I moved from maintaining the CVS/GNU build to our current Git/CMake). Maybe they didn't know about the dynamic allocation build? I'll propose we move to it, but I don't know how to evaluate if doing so changes results.

Note we only saw this because (until now) our main GCM didn't build this library as it didn't need it. But recent work has discovered it will be needed, so it was added and our CI system (which is all GCC) threw a rod.

mathomp4 commented 3 years ago

@edwardhartnett I have a reply from one of our devs which I'll quote here:

I don't think that there was any difference in the results when compiling with and without -DDYNAMIC_ALLOCATION, in part because the code runs essentially the same if none of the default settings are changed. The dynamic allocation allows increasing the maximum sizes of some of the arrays used in the library, and allowed NCEP to retire their 'supersize' build.

I know that we had deliberately chosen to build without dynamic allocation. There may have been some issue with performance but the real issue was that when I tested running the GSI with the library built with dynamic allocation, back in May 2017, and the program crashed in the 'read_iasi.f90' routine. From what I could see, the BUFRLIB code sets up things to use the dynamic allocation the first time there is a call to OPENBF. However, in 'read_iasi.f90' there is a call to CLOSBF before the first call to OPENBF. The CLOSBF call does nothing because there is no BUFR file open but it runs afoul of the dynamic allocation stuff because it hasn't been set up by the first OPENBF call. I considered trying to put something at the beginning of the GSI to make sure the stuff for dynamic allocation was intialized before anything was done with any BUFR files - in case there were other cases in the code with a similar problem. However at the time I decided that we would just stay with our 'static allocation' the way that we had always done it and avoid the issue of having to make it work throughout the code... because the GSI didn't need it and I think there also was mention of a potential performance hit with the dynamic allocation. (Though of course because of the other issue we did not persue this.) If you want to pass this information back to NCEP you are welcome.

Now the read_iasi.f90 code she is referring to is here: https://github.com/GEOS-ESM/GEOSana_GridComp/blob/4362b003de0db5759d96b800bade3a70607d655a/GEOSaana_GridComp/GSI_GridComp/read_iasi.f90#L417

I took a look and it looks like in your latest release (11.3.2) and in develop there is new code there. And I see this in your release notes: https://github.com/NOAA-EMC/NCEPLIBS-bufr/blob/develop/docs/ReleaseNotes.md#version-1132---july-16-2020

I guess we will try and look at the new BUFR code soon and evaluate it.

edwardhartnett commented 3 years ago

Thanks @mathomp4 !

jbathegit commented 3 years ago

I was about to open a new issue to enable testing for all of the static and dynamic builds when I came across this issue which is still open.

@edwardhartnett can you tell me what the original rationale was for opening this ticket to eliminate support for static builds? I'm not saying this couldn't be done at some point, but it's not something we should do without a lot of prior coordination, especially for WCOSS users where it's always a constant battle to improve the timeliness and efficiency of program runs for NWP, and where using dynamic allocation obviously leads to at least a slight increase in run times.

jbathegit commented 3 years ago

Just to clarify, and for backreferencing purposes, I was about to open a new issue (to enable testing for all static and dynamic builds) as a result of the discussion in #94 , before I noticed that this issue was still open ;-)

edwardhartnett commented 3 years ago

If I understand correctly, by "static builds" you mean a special build which does not use malloc. This is different from the usual meaning of static builds, which means the .a build (as opposed to the .so shared library build). For this reason I will call this the "static memory" build, to differentiate from the "static library" build.

I would like to see proof of increase in run-time due to static memory build. I strongly suspect that this is an unproven and unnecessary optimization.

No other library in our stack does this. NetCDF, HDF5, eccodes, etc., all use malloc freely with no apparent loss of performance. Actually, I have never encountered any other package that tries a static memory build. Since the model run is limited by computation and I/O, it seems highly unlikely that any reasonable use of malloc would make a difference in run time. Mallocs and frees are not particularly expensive operations ( https://stackoverflow.com/questions/7612292/how-bad-it-is-to-keep-calling-malloc-and-free), in fact they are fast.

Many of the NCEPLIBS do things that no one else does. Nothing is free. If these things do not bring benefit, we must stop doing them and focus our efforts on work that does bring benefit.

Everything that we do differently from other teams needs to be carefully examined. It's extremely unlikely that we have discovered a good form of optimization that everyone else has missed. If we have, we need to prove it, and spread the knowledge. If we haven't, we need to do things in an industry-standard way, to control costs.

This does not affect whether you set up testing for both static and dynamic memory builds. We need to test both as long as we support both, and currently we are supporting both so need to test both. When the static memory build is discontinued, we will then stop testing it. Since, as you say, we have to work with users to transition the to the dynamic memory build, we cannot stop supporting the static memory build today.

Double the testing is one example of how this is not free.

On Tue, Jan 12, 2021 at 9:45 AM Jeff Ator notifications@github.com wrote:

I was about to open a new issue to enable testing for all of the static and dynamic builds when I came across this issue which is still open.

@edwardhartnett https://github.com/edwardhartnett can you tell me what the original rationale was for opening this ticket to eliminate support for static builds? I'm not saying this couldn't be done at some point, but it's not something we should do without a lot of prior coordination, especially for WCOSS users where it's always a constant battle to improve the timeliness and efficiency of program runs for NWP, and where using dynamic allocation obviously leads to at least a slight increase in run times.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues/77#issuecomment-758785155, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJIOMME5EWHLB6N4D5D7PN3SZR4AVANCNFSM4TYNEEBA .

jbathegit commented 3 years ago

Yes, I'm talking about "static memory" builds and not "static library" builds. I realize those are two different things, as I noted in the discussion of #94

I wasn't aware that static allocation of memory was considered such a non-standard way of doing things, nor that studies have been done showing that dynamic allocation of memory isn't particularly expensive. If that's the case, then again I'm fine with a gradual move towards discontinuing support for the static memory builds, maybe not in time for the next 11.4.0 release but at least for the subsequent one thereafter. So until then we'll just keep this issue open, and I agree that until then we should also resume testing of static memory builds on all platforms.