NCAR / ccpp-physics

GFS physics for CCPP

Allow multiple instances of CCPP physics in single executable for ensemble DA #999

Closed: michalakes closed this issue 4 months ago

michalakes commented 1 year ago

Description

In order to have an ensemble of model instances in a JEDI executable, we need the packages the model relies on to be “stateless” – meaning that the data for an instance are not heap-allocated but rather passed in as arguments or carried as part of an allocated instance of a derived type.

In the version of CCPP we’re using (I think it’s v6), there’s a module named GFS_container that defines GFS_control, GFS_data and GFS_interstitial as module data:

    !-------------------------------------------------------!
    !  GFS data containers, GFS_Data has dimension nblocks  !
    !  and GFS_Interstitial has dimension nthreads          !
    !-------------------------------------------------------!
    type(GFS_control_type),                                    save, target :: GFS_control
    type(GFS_data_type),          dimension(:),   allocatable, save, target :: GFS_data
    type(GFS_interstitial_type),  dimension(:),   allocatable, save, target :: GFS_interstitial

GFS_data and GFS_interstitial are dimensioned by nblocks and nthrdsX respectively, which one assumes are for blocking and threading and not intended to support multiple instances of these data structures. Are the CCPP developers considering any different ways of defining these so that multiple separate instances of CCPP physics can be supported within an ensemble executable?

Solution

One idea would be to add an "instance" dimension to these arrays, provided CCPP included a way of indexing which instance is intended. Alternatively, the arrays could be made part of an array of ccpp_t instances.
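As a minimal sketch of the instance-dimension idea (the module name, num_instances dimension, and exact shapes here are hypothetical; the GFS_* types are assumed to come from the host's GFS_typedefs module):

    ! Hypothetical sketch: add a trailing "instance" dimension so each
    ! ensemble member indexes its own copy of the host DDTs.
    module GFS_container_multi
       use GFS_typedefs, only: GFS_control_type, GFS_data_type, GFS_interstitial_type
       implicit none
       ! shapes: (num_instances), (nblocks,num_instances), (nthreads,num_instances)
       type(GFS_control_type),      dimension(:),   allocatable, save, target :: GFS_control
       type(GFS_data_type),         dimension(:,:), allocatable, save, target :: GFS_data
       type(GFS_interstitial_type), dimension(:,:), allocatable, save, target :: GFS_interstitial
    end module GFS_container_multi

Each host-model instance would then pass its own instance index down to the physics, or hold its own element of an array of ccpp_t objects.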

michalakes commented 1 year ago

I have looked at how CCPP is implemented in the UFS medium-range weather app, ufs-mrweather-app, and as far as I can tell UFS is doing the same thing as our driver in NEPTUNE: namely, defining the GFS_control/_data/_interstitial structures statically as allocatable arrays in module data. The UFS definitions even have “save” attributes. I can’t see how this could work for multiple instances of CCPP physics running as part of an ensemble.

grantfirl commented 1 year ago

I'll post my email response to John here for posterity:

If I'm understanding correctly (a bold assumption!), I don't think that there is anything in the CCPP that requires hosts to have a specific memory management structure. The organization of variables into arrays of DDTs with dimensions of nthreads and/or nblocks is entirely up to the hosts and as long as the cap generator knows how to reference the data from the host that it needs to pass into various schemes, everything works OK. In the NOAA-based models, this is facilitated through specifying how to reference variables in the TYPEDEFS_NEW_METADATA section of the ccpp_prebuild_config.py file. It's possible that what is implemented in the cap generation is not sufficient for an ensemble of Neptune instances, but it should be possible to augment the cap generation such that the right data is passed if so.
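For illustration only, here is a toy, self-contained sketch of that cap-generation idea: the host stores fields in arrays of DDTs, and the generated cap dereferences them using the host-declared reference pattern before calling the scheme. All names below are hypothetical:

    module toy_host
       implicit none
       type :: toy_block_type
          real :: t(4) = 300.0            ! stand-in for a physics field
       end type toy_block_type
       type(toy_block_type), allocatable, save, target :: host_data(:)   ! (nblocks)
    contains
       subroutine toy_host_init(nblocks)
          integer, intent(in) :: nblocks
          allocate (host_data(nblocks))
       end subroutine toy_host_init
       subroutine toy_scheme_run(t)
          real, intent(inout) :: t(:)
          t = t + 0.1                     ! stand-in for scheme physics
       end subroutine toy_scheme_run
       ! What a generated cap boils down to: index the host DDT per block
       ! (cf. GFS_Data(cdata%blk_no) in the NOAA hosts) and pass the bare
       ! array into the scheme.
       subroutine toy_scheme_run_cap(blk_no)
          integer, intent(in) :: blk_no
          call toy_scheme_run(t=host_data(blk_no)%t)
       end subroutine toy_scheme_run_cap
    end module toy_host

The point is that the indexing convention lives in the host-side configuration rather than in the schemes, so an extra ensemble-instance index would in principle be a host/framework change.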

I'm also aware that the UFS has been run in a nested configuration where the outer domain used one CCPP suite and the inner domain another. That is at least an example of two non-interacting physics suites running within one executable, which may demonstrate that what you need is doable.

@dustinswales @DomHeinzeller @climbfuji Do you guys have a better answer for @michalakes ?

michalakes commented 1 year ago

I have added draft pull requests to main of this repository (ccpp-physics) and to the ccpp-framework repository. Since our meeting yesterday, I got the NEPTUNE code to work as two instances using suite (7) of the physics (so I've only touched that subset in the pull requests).

I uncovered a couple issues that arose since we talked on Monday. The item of greatest concern is the number of physics packages in the ccpp-physics repository that are not fully compliant with the rules specified here:

[screenshot of the referenced CCPP compliance rules]

An example is the use of an "is_initialized" module variable in Thompson MP and other physics packages, which causes problems for having multiple instances of CCPP physics in the same process memory: only the first instance gets initialized.
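The pattern in question looks roughly like this (an illustrative sketch, not the actual mp_thompson source):

    module shared_flag_scheme
       implicit none
       ! A single module-level flag is shared by every CCPP instance in
       ! the process, so only the first call does any real work.
       logical, save :: is_initialized = .false.
    contains
       subroutine shared_flag_scheme_init()
          if (is_initialized) return   ! a second instance returns here,
                                       ! so its own setup never runs
          ! ... allocate and populate lookup tables ...
          is_initialized = .true.
       end subroutine shared_flag_scheme_init
    end module shared_flag_scheme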

Two other examples that I made an attempt at fixing are in the files h2ointerp.f90 and ozinterp.f90. Module arrays defined in ozne_def are being allocated and populated. It's a problem if instances need different ozone or h2o data, but even if they use the same data, the problem shows up immediately when the second instance of the physics tries to allocate these arrays and gets an "already allocated" error. My "solution" for now was to add a test for whether they've already been allocated along with a comment warning about the potential danger:

[screenshot of the allocation guard added in h2ointerp.f90 and ozinterp.f90]
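The change amounts to something like the following sketch (the subroutine and dimension names are hypothetical, patterned on ozne_def; the actual code in the PR may differ):

    subroutine oz_tables_alloc_guarded(latsoz, levoz, timeoz)
       use ozne_def, only: oz_lat, oz_pres, oz_time   ! module (heap) state
       implicit none
       integer, intent(in) :: latsoz, levoz, timeoz
       ! WARNING: these module arrays are shared by every CCPP instance
       ! in the process; skipping the allocation assumes all instances
       ! use identical ozone data.
       if (.not. allocated(oz_lat)) then
          allocate (oz_lat(latsoz), oz_pres(levoz), oz_time(timeoz))
          ! ... read and populate the tables ...
       end if
    end subroutine oz_tables_alloc_guarded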

Another example that I noticed (but haven't addressed yet) is the set of module variables in GFS_rrtmg_setup.F90:

[screenshot of the module variables in GFS_rrtmg_setup.F90]

Finally, and more generally, I think there is a better way to address the issue of state in CCPP physics than creating and maintaining instance-indices into various arrays of physics state; but the alternative is a bigger lift: physics packages should be defined as types with type-bound procedures implementing their methods. That way, if a physics package needs to carry around state data, that state would always be local and unique to the instance of the type.
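As a sketch of what that could look like (hypothetical names, not an existing CCPP interface):

    ! Hypothetical sketch: a scheme as a derived type with type-bound
    ! procedures, so all state is local to each instance.
    module example_scheme_mod
       implicit none
       private
       public :: example_scheme_type

       type :: example_scheme_type
          logical :: is_initialized = .false.
          real, allocatable :: lookup_table(:)
       contains
          procedure :: init => example_init
          procedure :: run  => example_run
       end type example_scheme_type

    contains

       subroutine example_init(self, n)
          class(example_scheme_type), intent(inout) :: self
          integer, intent(in) :: n
          if (self%is_initialized) return
          allocate (self%lookup_table(n))   ! unique to this instance
          self%lookup_table = 0.0
          self%is_initialized = .true.
       end subroutine example_init

       subroutine example_run(self)
          class(example_scheme_type), intent(inout) :: self
          ! ... compute using self%lookup_table; no module state touched ...
       end subroutine example_run

    end module example_scheme_mod

Two ensemble members would then each hold their own example_scheme_type object, with no module data shared between them.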

Thanks for all your help with this.

dustinswales commented 1 year ago

@michalakes Thanks for putting these PRs together. And the detailed information here.

There are several schemes that allocate data on the heap during the initialization phase: h2ointerp and ozinterp are two, and, as you suggested, the radiation schemes do this quite a bit as well; I'm sure there are others. This data isn't decomposition-dependent, since it doesn't depend on the horizontal dimension, so it's not breaking any CCPP rules. But as you have highlighted, it's problematic when using multiple CCPP instances with heap-allocated data.

I believe you took two different approaches in your modifications: one guards against allocating data that has already been allocated (e.g. h2ointerp and ozinterp); in these cases the heap-allocated data persists across all instances, hence the reallocation error. The other declares heap-allocated fields by CCPP instance (e.g. Thompson).

I'm wondering if we can define a "main_instance" and only call the _init routines when (instance == main_instance). If this works, then we wouldn't need to add conditional checks on all the heap allocations happening during the _init() phase, nor add extra dimensionality to heap-allocated variables.
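Something like this sketch, assuming an instance index is available to the _init phase (main_instance here is a hypothetical framework- or namelist-provided value):

    subroutine example_scheme_init(instance, main_instance, errmsg, errflg)
       implicit none
       integer,          intent(in)  :: instance, main_instance
       character(len=*), intent(out) :: errmsg
       integer,          intent(out) :: errflg
       errmsg = ''
       errflg = 0
       ! Only the designated instance performs the one-time heap
       ! allocations; later instances rely on the shared tables already
       ! being in place (note the implied initialization ordering).
       if (instance /= main_instance) return
       ! ... allocate and populate shared lookup tables ...
    end subroutine example_scheme_init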

michalakes commented 1 year ago

Thanks for reviewing my comment and for your thoughtful reply.

I did wonder about the CCPP rule concerning decomposition-dependent data and whether it applied in these cases. I agree with you that a strict reading says it doesn't apply. But in terms of the rule's intent, and considering there's also a rule against COMMON data (which also puts state data on the heap), I figured it wasn't too much of a stretch. The CCPP project might consider adding a more general requirement for statelessness going forward.

I tried your idea of checking the instance against main_instance and that works for the physics/mp_thompson.F90 module. I don't have any problem with doing it that way. Let me know if you'd like me to make this change. The only possible issue would be that it introduces an order of initialization requirement that the host program will need to know about when spawning off instances of itself. That might also present an issue if the initializations of different instances are run on different threads in a parallel region. It would need to be clear that these initializations aren't thread safe and shouldn't be run concurrently on multiple threads.

WRT the approach I took with h2ointerp and ozinterp, that was a little laziness on my part. I didn't want to thread the ccpp_instance variable all the way down to those and add another dimension to the oz_lat, oz_pres, and oz_time module arrays. My guess was that these tables aren't likely to differ between physics instances (but that's the kind of assumption that gets one in trouble later).

dustinswales commented 1 year ago

@michalakes

> I did wonder about the CCPP rule concerning decomposition-dependent data and whether it applied in these cases. I agree with you that a strict reading says it doesn't apply. But in terms of the rule's intent, and considering there's also a rule against COMMON data (which also puts state data on the heap), I figured it wasn't too much of a stretch. The CCPP project might consider adding a more general requirement for statelessness going forward.

I agree with your OO approach to how a scheme should be organized, where each scheme would be its own class with associated data and type-bound procedures. But in reality, most schemes aren't close to that level of readiness, and it's a heavy lift for distributed development in a community code base. I don't think the rule against COMMON data is about it being allocated on the heap; rather, it's just good "modern" practice (modern is a stretch here, but it shows how far we are from stateless schemes).

> I tried your idea of checking the instance against main_instance and that works for the physics/mp_thompson.F90 module. I don't have any problem with doing it that way. Let me know if you'd like me to make this change. The only possible issue would be that it introduces an order of initialization requirement that the host program will need to know about when spawning off instances of itself. That might also present an issue if the initializations of different instances are run on different threads in a parallel region. It would need to be clear that these initializations aren't thread safe and shouldn't be run concurrently on multiple threads.

I was really just thinking out loud about a way to do this without having to make too many code changes. If we can condition the initialization routines to run for only a single instance, and this works for your needs, then maybe this logic could be part of the framework, so that we wouldn't need to add logic to the heap allocations during the _init() phase? @DomHeinzeller Is this feasible?

> WRT the approach I took with h2ointerp and ozinterp, that was a little laziness on my part. I didn't want to thread the ccpp_instance variable all the way down to those and add another dimension to the oz_lat, oz_pres, and oz_time module arrays. My guess was that these tables aren't likely to differ between physics instances (but that's the kind of assumption that gets one in trouble later).

Understood. Hopefully we can find a general solution that can be applied upstream, in the _init phase or the framework.

climbfuji commented 1 year ago

@michalakes

> > I did wonder about the CCPP rule concerning decomposition-dependent data and whether it applied in these cases. I agree with you that a strict reading says it doesn't apply. But in terms of the rule's intent, and considering there's also a rule against COMMON data (which also puts state data on the heap), I figured it wasn't too much of a stretch. The CCPP project might consider adding a more general requirement for statelessness going forward.

> I agree with your OO approach to how a scheme should be organized, where each scheme would be its own class with associated data and type-bound procedures. But in reality, most schemes aren't close to that level of readiness, and it's a heavy lift for distributed development in a community code base. I don't think the rule against COMMON data is about it being allocated on the heap; rather, it's just good "modern" practice (modern is a stretch here, but it shows how far we are from stateless schemes).

> > I tried your idea of checking the instance against main_instance and that works for the physics/mp_thompson.F90 module. I don't have any problem with doing it that way. Let me know if you'd like me to make this change. The only possible issue would be that it introduces an order of initialization requirement that the host program will need to know about when spawning off instances of itself. That might also present an issue if the initializations of different instances are run on different threads in a parallel region. It would need to be clear that these initializations aren't thread safe and shouldn't be run concurrently on multiple threads.

> I was really just thinking out loud about a way to do this without having to make too many code changes. If we can condition the initialization routines to run for only a single instance, and this works for your needs, then maybe this logic could be part of the framework, so that we wouldn't need to add logic to the heap allocations during the _init() phase? @DomHeinzeller Is this feasible?

Good question. I am following this conversation from the sidelines, and it reminds me a lot of what we had to do in terms of thread safety for the four init/finalize phases. Maybe similar restrictions can be put in place?

> > WRT the approach I took with h2ointerp and ozinterp, that was a little laziness on my part. I didn't want to thread the ccpp_instance variable all the way down to those and add another dimension to the oz_lat, oz_pres, and oz_time module arrays. My guess was that these tables aren't likely to differ between physics instances (but that's the kind of assumption that gets one in trouble later).

> Understood. Hopefully we can find a general solution that can be applied upstream, in the _init phase or the framework.

michalakes commented 1 year ago

> I was really just thinking out loud about a way to do this without having to make too many code changes. If we can condition the initialization routines to run for only a single instance, and this works for your needs, then maybe this logic could be part of the framework, so that we wouldn't need to add logic to the heap allocations during the _init() phase? @DomHeinzeller Is this feasible?

I would need to dig a little more to verify this, but I suspect it will not be as simple as having one instance do the initialization for all of them. It's likely that each instance of the host model needs to initialize its own instance of the physics in order for that instance's copies of the GFS_Control, GFS_Data and GFS_Interstitial structures to be populated.

michalakes commented 1 year ago

> I agree with your OO approach to how a scheme should be organized, where each scheme would be its own class with associated data and type-bound procedures. But in reality, most schemes aren't close to that level of readiness, and it's a heavy lift for distributed development in a community code base. I don't think the rule against COMMON data is about it being allocated on the heap; rather, it's just good "modern" practice (modern is a stretch here, but it shows how far we are from stateless schemes).

Understood. Forgive me as I belabor the point a little and step (respectfully) out of my lane: I think the role and purpose of CCPP is a question that should be revisited from time to time within CCPP governance. Clearly, part of the purpose of CCPP is to corral disparate community-developed packages and present them for use by a range of host models through a coherent, largely automated framework and API. And CCPP is meeting that objective.

But this also presents the danger of bringing a host model down to the level of the most weakly engineered package in CCPP -- case in point, the issue we're discussing here involving support for multiple instances. There are also issues involving thread safety, next-generation architectures (GPUs), contributions of non-Fortran packages (e.g. DOE's SCREAM physics), etc. I think it's well in keeping with the overarching goal for the U.S. to have the best modeling system in the world that CCPP not only communitize a rag-tag collection of existing software but also establish and enforce the best software engineering practices for physics going forward.

Specifically with regard to the OO idea, I think it's probably feasible to wrap even the dustiest of dusty deck physics into a Fortran class.

Okay, back in my lane again. But if you want me and NRL to help push for greater support for CCPP, please let me know.

gold2718 commented 1 year ago

> I think the role and purpose of CCPP is a question that should be revisited from time to time within CCPP governance.

I think this issue should be addressed by CCPP governance (as you said) because I do not think the "rules" (as currently framed in https://ccpp-techdoc.readthedocs.io/en/v6.0.0/) provide any guidance on who needs to "fix" their code. I am going to move the rest of this response to the PR I believe is associated with this issue (NCAR/ccpp-framework#463) because I think that is where changes need to be made.

grantfirl commented 4 months ago

Fixed via https://github.com/NCAR/ccpp-framework/pull/463