GEOS-ESM / MAPL

MAPL is a foundation layer of the GEOS architecture, whose original purpose is to supplement the Earth System Modeling Framework (ESMF)
https://geos-esm.github.io/MAPL/
Apache License 2.0

Enable the MAPL grid factories to control PET placement when creating grids not on the full VM. #953

Closed bena-nasa closed 3 years ago

bena-nasa commented 3 years ago

As part of the project to make an ESMF or NUOPC based History component, if we go the ESMF route it sounds like this will involve ESMF_VMEpoch. If we do plan on using ESMF_VMEpoch, it must be called on the same VM by all PETs making the call, and the same holds for anything called inside the epoch. This would necessitate creating the fields for the application/client and for the server on different subsets of the PETs in that VM. So this would be needed for initial performance testing and prototyping.

The ESMF_GridCreate that we would use for Lat-Lon and Tripolar can take a PetList argument to control the DE placement. Unfortunately, the cubed-sphere creation API does not; in that case you have to pass a DELayout (which can be created from the PetList).

I'm proposing that we give the factories the ability to accept a PetList. Some thought is needed on how to do this, but it could be a good first project for Gian, both to provide a useful feature and to understand the code.
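
To make the "subset of PETs" idea concrete, here is a minimal Python sketch of carving one VM's PETs into disjoint application and server lists. The function name `split_pets` and the choice to put the server on the highest-numbered PETs are assumptions made for illustration; nothing here is ESMF or MAPL API.

```python
def split_pets(pet_count, server_count):
    """Return (application_pets, server_pets): two disjoint PET lists covering one VM.

    Hypothetical helper: the server is placed on the last `server_count` PETs,
    which is an arbitrary choice for this sketch.
    """
    if not 0 < server_count < pet_count:
        raise ValueError("need at least one PET on each side")
    first_server = pet_count - server_count
    return list(range(first_server)), list(range(first_server, pet_count))


app_pets, server_pets = split_pets(pet_count=8, server_count=2)
print(app_pets)     # [0, 1, 2, 3, 4, 5]
print(server_pets)  # [6, 7]
```

Each list would then be the PetList handed to whatever creates the grid for that side.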

tclune commented 3 years ago
  1. Would a useful first step be to create a wrapper for the CSGridCreate method that takes a PET list, creates a DELayout and then calls the existing CS GridCreate? I.e., sidestep the factories for the moment?
  2. Are you certain that we want/need to involve the factories? I had hoped/thought that maybe ESMF Reconcile would allow us to avoid making some of these types of changes.
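
The wrapper in point 1 could have roughly the following shape. This is a toy Python model, not ESMF code: `DELayout`, `delayout_from_petlist`, `cs_grid_create`, and `cs_grid_create_on_pets` are hypothetical stand-ins, and the round-robin DE-to-PET assignment is an assumption of the sketch rather than a statement about ESMF's default mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DELayout:
    """Toy stand-in for a DELayout: de_to_pet[i] is the PET that owns DE i."""
    de_to_pet: tuple


def delayout_from_petlist(de_count, pet_list):
    """Assign DEs to the listed PETs round-robin (an assumption of this sketch)."""
    if not pet_list:
        raise ValueError("pet_list must not be empty")
    return DELayout(tuple(pet_list[i % len(pet_list)] for i in range(de_count)))


def cs_grid_create(tile_size, nx, ny, delayout):
    """Stand-in for the existing cubed-sphere create, which only accepts a DELayout."""
    if len(delayout.de_to_pet) != 6 * nx * ny:  # six tiles, nx*ny DEs per tile
        raise ValueError("DELayout size must equal 6*nx*ny")
    return {"tile_size": tile_size, "decomp": (nx, ny), "delayout": delayout}


def cs_grid_create_on_pets(tile_size, nx, ny, pet_list):
    """The proposed wrapper: PET list in, DELayout built internally, existing create called."""
    delayout = delayout_from_petlist(6 * nx * ny, pet_list)
    return cs_grid_create(tile_size, nx, ny, delayout)


grid = cs_grid_create_on_pets(tile_size=48, nx=1, ny=1, pet_list=[4, 5, 6])
print(grid["delayout"].de_to_pet)  # (4, 5, 6, 4, 5, 6)
```

The point of the shape is that callers never see the DELayout; they only supply the PETs they want the grid to live on.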

rsdunlapiv commented 3 years ago

I think we should talk about this. It is very unusual in ESMF to need to specify PET lists explicitly, except for when you are setting up a component context itself. I don't think I fully understand the issue, so maybe we can take it up on a call when looking at code? It might be helpful to think about the superstructure part for a bit (i.e., what are the component boundaries) and get that set before creating the distributed data structures.

bena-nasa commented 3 years ago
  1. We could wrap the ESMF CS grid creation method. It would be nice to be able to test this with a lat-lon grid too, though. Creating the lat-lon grid and adding coordinates is a little bit of work, and since we already have a factory that does all this, it seems we should use it.

What about this: the factories all end up calling a routine that creates the grid from parameters, whether they start from metadata, config, etc. We could add a PET list to that underlying create-from-parameters function, so if users want to use it they have to invoke the constructor for the grid factory. Don't worry about making it available via config, for example. This would be less intrusive and an easy way to create these grids for testing.

  2. I don't think Reconcile does what you think. Someone has to create a field on the appropriate PETs, so for the server this means creating a field that has DEs on the server PETs. Reconcile won't do this; that is not what it is meant for. It only handles the case where some PETs never made, say, the FieldCreate call, so that all PETs are at least aware of the field; it does not put data on those PETs. So for now someone will have to create the grid(s) on a subset of PETs, at the level of the VM that spans server + application. I just don't see how this is avoidable. We don't have to use the factories, but we do have to create grids, and the factories already have that code neatly packaged.
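
A rough Python sketch of the "constructor only, not config" idea follows. The class, its methods, the config keys, and the one-DE-per-PET check are all hypothetical and only illustrate the shape of the proposal; this is not the MAPL factory interface.

```python
class LatLonGridFactory:
    """Toy stand-in for a grid factory (hypothetical, not the MAPL class)."""

    def __init__(self, im_world, jm_world, nx, ny, pet_list=None):
        self.im_world, self.jm_world = im_world, jm_world
        self.nx, self.ny = nx, ny
        # Optional PET list: only reachable through the constructor.
        self.pet_list = pet_list

    @classmethod
    def from_config(cls, config):
        # The config path deliberately has no PET list entry (keys are made up).
        return cls(config["IM_WORLD"], config["JM_WORLD"], config["NX"], config["NY"])

    def make_grid(self, vm_pet_count):
        """Create-from-parameters step: default to the full VM unless a PET list was given."""
        pets = list(range(vm_pet_count)) if self.pet_list is None else list(self.pet_list)
        if len(pets) != self.nx * self.ny:  # sketch assumes one DE per PET
            raise ValueError("nx*ny must equal the number of PETs used")
        return {"dims": (self.im_world, self.jm_world),
                "decomp": (self.nx, self.ny),
                "pets": pets}


server_grid = LatLonGridFactory(72, 46, nx=2, ny=1, pet_list=[6, 7]).make_grid(vm_pet_count=8)
print(server_grid["pets"])  # [6, 7]
```

Existing callers that go through config keep the full-VM behavior, which is what makes the change low-risk for a testing-only feature.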

bena-nasa commented 3 years ago

> I think we should talk about this. It is very unusual in ESMF to need to specify PET lists explicitly, except for when you are setting up a component context itself. I don't think I fully understand the issue, so maybe we can take it up on a call when looking at code? It might be helpful to think about the superstructure part for a bit (i.e., what are the component boundaries) and get that set before creating the distributed data structures.

In order to use the VMEpoch framework and put a FieldRegrid inside the epoch, I thought it said that you can't have a sender and receiver on the same PET. Ergo, the source and destination fields can't be on the same PETs, which means you need grids that don't have DEs on the same PETs. So that means either passing a PetList to a GridCreate call or passing a PetList to a component.
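
The constraint as described above (no PET holding DEs of both the source and the destination field) can be stated as a small check. This is a plain Python sketch; `shared_pets` and the DE-to-PET lists are hypothetical and do not query ESMF.

```python
def shared_pets(src_de_to_pet, dst_de_to_pet):
    """Return the sorted PETs that hold DEs of both the source and the destination field."""
    return sorted(set(src_de_to_pet) & set(dst_de_to_pet))


# Source field on PETs 0-3, destination on PETs 4-5: no PET is both sender and receiver.
print(shared_pets([0, 1, 2, 3], [4, 5]))  # []

# Destination also uses PET 3: this layout violates the constraint.
print(shared_pets([0, 1, 2, 3], [3, 4]))  # [3]
```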

bena-nasa commented 3 years ago

Closing this for now, as it was decided that we would not put this in the factories for now, but rather make a simple module that creates a few select grids we can use for testing. If needed we can revisit this.

bena-nasa commented 3 years ago

Closing, see previous comment