ESMCI / cime

Common Infrastructure for Modeling the Earth
http://esmci.github.io/cime
Other
157 stars 204 forks source link

Support the creation of multiple user_nl files for a component #3999

Closed billsacks closed 2 years ago

billsacks commented 3 years ago

CISM in CESM will soon support running multiple ice sheets in the same case. Each of these ice sheets will have its own config file (similar to a namelist file), with different settings. Therefore, in a run containing Greenland and Antarctica (for example) we need a way for users to be able to separately specify settings in the user_nl_cism file that apply (1) to both ice sheets; (2) just Greenland; (3) just Antarctica.

Based on discussion here https://github.com/ESCOMP/CISM-wrapper/issues/50 the user_nl_cism file will look like this:

! Settings that apply to all ice sheets go here
myvar = myvalue

[gris]
! Settings that apply to Greenland (if present) go here
foo = value1
bar = value2

[ais]
! Settings that apply to Antarctica (if present) go here
foo = value3
baz = value4

I propose adding a function to CIME (in user_mod_support.py) to parse a file like this, outputting a new file that just contains the requested subset of settings. In CISM's buildnml, we would then call this function three times:

First, we'd request a file that just contains settings that apply to all ice sheets (this is needed to create the top-level cism_in file): separate_user_nl(..., section=None), which would produce:

! Settings that apply to all ice sheets go here
myvar = myvalue

Then we would call separate_user_nl(..., section='gris'), which would produce:

! Settings that apply to all ice sheets go here
myvar = myvalue

[gris]
! Settings that apply to Greenland (if present) go here
foo = value1
bar = value2

Finally, we would call separate_user_nl(..., section='ais'), which would produce:

! Settings that apply to all ice sheets go here
myvar = myvalue

[ais]
! Settings that apply to Antarctica (if present) go here
foo = value3
baz = value4

(Note: It's intentional that the settings that apply to all ice sheets appear in all cases. This section can contain a mix of settings that (a) go in cism_in, which contains non-icesheet-specific settings, and (b) go in all cism.config files – i.e., ice sheet-specific settings, but for which the user wants the same setting to apply to all ice sheets.)

I hope to implement this in the next week or so. I am opening this to gather any feedback on:

  1. Are there any objections to putting this functionality in CIME? I could put it in CISM, but I prefer putting it in CIME for two reasons: (i) I would like to cover this functionality with unit tests; putting it in CISM would require me to build up a new python unit testing infrastructure in CISM. (ii) Maybe this could be useful for other components for some related or different purpose.

  2. Are there any objections to the format of the user_nl file proposed above? We are not completely tied to this format, though we (@whlipscomb, @mvertens and I) prefer it over other options we considered (see https://github.com/ESCOMP/CISM-wrapper/issues/50).

jedwards4b commented 3 years ago

@billsacks What about a separate file for each grid? So you would have user_nl_cism, user_nl_cism.gris and user_nl_cism.ais ? I believe that this implementation could be done with fewer less disruptive changes in cime.

gold2718 commented 3 years ago

I agree with @jedwards4b, this feels more like you are running two instances of an ice sheet model so using multiple namelist files feels most CIME-like.

mvertens commented 3 years ago

@jedwards4b @gold2718 - As @whlipscomb pointed out, we will eventually want to support multiple paleo ice sheets: Laurentide, Eurasian, Fennoscandian, and so on. In addition to Greenland (gris) and Antarctica (ais), we'd have to carry around other usernl* files for ice sheets that are irrelevant to most people running the model. I believe that this would be awkward for the user and support. What if a user wanted to use data from a previous run - but use less ice sheets. Rather than just editing one user_nl_cism file they would have to separately edit and create only a required subset of files.

billsacks commented 3 years ago

What about a separate file for each grid? So you would have user_nl_cism, user_nl_cism.gris and user_nl_cism.ais ?

The problem is related to the CIME code for prestaging the user_nl files to the case directory. As alluded to by @whlipscomb and @mvertens, I think it comes down to how much we care about having (potentially many) unnecessary user_nl files staged in the case directory. It would be easy to extend the CIME code that prestages files so that, instead of just staging user_nl_COMP, it stages any file named user_nl_COMP or user_nl_COMP.* (for example). What would be harder, and possibly substantially harder, would be having CIME prestage only those files that are actually needed for the given case. I can't see a way to do this without tying CIME to CISM's XML variables, and it's possible that it would require changing CIME whenever someone wants to introduce a new ice sheet, which is something we'd like to avoid.

To be clear, even with the proposed approach of having a single user_nl_cism with different sections, there may still be unused sections in your user_nl_cism file. For example, in a Greenland-only run, you would ignore the [ais] section. But I think the feeling is that it's more confusing to have completely ignored files (like user_nl_cism.ais) rather than just ignored sections in the user_nl_cism file.

@jedwards4b I'm also wondering if you can explain why you view my proposal as disruptive. It would mean adding one function that could be invoked by components, but wouldn't impact any existing code. That said, I'm fine with keeping this in CISM if you object to its inclusion in CIME.

jedwards4b commented 3 years ago

@mvertens I don't think that we need to "carry around" any usernl files that are not used for the current case. I think that most of this functionality can be handled in cism's buildnml file, using multiple files will require few if any changes in cime. It is known when buildnml is invoked what ice sheets are active for the case. I think that the distinction you are trying to make between support for multiple files and support for multiple sections in a single file is weak at best.

billsacks commented 3 years ago

@jedwards4b I have a feeling you were writing your reply before mine came through. The tie-in with CIME is the need to initially stage the user_nl files to the case directory, which is something handled by CIME, not by CISM's buildnml.

ghost commented 3 years ago

I'm sure I've replied to one of these before, but please can you remove my email from here. I've no idea why my email address got involved with you guys, but I'm sure you don't want a random external email address involved in your communications

Cheers,

On Tue, 15 Jun 2021 at 16:43, Bill Sacks @.***> wrote:

@jedwards4b https://github.com/jedwards4b I have a feeling you were writing your reply before mine came through. The tie-in with CIME is the need to initially stage the user_nl files to the case directory, which is something handled by CIME, not by CISM's buildnml.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESMCI/cime/issues/3999#issuecomment-861610520, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD77NDVDU37DOQK7QT6CELTS5YJRANCNFSM46XJ2IRQ .

jedwards4b commented 3 years ago

Perhaps this points out a flaw in cime - the component buildnml knows what usernl* to expect in a given case, the responsibility to put them there, if not already present, should be added here. I think the addition of new files with the already defined and tested file format is going to be less disruptive than adding a new format modification which may have multiple unintended consequences.

jedwards4b commented 3 years ago

jedwards - very sorry for your trouble - I don't see how we can remove you from the thread once referenced - you will have to remove yourself using the unsubscribe button on the lower right of the issue page.

billsacks commented 3 years ago

jedwards sorry that was my fault for not adding the "4b". I apologize.

billsacks commented 3 years ago

@jedwards4b - after talking with @mvertens we would really like to move ahead with the originally-proposed solution of having a single user_nl_cism file with multiple sections. We feel this will be most clear / least confusing to users.

So the question for you at this point is just whether you object to having a function that parses this file in the CIME lib. I want to reiterate that this function would not be called in the normal / existing flow of code. It would just be available for component buildnmls to call this function if they want. So it has no potential for unintended consequences for other components. That said, if you object to putting it in CIME, I can put it in CISM instead.

gold2718 commented 3 years ago

@jedwards4b, I am trying to remove jedwards by editing the comments that had a typo in a reference.

gold2718 commented 3 years ago

I think I withdraw my opposition to this change as long as sections are not required. We are moving in a different direction in CAM but I am interested in learning more about your approach.

... outputting a new file that just contains the requested subset of settings ...

Where is this file output? Why is a file needed instead of some sort of return object?

What is output in the run directory (i.e., one file, multiple files)?

jedwards4b commented 3 years ago

@billsacks I think that (1) would be easier and require few if any changes in cime, but I don't object if you want to proceed with adding separate_user_nl(...) - I'm curious - what is produced in the run directory, that is what does cism actually read for input?

billsacks commented 3 years ago

Perhaps this points out a flaw in cime - the component buildnml knows what usernl* to expect in a given case, the responsibility to put them there, if not already present, should be added here.

I see your rationale, but it seems like this would require a bigger change – introducing a whole new step at which a component-specific script is called, in the create_newcase process.

as long as sections are not required

Right, this would be completely optional and not impact other components in any way.

We are moving in a different direction in CAM

I would be interested to hear what that direction is – at least if you are solving a similar problem. One (important) rationale I see for putting this new code in CIME is to try to facilitate code sharing between components if other components have substantially similar needs. Also, if two components are solving basically the same problem, it would be nice for users if the user interface mechanism is the same, rather than having arbitrary differences between components. We could adjust our approach in CISM to remain consistent with other components if there is already a very similar thing being done by others.

Where is this file output? Why is a file needed instead of some sort of return object?

CISM's buildnml code (as for other components) currently calls CIME's function, create_namelist_infile:

https://github.com/ESMCI/cime/blob/71ddf1ced43ca77e422abe2b460adfb29710b91a/scripts/lib/CIME/buildnml.py#L98-L124

(Side note: this argues for putting a new function in this buildnml.py, not in user_mod_support.py.)

This function expects a pre-existing file (typically $CASEDIR/user_nl_COMP) and creates a new file after doing a bit of reformatting / processing (I think typically $CASEDIR/Buildconf/namelist). Then the path to this temporary namelist file is passed to the namelist generation code.

What I'm proposing is keeping effectively that same flow, but now, instead of using the whole contents of user_nl_COMP, there will be an initial subsetting step before calling create_namelist_infile. I could do this all at once via an extra option to the existing create_namelist_infile function, but in the interest of modularity, testability and avoiding potential disruption, I feel it's best to do this in a separate preprocessing step. So instead of the flow, user_nl_COMP -> [create_namelist_infile] -> Buildconf/namelist, CISM (and any other component that chooses to use this new function) will have the flow user_nl_COMP -> [separate_user_nl] -> Buildconf/user_nl_COMP_subset -> [create_namelist_infile] -> Buildconf/namelist (where names in brackets are function calls and others are file names).

What is output in the run directory (i.e., one file, multiple files)?

I'm curious - what is produced in the run directory, that is what does cism actually read for input?

If you are running with N ice sheets, the final result is N+1 files: a cism_in file that has general settings that apply to all ice sheets (read by the "cism-wrapper" layer), and N cism.config files (read by CISM itself), one for each ice sheet. e.g., in a run with Greenland & Antarctica, you would have cism_in, cism.ais.config and cism.gris.config.

jedwards4b commented 3 years ago

If you are running with N ice sheets, the final result is N+1 files: a cism_in file that has general settings that apply to all ice sheets (read by the "cism-wrapper" layer), and N cism.config files (read by CISM itself), one for each ice sheet. e.g., in a run with Greenland & Antarctica, you would have cism_in, cism.ais.config and cism.gris.config.

This would seem to argue for separate files in the case directory as the easiest solution for the user. They would simply copy the file back from the run directory to user_{filename} in the case directory and edit as needed. I think that this is a common practice, especially if the changes are complex.

billsacks commented 3 years ago

I have been thinking about this more, especially inspired by this comment from @jedwards4b:

the component buildnml knows what usernl* to expect in a given case, the responsibility to put them there, if not already present, should be added here.

That idea is growing on me, if (1) @whlipscomb is happy with having multiple user_nl files, and (2) @jedwards4b (and anyone else who cares to weigh in) is comfortable with adding another linkage to the component's buildnml script. (@mvertens has given her okay if others are happy with this.) Here are some thoughts and questions. For questions, please see bold text below.

Specifically, I think it would be relatively easy to add functionality to CISM's buildnml to achieve staging an overall user_nl_cism plus one user_nl_cism_ICESHEET for each ice sheet actually present in this case.

(Q1) @whlipscomb - would you be happy with this solution (i.e., option (1) from https://github.com/ESCOMP/CISM-wrapper/issues/50) if we would only have user_nl files for the ice sheets actually present in this case?

@jedwards4b and other cime developers: The implementation I imagine for this is: In case_setup.py: _build_usernl_files: Add a call to a function that determines which user_nl files are needed for this component. This is done by looking for a function named get_user_nl_list in the component's buildnml module – I guess using the same re.search approach as is used in run_sub_or_cmd (though we wouldn't actually use that function, because we don't want to run this as a command):

https://github.com/ESMCI/cime/blob/71ddf1ced43ca77e422abe2b460adfb29710b91a/scripts/lib/CIME/utils.py#L408-L414

If present, then it calls that function to get a list of user_nl files needed for this component; if not present, then it will assume that we just need user_nl_COMP for the given COMP (i.e., the same as now).

(Q2) @jedwards4b and anyone else: does this implementation – and specifically the optional new connection from cime to components' buildnml – seem reasonable?

If people like this, then I have two additional questions, for @jedwards4b, @whlipscomb, @mvertens and anyone else:

(Q3) Tradeoff between ease of adding a new ice sheet and specificity of explanatory text at the top of each file: I can see two ways to implement this:

  1. Store separate template files for each possible ice sheet (e.g., user_nl_cism_gris, user_nl_cism_ais, etc. in the CISM-wrapper repository, then just copy each one into place.
  2. Store only a generic user_nl_cism in the CISM-wrapper repository, then copy that file to identical files named user_nl_cism, user_nl_cism_gris, user_nl_cism_ais, etc. in the case directory.

The advantage of (1) is that we could have comments specific to this ice sheet. For example, in user_nl_cism we could have: ! This file is for settings that apply only to the Greenland Ice Sheet; with (2), we would just have a generic comment like, ! The user_nl_cism file is for settings applying to all ice sheets, user_nl_cism.gris is for settings that apply only to the Greenland Ice Sheet, etc.. The disadvantage of (1) is that it adds one more step that someone would need to do when adding a new ice sheet. I'm leaning slightly towards (2), but don't feel strongly about it, and want to hear other people's thoughts on how important it is to have specific comments for each user_nl file.

(Q4) How to name the files: I would propose names like user_nl_cism_gris, using the short name that we are using in other contexts, including for the grid names and cism.config file names. Are people (especially @whlipscomb) happy with this? An alternative is to use the full icesheet name (like user_nl_cism_greenland). (Note that I prefer an underscore separator over a period because in user_nl_cism.gris, gris can be interpreted as the file type extension.)

Finally, I'll just point out that this:

They would simply copy the file back from the run directory to user_{filename} in the case directory and edit as needed. I think that this is a common practice, especially if the changes are complex.

doesn't work (I think) for CISM, even with a single ice sheet, because I think that the format of the cism.config file is incompatible with the format of the user_nl file. I also think that this is a bad practice that we should discourage, for a number of reasons (documenting what you've changed in your case, and the potential to accidentally override some setting whose default value depends on another namelist variable's value).

whlipscomb commented 3 years ago

@billsacks, thanks for continuing to think about the problem carefully.

For (Q1), I'm OK with having one user_nl_cism file for changes common to all ice sheets, along with additional files like user_nl_cism_gris for changes unique to a single instance. (Provided we don't have usernl* files for ice sheets that aren't part of the case, as you said.) I think this is no harder for users to understand than having multiple sections in a single user_nl_cism file, and might actually be less error-prone.

If I understand correctly, a user would have two ways to make a change common to all ice sheets: (1) Change user_nl_cism, or (2) Change both user_nl_cism_gris and user_nl_cism_ais. Is that right? The second way takes more typing but requires editing fewer files. I could imagine that some users might prefer (2).

For (Q3), I don't feel strongly one way or the other.

For (Q4), my preference is to use the short names: gris, ais, and so on. I'm fine with underscores.

Thanks again!

billsacks commented 3 years ago

Thanks @whlipscomb !

If I understand correctly, a user would have two ways to make a change common to all ice sheets: (1) Change user_nl_cism, or (2) Change both user_nl_cism_gris and user_nl_cism_ais. Is that right? The second way takes more typing but requires editing fewer files. I could imagine that some users might prefer (2).

That's right. However, we still need the overall user_nl_cism for changes to variables in cism_in (i.e., CISM-wrapper's namelist). In addition, it could help with backwards compatibility with old case setups, and I think this will help a bit with system testing, though wouldn't be critical for that purpose.

We could say that user_nl_cism is just for changes to cism_in, and you need to use the other files for any changes to the cism.config files. I'm not sure if you were starting to go there with your thinking. However, I don't think it will take much extra code to support having user_nl_cism changes apply to all cism.config files, so as long as you feel that supporting this gives more advantage than confusion, then I'll stick with that plan. Either way is fine with me, so just let me know your preference.

whlipscomb commented 3 years ago

@billsacks, I like the idea that changes to user_nl_cism apply to all cism.config files as well as cism_in, and I'm OK with having more than way to change settings for a particular ice sheet. I just wanted to make sure I understood the logic.

jedwards4b commented 3 years ago

@billsacks This looks good, and I think that the way you are implementing it will allow other components to take advantage of the feature if and when they are ready to do so. Thanks

billsacks commented 3 years ago

Thanks @jedwards4b - and thanks for your great suggestion of having the components' buildnml handle this responsibility. Once I saw how this could be done, this really did feel like the right approach to me.

billsacks commented 3 years ago

Darn, sorry again jedwards . Since I made the mistake once now your name is coming up in the auto-complete in this thread. I'll try to be more careful!