E3SM-Project / scream

Exascale global atmosphere model written in C++ as part of the E3SM project
https://e3sm-project.github.io/scream/

Python script to create initial condition file for scream testing #892

Closed bartgol closed 3 years ago

bartgol commented 3 years ago

Talking with @AaronDonahue (following on conversations that happened during the v1 telecon), we decided that it would be best to have SCREAM always set initial conditions through an input/restart file, even in scream-standalone mode, and even when running just a small test (like dynamics-only, or homme+p3). This would allow us to get rid of the (frankly) clumsy FieldInitializer struct, with all its quirks and oddities.

This plan, however, brings up a question: what if someone wants to do a slightly different test? Say change something in the initial condition, or add a variable, or change a parametrization, or change resolution, or something else along these lines? We would like to have a lightweight python script that makes things easy. What we envisioned is something that allows me to do this:

# generate init condition netcdf file at ne4 with 128 levs and phys grid corresponding to gll nodes, with all vars inited to 0
./gen-input-file -o filename --ne 4 --pg=gll --nlev 128 --vars pressure temperature velocity water_vapor
# modify $filename by setting pressure=1, temperature to the result of given expression, and importing velocity from other_file
./mod-input-file -f filename --set-val pressure=1.0 --compute temperature="lat^2 - lon^2" --import velocity=other_file[:velocity_name_in_other_file]

This script is obviously meant mostly for small test cases, supporting limited features (the fanciest being the --compute flag, which could be extended to use other fields currently stored, like --compute exner="(p/p0)^1.3"). Since its goal is small-scale testing, the "quality" of the generated data is not "that" important. The goal is simply to give devs a fast and simple way to generate an input file that lets them bootstrap their test case, avoiding invalid/incompatible data.
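As a concrete sketch of what the two commands above would do under the hood (neither script exists yet; the 1-D "grid" and field names below are made up for illustration, and numpy is assumed):

```python
import numpy as np

# A tiny stand-in for what gen-input-file / mod-input-file would do.
lat = np.array([0.0, 30.0, 60.0])    # degrees, made-up 1-D "grid"
lon = np.array([0.0, 90.0, 180.0])   # degrees

# gen-input-file: all requested vars initialized to 0
fields = {v: np.zeros(lat.shape) for v in
          ["pressure", "temperature", "velocity", "water_vapor"]}

# mod-input-file:
fields["pressure"][:] = 1.0              # --set-val pressure=1.0
fields["temperature"] = lat**2 - lon**2  # --compute temperature="lat^2 - lon^2"
```

The last step is where a math-string parser would plug in; here the expression is just written out directly in numpy.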

@PeterCaldwell @rljacob I suspect there might already be some tool for this in the e3sm ecosystem. If so, then we might simply hijack it for our purposes, perhaps wrapping it in a smaller script for the sake of simple scream-related usage.

PeterCaldwell commented 3 years ago

I think I like this idea but I'm not sure I understand it. A couple thoughts:

  1. the variables needed to run P3 standalone (for example) are different from the variables needed to initialize the whole model (because many of P3's variables are calculated by processes called after the model starts but before P3 is called). Thus having a single utility for providing inputs to run any kind of simulation could end up getting really complicated/weird.
  2. this is a bit of a tangential thought, but it would be really good if all initial conditions and boundary conditions were online-interpolated to the target resolution. This would get rid of a huge quagmire in E3SM right now where you can't run at a new resolution because you never have the right files.
  3. Getting rid of FieldInitializer would probably be good eventually. I like the idea of creating whatever field variables we want in python, stuffing them in a netcdf, and reading them in. I don't think a tool for this currently exists. My experience with trying to manufacture my own initial conditions for E3SM is that E3SM is really rigid about the format of its input files, in ways I've never been able to understand well enough to get working... it would be nice if our input reader were more laid back.
bartgol commented 3 years ago

I'll reply in different order.

(2) This might be very demanding, and probably completely out of scope for scream. You are talking about generating input files for, say, ne=30 from input files for, say, ne=256 (or vice versa). That's a very cool feature, but I suspect it might take more than a few cycles to implement.

(3) I know nothing about E3SM input readers. Whether our reader is less/more/equally rigid is something that maybe @AaronDonahue can help figure out.

(1) The utility should make it possible to create very simple input files. And yes, different tests need different inputs. Now I might say something wrong (@AaronDonahue, correct me), but I think our scorpio input reader can read a subset of the fields that are in the nc file, which could help if field "blah" from the input file is no longer needed because some other atm proc is now present and can provide it. In particular, at runtime the AD will figure out what the "atm inputs" are, and ask our initial condition reader to read them from a user-specified filename. If some field is not present in that file, crap out. If there are more fields than we actually need, that's fine, we'll just get what we need.
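The read-a-subset behavior described here is simple to express in code. A toy sketch (the function name and field names are made up; the real reader is scorpio, not this):

```python
def select_input_fields(available, required):
    """Mimic the reader behavior described above: read only the fields
    the AD requires, error out if a required field is missing from the
    file, and silently ignore any extra fields the file contains."""
    missing = sorted(set(required) - set(available))
    if missing:
        raise ValueError(f"input file missing required fields: {missing}")
    wanted = set(required)
    return [f for f in available if f in wanted]

# An input file with one extra field: the extra is simply ignored.
file_fields = ["ps", "dp", "T", "qv", "some_extra_diag"]
needed = select_input_fields(file_fields, ["T", "qv", "dp"])
```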

The way I see this tool being used is like this. Say I want to create a homme+p3 test, with ne4 and 128 vlevs. Great. 1) I figure out (from a dry run of the AD, which will print out a broken dag) what my required inputs are. 2) Then I run (I'm making up the syntax as I write, it can be adjusted):

gen-input-file -o test_input.nc --ne 4 --nlev 128 --scalars2d="ps" --scalars3d="dp T qv [...]" --vectors3d="v:2 [...]"

This will create an nc file with all fields set to 0. The v:2 is a way to specify that v is a vector with 2 components (first syntax I came up with). 3) I suspect all fields set to 0 won't work, so I set some fields to an actual initial condition:

mod-input-file -f test_input.nc --compute-var T="273 - lat/30" --set-var "qv=1.0"
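As an aside, the v:2 component syntax from the gen-input-file call above is trivial to parse; a sketch (function name made up):

```python
def parse_field_spec(spec):
    """Parse "v:2" into ("v", 2); a bare name like "T" is a scalar,
    i.e. one component."""
    name, _, comps = spec.partition(":")
    return name, int(comps) if comps else 1

# e.g. the --scalars3d / --vectors3d arguments from the call above
specs = [parse_field_spec(s) for s in "dp T qv v:2".split()]
```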

Say someone already wrote an input file with some p3 vars in it, so use it:

mod-input-file -f test_input.nc --import-var qr=other_nc_file.nc:qr qc=other_nc_file.nc:qc ...

And I can compute some var from others already initialized (again, a silly example, but it gives the idea).

mod-input-file -f test_input.nc --compute-var exner="(p/p0)^1.3"

As I said, I don't foresee this tool being very complicated. But it might be good enough to generate "valid" data that can be used for unit testing. Or to add, on the fly, a variable that is not in an input file, for which a constant initial value or a simple math expression would be fine.

Also, these simple tools can then be wrapped in a more sophisticated script that can actually compute "realistic" initial conditions. If you have, say, an analytic expression for T, and you can compute p from T and then v from p,T (I'm making up stuff, but bear with me), then simply write a script that does a bunch of calls to gen-input-file and mod-input-file, passing the correct expressions.

jeff-cohere commented 3 years ago

It seems like this approach is informed by the various XML tools out there that are designed to selectively edit giant XML files. I was going to ask whether anyone felt that the way we use YAML could be adapted to create NetCDF files based on atmospheric conditions with specific profiles and/or thermodynamic/statistical properties. I suppose this type of thing might be used in the more sophisticated script that Luca mentions above.

This more sophisticated tool would definitely be more work than what this issue describes, though. And we'd have to make sure that no one else was already working toward a similar thing, unless we're happy having it be useful only to SCREAM. In any case, it does seem to me that we need to make it much easier to generate valid initial conditions for these simulations.

bartgol commented 3 years ago

@jeff-cohere We should definitely check with other e3sm folks if tools like this exist already.

I want to reinforce that I was only thinking about a quick tool to generate semi-dummy nc files for our development testing, with relatively small grids. In this scenario, I don't foresee very complicated uses: probably mostly "fixed-value" fields, and maybe "import-from-another-nc-file" fields. The possibility of parsing a math string also came to mind, since we have plenty of math-string evaluator libraries (I know boost and mu_parser in C++, and I'm quite sure python has a plethora too).
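For the math-string part, python doesn't even strictly need an external library: a small, safe evaluator can be built on the standard `ast` module. A sketch (a real implementation would likely reach for an existing evaluator library instead; note the `^` is translated to python's `**` power operator):

```python
import ast
import operator as op

# Whitelist of allowed arithmetic operations.
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def safe_eval(expr, names):
    """Evaluate strings like "lat^2 - lon^2" using only arithmetic
    and the provided variable names; '^' means exponentiation."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"disallowed element: {ast.dump(node)}")
    return ev(ast.parse(expr.replace("^", "**"), mode="eval").body)

result = safe_eval("(p/p0)^1.3", {"p": 50000.0, "p0": 100000.0})
```

Passed numpy arrays instead of scalars in `names`, the same evaluator would work elementwise on whole fields.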

That said, if a tool like this does not currently exist, and it could be useful more generally, then we (plus some other infrastructure folks, maybe) could put together a small stack of scripts to allow more robust and "realistic" uses. E.g., one could keep the scripts I mentioned above as the "work-horse", and build other scripts on top of them that do some name-checking and more complex setup, like assembling complicated math expressions, based on the test case.

Talking out of my donkey, but as pseudo-examples, the outer wrappers could do stuff like: 1) if baroclinic case, set T='my complicated math string' and v='my_fcn(T)'. 2) take existing input file and modify T so that p,T, and whatnot satisfy some thermodynamic property. 3) modify existing fields, adding some sort of noise (everywhere, or at specific locations).
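Wrapper idea (3), for instance, is just a few lines of numpy. A sketch (the function name and amplitude are made up):

```python
import numpy as np

def add_noise(field, amplitude, rng, mask=None):
    """Perturb a field with uniform noise in [-amplitude, amplitude],
    either everywhere or only where mask is True."""
    noise = rng.uniform(-amplitude, amplitude, size=field.shape)
    if mask is not None:
        noise = np.where(mask, noise, 0.0)
    return field + noise

rng = np.random.default_rng(42)
T = np.full(8, 273.0)
T_all = add_noise(T, 0.1, rng)                         # noise everywhere
T_loc = add_noise(T, 0.1, rng, mask=np.arange(8) < 2)  # first 2 points only
```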

These last 3 are way beyond what we need in scream, but they sound quite cool and useful to me, so if there is no such tool already, I would maybe propose it to the infrastructure team...

bartgol commented 3 years ago

Done in #896.