pbosler closed this issue 3 years ago
@pbosler we discussed this today in our AD telecon. You're welcome to join next Thursday (10am MST), if you want to chip in. Here's a brief summary of what @AaronDonahue and I discussed (it was past the end of the telecon, and everyone logged off, so we'll discuss this again next week):
air_pot_temperature; did you mean 'air_potential_temperature'?". I found a 50-line string similarity algorithm that I already implemented in a branch in ekat (the Jaro and Jaro-Winkler similarity measures), and we can run it against the dictionary when a lookup fails. Even if the dictionary is long, the search would only run when we're already erroring out, so the cost is fine.

I think that was most of the discussion. Feel free to join next Thursday, or drop a comment here with what's on your mind.
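The "did you mean" lookup described above could look roughly like the sketch below. This is not the ekat branch code; it is a textbook Jaro / Jaro-Winkler implementation, and the helper name `closest_name` and the three-entry dictionary used to exercise it are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Jaro similarity in [0, 1]; 1 means the strings are identical.
double jaro(const std::string& s1, const std::string& s2) {
  const int n1 = static_cast<int>(s1.size());
  const int n2 = static_cast<int>(s2.size());
  if (n1 == 0 && n2 == 0) return 1.0;
  if (n1 == 0 || n2 == 0) return 0.0;

  // Two characters match only if they are equal and at most `window` apart.
  const int window = std::max(0, std::max(n1, n2) / 2 - 1);
  std::vector<bool> used1(n1, false);
  std::vector<bool> used2(n2, false);

  int matches = 0;
  for (int i = 0; i < n1; ++i) {
    const int lo = std::max(0, i - window);
    const int hi = std::min(n2 - 1, i + window);
    for (int j = lo; j <= hi; ++j) {
      if (!used2[j] && s1[i] == s2[j]) {
        used1[i] = true;
        used2[j] = true;
        ++matches;
        break;
      }
    }
  }
  if (matches == 0) return 0.0;

  // Count matched characters that appear in a different order in s1 and s2.
  int out_of_order = 0;
  for (int i = 0, j = 0; i < n1; ++i) {
    if (!used1[i]) continue;
    while (!used2[j]) ++j;
    if (s1[i] != s2[j]) ++out_of_order;
    ++j;
  }
  const double m = matches;
  const double t = out_of_order / 2.0;  // transpositions
  return (m / n1 + m / n2 + (m - t) / m) / 3.0;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
double jaro_winkler(const std::string& s1, const std::string& s2, double p = 0.1) {
  assert(p >= 0.0 && p <= 0.25);  // keeps the result within [0, 1]
  const double sim = jaro(s1, s2);
  const std::size_t max_prefix =
      std::min<std::size_t>(4, std::min(s1.size(), s2.size()));
  std::size_t prefix = 0;
  while (prefix < max_prefix && s1[prefix] == s2[prefix]) ++prefix;
  return sim + prefix * p * (1.0 - sim);
}

// Hypothetical helper: the dictionary entry most similar to `name`
// (empty string if the dictionary is empty).
std::string closest_name(const std::string& name,
                         const std::vector<std::string>& dictionary) {
  std::string best;
  double best_score = -1.0;
  for (const auto& candidate : dictionary) {
    const double score = jaro_winkler(name, candidate);
    if (score > best_score) {
      best_score = score;
      best = candidate;
    }
  }
  return best;
}
```

Against a small dictionary containing `sea_surface_temperature`, `air_potential_temperature`, and `surface_air_pressure`, `closest_name("air_pot_temperature", ...)` returns `air_potential_temperature`. With the full CF table, a shorter name such as `air_temperature` may score about as high, so printing the top few candidates could be more useful than a single guess.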
That sounds great. I'm off next week, but I agree that it would be good to attend the AD meetings.
Interesting idea, Pete. I was envisioning that we would use our own short variable names internally and translate them to CF-convention names before outputting them. I think you're suggesting that we also use the CF names internally, so SHOC and P3 would use them as well. This would be nice except that 1) some of the names are ridiculously long (like air_temperature_at_effective_cloud_top_defined_by_infrared_radiation) and 2) we may want to couple externally developed parameterizations into the model later and wouldn't want to rename all of their variables.
But perhaps we could use a hybrid approach where the parameterizations can use whatever names they want, but the AD, which glues things together, would commit to using only the CF name when one exists, even if that name is long. Since we want CF-compliant output, this would remove the need to translate internal variable names to CF names before writing them out.
I'm certainly not saying this is how I want to go (yet), but it is an interesting idea. Is this what you were envisioning, Pete? What do others think of it?
Yes, that's a fair summary of what I was thinking. Basically, as more parameterizations and experiments come into use, we'll need a convention for names. Instead of coming up with a new one, it makes sense to use one that already exists and has widespread support within the community.
I agree that the names are very long and would be ugly/cumbersome in-source. @bartgol already has alias/short-name examples as enums in the AD (e.g., in field_tag.hpp); we could do the same for the common variables that dynamics and most physics processes will use.
I think limiting CF names to I/O routines is complicated: how would the I/O manager figure out which CF name to associate with each output field? Unless we come up with a HUGE and exhaustive list of potential short names for each CF standard name, it would be impossible.
I think the only way out is to force SCREAM to use CF names. In my post above I summarized some ideas Aaron and I came up with at the end of the AD telecon. Your concern about insanely long names can be mitigated with aliases/short names (with the caveat that only one short name is allowed for each standard name, to avoid subtle bugs). And new parameterizations can circumvent the CF names by declaring the field as "internal" (or some other keyword), which would tell the AD that the field is allowed not to follow the CF naming scheme.
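To make the two rules above concrete (one short name per standard name, plus an "internal" escape hatch), here is a minimal sketch. The class `NameRegistry`, its methods, and the short names in the usage note are hypothetical and are not part of the AD; only the CF standard names are real.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical registry enforcing a one-to-one mapping between short names
// and CF standard names, with an opt-out for "internal" fields.
class NameRegistry {
public:
  // Registers short_name as THE alias of std_name; throws if either side
  // is already bound to something else.
  void add_alias(const std::string& short_name, const std::string& std_name) {
    assert(!short_name.empty() && !std_name.empty());
    const auto s = m_short_to_std.find(short_name);
    if (s != m_short_to_std.end() && s->second != std_name)
      throw std::invalid_argument("short name '" + short_name +
                                  "' already maps to '" + s->second + "'");
    const auto c = m_std_to_short.find(std_name);
    if (c != m_std_to_short.end() && c->second != short_name)
      throw std::invalid_argument("'" + std_name +
                                  "' already has short name '" + c->second + "'");
    m_short_to_std[short_name] = std_name;
    m_std_to_short[std_name] = short_name;
  }

  // Marks a field name as internal, i.e. exempt from the CF naming scheme.
  void add_internal(const std::string& name) { m_internal.insert(name); }

  // Short name -> its standard name; registered standard names and internal
  // names are returned unchanged; anything else is an error.
  std::string resolve(const std::string& name) const {
    const auto s = m_short_to_std.find(name);
    if (s != m_short_to_std.end()) return s->second;
    if (m_std_to_short.count(name) > 0 || m_internal.count(name) > 0) return name;
    throw std::invalid_argument("unknown field name '" + name + "'");
  }

private:
  std::map<std::string, std::string> m_short_to_std;
  std::map<std::string, std::string> m_std_to_short;
  std::set<std::string> m_internal;
};
```

For example, after `add_alias("theta", "air_potential_temperature")`, a later `add_alias("pot_temp", "air_potential_temperature")` throws, which is the "subtle bug" guard; a field registered with `add_internal` resolves to itself without being checked against the CF table.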
I also want to point out that I am only talking about the string stored inside each Field structure, not the name you give to the C++ variable representing the field. That said, I think it would be nice for the C++ variable name to match the short name string stored in the field whenever possible, for clarity.
This is a bit of a complicated subject; let's discuss it at this week's AD telecon. @AaronDonahue, can you put it on the agenda?
Closing since redundant with #700
It's possible that different AtmosphereProcess subclasses may use different names to refer to the same variable. Currently, the DAG will notice unmet dependencies if a requested field is not found. Using the CF standardized names could ensure that all AtmosphereProcess subclasses use the same dictionary, which would avoid variable duplication.
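As an illustration of the failure mode described above, the sketch below flags a required field that no process provides. It is a simplified stand-in, not the AD's DAG: `ProcessSpec`, `unmet_dependencies`, and the process and field names in the usage note are hypothetical, and the ordering of processes is ignored.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an atmosphere process: only the field names it
// needs as input and the field names it produces.
struct ProcessSpec {
  std::string name;
  std::set<std::string> required;
  std::set<std::string> computed;
};

// Returns one "process:field" entry for every required field that is neither
// an initial field nor computed by any process.
std::vector<std::string> unmet_dependencies(
    const std::vector<ProcessSpec>& procs,
    const std::set<std::string>& initial_fields) {
  std::set<std::string> available = initial_fields;
  for (const auto& p : procs) {
    assert(!p.name.empty());
    available.insert(p.computed.begin(), p.computed.end());
  }
  std::vector<std::string> unmet;
  for (const auto& p : procs) {
    for (const auto& field : p.required) {
      if (available.count(field) == 0) unmet.push_back(p.name + ":" + field);
    }
  }
  return unmet;
}
```

If `proc_a` computes `air_temperature` while `proc_b` requires `T_atm`, the check reports `proc_b:T_atm` even though both refer to the same physical quantity; once both use the standard name, the list of unmet dependencies is empty.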
Related to #527, in the sense that these standardized names have to be used for netCDF.
http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html