chrisdane opened this issue 3 years ago
Can someone PLEASE PLEASE put this in? It's been on my list for months.
How would this best be solved, Christopher?
echam:
  scenario: PI-CTRL-SPINUP
Or rather:
echam:
  namelist_variant: spinup  # Or reduced, minimal, or something
I think using a new "scenario" may be confusing to some people.
I don't know. One of the esm_tools leads needs to decide that.
Another point I cannot decide is how to distinguish different stream definitions in echam.yaml and jsbach.yaml in a clean, esm_tools way.
There is one problem with defining scenario-dependent output streams for echam/jsbach. The echam.yaml entries
streams:
- accw
- co2
- echam
- g3bid
- g3bim
- g3bday
- g3b1hi
- glday
- aclcim
- sp6h
- spim
streamsnc:
- aclcim
- g3b1hi
- g3bday
- g3bid
- g3bim
- glday
- glim
- jsbid
- sp6h
- spim
define which output files will be moved from work to outdata/<model>/. As a result, some files are ignored if I change the stream definitions in the namelist or, the other way around, the esm_tools complain about missing files if the namelist does not define some of these streams.
In my view, the solution would be to consider all work/<expname>_<current_date>.01_* files and drop the streams and streamsnc lists completely (<current_date> may be YYYYMM or something similar; I don't know all cases). Would this be in accordance with how the esm_tools work?
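A minimal Python sketch of that idea, assuming the naming pattern work/<expname>_<current_date>.01_<stream> described above. The helper `collect_stream_files` is hypothetical and not part of the esm_tools; it simply groups every matching file by the tag that follows `.01_`, so no separate stream list is needed:

```python
import glob
import os


def collect_stream_files(work_dir, expname, current_date):
    """Group output files in work_dir by stream tag (hypothetical helper).

    Matches every file named <expname>_<current_date>.01_<stream>[.<ext>]
    and returns {stream: [paths, ...]}.
    """
    prefix = f"{expname}_{current_date}.01_"
    # Escape the literal parts so characters like "[" are not glob syntax.
    pattern = os.path.join(glob.escape(work_dir), glob.escape(prefix) + "*")
    streams = {}
    for path in sorted(glob.glob(pattern)):
        tag = os.path.basename(path)[len(prefix):]
        # Drop an extension such as ".nc" or ".codes" to get the stream name.
        stream = tag.split(".", 1)[0]
        streams.setdefault(stream, []).append(path)
    return streams
```

In this sketch, files with and without an extension (e.g. `_echam` and `_echam.codes`) end up under the same stream; that is one possible design choice, and other models' naming schemes would need their own pattern.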
@mandresm @dbarbi could you please comment on that? Thanks a lot!
Distinguishing that cleanly isn't really so easy. The echam namelist also influences jsbach output, yet we have two different YAML files and also two different namelists, even just for controlling model behaviour. Confusion lurks around every single corner there.
Ultimately, we need a single source of truth for the streams: define them only once and have that propagate everywhere.
Chris, did you make a branch? Given that we have space issues, I'd like to put that in and update the handbook.
Thanks @chrisdane for this suggestion; I think it's a very good point. About the streams, I would need to have another look at the echam files. However, even if scenario is an extra variable, I think it is a very intuitive one and could help us in setting up the strings and selecting the namelist_dir. @pgierz, why do you say scenario might be a confusing variable for some people?
"Scenario" controls a lot more than just output, also for example which model features should be active. PI shouldn't have human land use change, a future run should. I would therefore recommend separating output control from the scenario.
Thanks Miguel. I agree that having another scenario is not confusing.
@pgierz (1): Having those streams in echam.yaml and jsbach.yaml is, in my view, the source of the confusion. That's why I suggest dropping them completely. I suggested an alternative, and my question was whether it is possible.
@pgierz (2): Yes, I made an esm_tools branch and mentioned it several times in my initial post: echam_scenario_PI-CTRL-SPINUP.
@pgierz (3): Separating output control and scenario is a good idea. But for echam/jsbach, the output control is defined in namelist.echam. This namelist, in turn, enters the esm_tools via scenario. You would have to change this workflow if you want to separate output control and scenario. It would certainly be possible, but I'm afraid it would mean much more work than just adding another scenario.
Chris is correct, that would need a bit of a stronger rewrite.
We could, for example, add one more layer of folders? Or, maybe better, merge in the output part of the namelist separately from whatever defines physical behaviour? Do we have a merge feature for namelists yet? They are just dictionaries on the Python side.
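To illustrate the merge idea, here is a minimal sketch that treats a namelist as a plain dictionary of groups (e.g. runctl) mapping to variable/value dictionaries. The function `merge_namelists` and the example values are hypothetical, not an existing esm_tools feature. Note that a group-keyed dictionary cannot hold repeated groups, such as several &mvstreamctl blocks, so this only works for groups that occur once:

```python
import copy


def merge_namelists(base, overlay):
    """Return a copy of `base` with the groups/variables of `overlay` merged in.

    Both arguments are namelists as plain dicts:
    {group_name: {variable_name: value}}. Values from `overlay` win;
    everything only present in `base` is kept. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for group, variables in overlay.items():
        merged.setdefault(group, {}).update(copy.deepcopy(variables))
    return merged


# Hypothetical split with illustrative values: physics vs. an output-only overlay.
physics = {
    "runctl": {"default_output": True},
    "radctl": {"co2vmr": 284.3e-6},
}
output_only = {
    "runctl": {"default_output": False, "putdata": [1, "months", "last", 0]},
}
merged = merge_namelists(physics, output_only)
```

The overlay only touches the variables it names, so the physics part can live in one file and the output part in another.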
I would again strongly advise against making the output dependent on the scenarios. That (gut feeling) seems too easy to mess up accidentally: we will fix one, but forget the other.
Chris, sorry I didn't see your branch in the initial post. Info is obviously there, I just need to learn how to read...
Paul, the output definitions for echam/jsbach are not as simple as for fesom. Separating the stream definitions from everything else in namelist.echam would, in my view, create more confusion. For example, &runctl:default_output, i.e. a parameter from another group (section) of the namelist, affects the output as well.
I don't see a problem with making the output dependent on the scenario. In fact, that was my initial motivation.
I think this discussion is very interesting and valuable, but I also think we need to include the main advanced users of ECHAM in it if we are going to make a major change that affects the streams. What do you think?
Sure, please do so =)
The original idea was to use the streams and streamsnc arrays to automatically GENERATE the output stream part in the namelist.echam, but we never got there.
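As a rough illustration of what such generation could look like, the sketch renders one &mvstreamctl group per stream from a richer, hypothetical specification (target, source, interval, variables), mirroring the group layout of the HIST namelist excerpt quoted later in this thread. Neither this input format nor `render_mvstreamctl` exists in the esm_tools; the current streams/streamsnc lists only hold names, so they would have to grow along these lines:

```python
def render_mvstreamctl(streams):
    """Render one &mvstreamctl group per entry of `streams` (hypothetical format).

    Each entry is a dict with "target", "source", "interval" as
    (count, unit), e.g. (1, "months"), and "variables", e.g. ["temp2:mean"].
    The trailing "'last', 0" is copied from the HIST example, not interpreted.
    """
    blocks = []
    for s in streams:
        count, unit = s["interval"]
        variables = ", ".join(f"'{v}'" for v in s["variables"])
        blocks.append(
            "&mvstreamctl\n"
            f"  interval = {count}, '{unit}', 'last', 0\n"
            f"  target = '{s['target']}'\n"
            f"  source = '{s['source']}'\n"
            f"  variables = {variables}\n"
            "/"
        )
    return "\n".join(blocks)
```

Even this small spec already carries variables and intervals per stream, which is exactly where it starts to duplicate the namelist itself.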
Ok. I don't think this is feasible. How would you specify, e.g., 1) that only certain variables from a specific stream should be saved, 2) at a specific temporal interval? If the YAML lists streams and streamsnc were extended to support those and more features, I feel you would end up with entries similar to those in the actual namelist.echam. Why reinvent the wheel?
Initially, because you could use the same style to set the output of fesom, openifs, nemo, etc., without having to do it differently for each model. But it is a fair amount of work, and no one really asked for it, so...
There is a related problem with the historical echam namelist, i.e. namelists/echam/6.3.04p1/HIST/namelist.echam. The entries
&runctl
PUTDATA = 3,'hours','last',0
default_output = .true.
and
&mvstreamctl
interval = 1, 'days', 'last', 0
target = 'g3bid'
source = 'g3b'
variables = ... , 'temp2:mean', ...
/
yield, for annual runs, the variable temp2 in the annual files
<expid>_<YYYY>01.01_echam.nc # ntime = 2928 --> 3hourly output
<expid>_<YYYY>01.01_g3bid.nc # ntime = 366 --> daily output (this test year is a leap year)
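The ntime values follow from simple arithmetic: a leap year has 366 × 24 = 8784 hours, so 8784 / 2928 = 3 h and 8784 / 366 = 24 h. A small helper doing the same inference is sketched here; `output_interval_hours` is hypothetical and assumes a proleptic Gregorian calendar (Python's `calendar.isleap`) and equally spaced output records:

```python
import calendar


def output_interval_hours(ntime, year):
    """Infer the output interval in hours of a one-year file with `ntime`
    equally spaced records, assuming a (proleptic) Gregorian calendar."""
    hours_in_year = (366 if calendar.isleap(year) else 365) * 24
    if hours_in_year % ntime:
        raise ValueError(f"{ntime} records do not evenly divide {hours_in_year} h")
    return hours_in_year // ntime
```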
The annual (cdo yearmean) temp2 anomaly between the _echam and the _g3bid files looks like this:
[Figure: annual-mean temp2 anomaly map]
or, in numbers:
cdo info anomaly_yearmean.nc
Minimum Mean Maximum
-0.064362 1.9698e-05 0.061951 # Kelvin
The monthly (cdo -seltimestep,1 -monmean) temp2 anomaly between the _echam and the _g3bid files is larger:
[Figure: temp2 anomaly map for the first monthly mean]
or, in numbers:
cdo info anomaly_monmean_mon1.nc
Minimum Mean Maximum
-0.12109 -0.0027946 0.11356 # Kelvin
The daily (cdo -seltimestep,1 -daymean) temp2 anomaly between the _echam and the _g3bid files is even larger:
[Figure: temp2 anomaly map for the first daily mean]
or, in numbers:
cdo info anomaly_daymean_day1.nc
Minimum Mean Maximum
-1.0143 -0.0027390 1.2628 # Kelvin
Since temp2 from the _g3bid file is explicitly set to be the mean ('temp2:mean'), temp2 from the _echam file seems to represent something else, maybe a snapshot; it's not clear. I think this is another argument to set default_output to false, since it's not clear what the output is.
I can briefly comment on this one: temp2 in the default_output = .true. output is misleading. In fact, the entire "default output" is... well, let's go with the description "weird". The file in that case is a mixture of snapshots and means. Christian is currently working on a table to definitively say which is which, but if I remember correctly, temp2 was a snapshot. I would therefore recommend caution when using that particular variable in that file for any "sensible" analysis....
If you are after monthly means, I have a few template namelists that we are ironing out here. Check out the "production" version, that might be what you need. You actually even made the spinup ones :-) https://gitlab.awi.de/paleodyn/Models/namelists
Chris, to understand your screenshots (maybe I just need to read): those are anomalies between two files of the same run? We saw similar patterns comparing tsurf and temp2, but in one case that was a snapshot and in the other a monthly mean. You could rather clearly see where the sun was in the snapshot case (of course, there are also other differences: tsurf shows whatever the actual surface is, e.g. SST, soil, plant canopy...).
> those are anomalies between two files of the same run?
Yes. Same variable, same run, different output files.
Alright, wrong place to complain, but: "ugh....echam....why"
Consider the following more to be "public note taking" (or whatever thinking out loud is for a forum):
For your screenshots, what you have is, if I understand it correctly, a yearly (or monthly, or daily) average of snapshots vs. a yearly average of daily means. That would (maybe) explain the vertical bands you see there (day/night difference??). It's quite a small difference though; I would have expected something clearer if you're always capturing European midnight/3am/6am/noon.
Long story short, depending on the analysis you want to do, I would instinctively prefer the data in the g3bid files.
Yet another point on my ever-growing list of why we need to fix the echam namelist for sensible output....
> It's quite a small difference though
For someone working with daily data the difference is on the order of 1 K, i.e. super large :p
Yes, I was more referring to the top two figures. Funny how it is so localized over North America. I would have guessed that if you have 3 hour output and average noon/3pm/6pm/9pm/midnight/3am/6am/9am that there isn't such a clear spatial pattern. Plus you can see waves over Eurasia. Odd....but, as I said, I'd just use the data in the other file, that will be at least clearer.
And still, one for the list: we need to tame echam output. Urgently.
The default stream definitions are printed in every atmout. The echam:temp2 variable is defined as
name : g3b
output file suffix: _echam
name units rank ke alloc. grid prt acc mis rst tbl cde bit lev_type
temp2 K 2 1 T GAUSSIAN T F F T 128 167 16 SURFACE
whereas the g3bid:temp2 variable is defined as
name : g3bid
output file suffix: _g3bid
name units rank ke alloc. grid prt acc mis rst tbl cde bit lev_type
temp2 K 2 1 T GAUSSIAN T T F T 128 167 16 SURFACE
That means:
echam:temp2:acc=laccu=F # see mo_linked_list.f90 for how the lines above are printed in atmout
g3bid:temp2:acc=laccu=T
The echam6 documentation states (p. 119):
"In order to write a field to an output file, lpost=.true. must be specified. Generally the actual values of the field are written. However, if laccu=.true. is specified, the values are divided by the number of seconds of the output interval before output and set to the value of the variable reset afterwards."
And indeed, the default of laccu is false (p. 118), and this also explains the differences between those two temp2 variables, as laccu=true yields:
"“Accumulation” flag: Does no accumulation but divides variable by the number of seconds of the output interval and resets it to reset after output."
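The difference between the two flags can be mimicked on a toy time series. This is a hypothetical illustration, not echam code: it assumes that with laccu=.true. the model code has summed value × time step over the output interval (so that dividing by the interval length in seconds yields a mean, as the documentation implies), and that a non-accumulated field is written as the instantaneous value at the last time step of the interval:

```python
def accumulated_means(values, dt, steps_per_output):
    """laccu=.true.-style output (assumed): value*dt is summed every time
    step, and at output time the sum is divided by the interval length in
    seconds, giving the time mean over each output interval."""
    if len(values) % steps_per_output:
        raise ValueError("series must cover whole output intervals")
    interval_seconds = dt * steps_per_output
    means = []
    for start in range(0, len(values), steps_per_output):
        accumulated = sum(v * dt for v in values[start:start + steps_per_output])
        means.append(accumulated / interval_seconds)
    return means


def snapshots(values, steps_per_output):
    """laccu=.false.-style output (assumed): the instantaneous value at the
    last time step of each output interval."""
    return [values[i - 1] for i in range(steps_per_output, len(values) + 1, steps_per_output)]
```

On the ramp `list(range(8))` with `dt=10` and 4 steps per output, the interval means are `[1.5, 5.5]` and the snapshots are `[3, 7]`, so averaging them gives 3.5 vs. 5.0: the same field, two different answers.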
Would that mean that, for any variable that is not explicitly defined via
&mvstreamctl
interval = 1, '<interval>', 'last', 0
target = '<filetag_where_varname_should_be_saved>'
source = '<echam_streamname_from_which_to_take_varname>'
variables = ... , '<varname>:mean', ...
/
, one must check the respective atmout entry to figure out whether it represents accumulated (laccu=true), i.e. averaged, values or non-accumulated (laccu=false), i.e. snapshot, values?
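Checking this by hand for every variable is tedious. A minimal sketch of automating it, assuming the whitespace-separated layout of the atmout excerpts above (a "name : <stream>" line, a header row containing acc, and rows without empty cells). Real atmout files may need a more robust parser, and `acc_flags` is a hypothetical helper:

```python
def acc_flags(atmout_lines):
    """Return {(stream, variable): accumulated?} from atmout stream tables.

    Assumed layout (as in the excerpts above): a "name : <stream>" line,
    a header row starting with "name" that contains an "acc" column, then
    one whitespace-separated row per variable with no empty cells.
    """
    flags = {}
    stream, header = None, None
    for line in atmout_lines:
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) == 3 and tokens[:2] == ["name", ":"]:
            stream, header = tokens[2], None  # a new stream block starts
        elif tokens[0] == "name" and "acc" in tokens:
            header = tokens  # column names of the variable table
        elif stream and header and len(tokens) == len(header):
            row = dict(zip(header, tokens))
            flags[(stream, row["name"])] = row["acc"] == "T"
    return flags
```

Applied to the two excerpts above, it would report `False` for ("g3b", "temp2") and `True` for ("g3bid", "temp2").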
One note: the output seems to be recorded for any changed stream. I'm not sure whether the default is also written to the log. If it is, that'd be great to know; please also forward that info to Christian.
At some point, we need to ask Hamburg to clarify. Perhaps that point has been reached. At least on my end, I'm out of expertise. Sorry.
@mandresm @denizural @dbarbi could you please implement some kind of switch so that switching namelist.echam and/or namelist.jsbach becomes possible? The current workflow via streams (and streamsnc) makes this impossible, as files from user-specific streams are ignored if they are not listed in echam.yaml or jsbach.yaml.
You could try to implement that yourself. I can take 20 minutes tomorrow and show you the relevant code that would need to be modified.
Open source works best if anyone who wants a feature tries to build it. That would also help spread knowledge of the code base beyond just the core team.
So, if you want to try, pull develop, start a new branch from there, open a draft PR, and we will talk you through it via comments and whatever :-)
Dropping echam.yaml:streams would be a large modification of the esm_tools. I don't have enough knowledge or time for this =)
There is a relevant PR that allows you to use the echam namelist to automatically define the streams. You must set it up in your run config. Please see: https://github.com/esm-tools/esm_runscripts/pull/165
Closing for now, re-open if more discussion is needed.
The PR does not work.
The following workaround quasi-works:
echam:
  namelist_dir: "/path/to/my/special/namelist.echam/"
  further_reading: "echam_myoutput.yaml"
jsbach:
  further_reading: "jsbach_myoutput.yaml"
cat echam_myoutput.yaml
streams:
- echamstream1
- echamstream2
streamsnc: ${streams}
and
cat jsbach_myoutput.yaml
streams:
- jsbachstream1
The following does not work yet:
- ... outdata/<model> dirs. The output of all other months is moved to unknown/.
- .code files are not considered at all and are moved to unknown/.
I would not say the PR does not work. I would say it does not work yet ;-)
Can you send me the path to your experiment? I will have a look.
/work/ba1103/a270073/out/awicm-1.0-recom/awi-esm-1-1-lr_kh800/historical2
Correction: the 2 problems affect jsbach files only.
Catching up on this: Does the reduced output look realistic in release 6?
Due to the totally different output strategies of the individual models, I think a general out-of-the-box implementation in the esm_tools is difficult.
For fesom, it works quite well via fesom's output scheduler and a YAML file listing all wanted variables at the wanted output frequencies.
For echam/jsbach it's more complicated. I found a workflow that suits me; it uses stream lists in the echam and jsbach YAMLs for a set of variables that I want to save at a desired frequency. I don't know how to properly implement this in the esm_tools so that it's generally useful. Work in progress, I would say. For other models, I don't know.
@JanStreffing: yes of course the output is realistic.
Hi all,
I've been looking at this issue for a bit, and also at the solution @pgierz offered, which I believe never got merged (esm-tools/esm_runscripts#165). Is that correct?
I see the solutions given to, and used by, @chrisdane as provisional, because in my eyes there is a "major" issue, as @chrisdane points out: information is duplicated. Not only that, but if you add a stream to a namelist and then forget to update the stream list, you won't notice unless you look at the log files in great detail or monitor the output and make sure it doesn't get dumped into the unknown folder.
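Until the streams are derived from a single source, a cheap safeguard would be to compare the two lists at run time. A minimal sketch with a hypothetical `check_stream_lists` helper; it assumes the caller already has the stream names the namelist produces, including default streams such as echam, which are not declared in any &mvstreamctl group:

```python
def check_stream_lists(namelist_streams, yaml_streams):
    """Compare streams produced by the namelist with the YAML stream list.

    Returns (missing_in_yaml, unused_in_yaml): streams whose files would
    end up in unknown/, and listed streams that the namelist never produces
    (the ones the tools would complain about as missing files).
    """
    produced = set(namelist_streams)
    listed = set(yaml_streams)
    return sorted(produced - listed), sorted(listed - produced)
```

Raising a warning whenever either list is non-empty would make the mismatch visible before the output lands in unknown/.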
I think the streams for echam and jsbach should be built based on the final namelist, as @pgierz tried in his PR.
I think it would also be great to add control of the streams through the runscript (still providing a good namelist.echam template, which the user modifies through the runscript). That would mean new syntax specific to ECHAM, but it could be designed to be as close as possible to the namelist syntax, or to the existing namelist_changes syntax, so that it is intuitive to use.
Please let me know what you think about these two ideas (getting a version of what Paul did into release 6, and modifying the stream namelist through the runscript). On the other hand, if you are okay with your current workflow and don't want any changes, you can go ahead and close the issue.
Hi
The PR https://github.com/esm-tools/esm_runscripts/pull/165 does not work.
I think it's close to impossible to build an algorithm that deals with all possible namelist.echam stream definitions. It's a black box.
I think the better solution is to put a list of streams (and streamsnc) in a separate file next to the namelist.echam file in use, overriding the default streams. E.g. streams1=stream1,stream2,... for namelist.echam1, in which model output from stream1 and stream2 is wanted, and streams2=stream3,stream4,... for namelist.echam2, in which model output from stream3 and stream4 is wanted. Sure, that's not beautiful, but at least it would enable some sort of modularity.
So I am ok with closing, but this issue remains a dead end for the modularity approach of the esm_tools :(
My idea was not to build a full syntax for the streams, but instead to let users include their namelist modifications through the runscript with the same structure as the namelists (sections and variables), the same way as for namelist_changes, but specific to the streams in that repeated sections would be allowed (which is not the case in namelist_changes). That's one point, and it might be overkill.
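Allowing repeated sections mainly means the namelist can no longer be a dictionary keyed by section name. A minimal sketch, assuming a hypothetical representation as an ordered list of (section, variables) pairs and a hypothetical helper `replace_stream_groups` that swaps all &mvstreamctl sections for the ones given in the runscript while leaving every other section untouched:

```python
def replace_stream_groups(namelist, new_streams, group="mvstreamctl"):
    """Replace all `group` sections of an ordered namelist (hypothetical).

    `namelist` is a list of (section_name, {variable: value}) pairs, so a
    section may occur several times. All `group` sections are dropped and
    `new_streams` (a list of variable dicts) is inserted where the first
    one stood, or appended at the end if there was none.
    """
    replacement = [(group, dict(variables)) for variables in new_streams]
    result, inserted = [], False
    for name, variables in namelist:
        if name.lower() == group:
            if not inserted:
                result.extend(replacement)
                inserted = True
            continue
        result.append((name, dict(variables)))
    if not inserted:
        result.extend(replacement)
    return result
```

For example, a base namelist with runctl, two mvstreamctl sections and radctl, combined with one new stream section (here a hypothetical target g3bmon), keeps runctl and radctl and ends up with exactly one mvstreamctl section in between.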
The other point is PR https://github.com/esm-tools/esm_runscripts/pull/165. That one is broken, but it is fixable, as the problem it tries to solve is relatively simple (at least as I understand it; maybe I am missing something important): read the namelist and extract the stream names. This might be something we want to pursue in the future. Anyway, let's see if someone reopens this issue or something similar in the future.
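A sketch of that extraction step, assuming (as in the HIST example above) that each &mvstreamctl group's target also becomes the output file suffix, that every group is closed by a "/" on its own line, and that "!" only starts comments. `mvstream_targets` is hypothetical and not the code from PR #165; default streams such as echam would still have to be added separately:

```python
import re

# An "&mvstreamctl" group up to the "/" line that closes it (case-insensitive).
_GROUP = re.compile(r"&mvstreamctl\b(.*?)^\s*/", re.IGNORECASE | re.DOTALL | re.MULTILINE)
# target = 'name' or target = "name" inside such a group.
_TARGET = re.compile(r"\btarget\s*=\s*['\"]([^'\"]+)['\"]", re.IGNORECASE)


def mvstream_targets(namelist_text):
    """Return the target stream names of all &mvstreamctl groups, in order."""
    # Strip "!" comments (assumes "!" never appears inside string values).
    text = re.sub(r"!.*", "", namelist_text)
    targets = []
    for body in _GROUP.findall(text):
        match = _TARGET.search(body)
        if match:
            targets.append(match.group(1))
    return targets
```

Commented-out groups are skipped, and group names are matched case-insensitively, since Fortran namelists do not distinguish case.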
I would not say close to impossible. After all, there are rules inside echam for how it produces output and what those files are called; rules which we could replicate. Yes, it is very echam-specific, but we already have model-specific things in the tools; see, for example, Oasis. Plus, echam's sketchy documentation should serve as a lesson to anyone who thinks about writing climate code. Our own project is also slowly working on improving that.
In the end, this boils down to a design question for me. Duplicate information is by definition error-prone: you are bound to forget one of your multiple places. I'd like to keep this issue open, as a place for discussion if nothing else.
This issue has been inactive for the last 365 days. It will now be marked as stale and closed after 30 days of further inactivity. Please add a comment to reset this automatic closing of this issue or close it if solved.
Hi
Is your feature request related to a problem? Please describe.
The default namelist.echam for a PI-CTRL experiment, which is (almost always?) also used for spinups, creates a lot of data at a high temporal frequency (< month) due to the namelist blocks:
For a spinup, this is unnecessary and bad practice, since nobody will ever need this data, but the disks fill up with it.
Describe the solution you'd like
As far as I understand the esm_tools, I would like to have an
esm_tools/namelists/echam/<version>/PI-CTRL-SPINUP/namelist.echam
which is the same as esm_tools/namelists/echam/<version>/PI-CTRL/namelist.echam, but with only a few important variables at monthly output frequency, or similar. I tried to achieve this. I ran a default echam-only PI-CTRL experiment on ollie:
The resulting echam output after 1 month is
and for jsbach
To test for reduced echam output, I ran the experiment
The resulting echam output after 1 month is
and for jsbach
The total size of outdata for 1 month is reduced from 367 MB to 4.6 MB.
This is a reduction of roughly 100 - (4.6 MB / 367 MB) * 100 ≈ 99%, although many of the most important variables are included. Of course, one could argue whether, e.g., the 3D variable q (specific humidity) needs to be included, or whether other variables should be included, and so on...
I would be very happy if you could implement this in some way as a standard for echam, since I am really sick of this huge, unnecessary spinup output everywhere.
Thanks a lot for considering this, Chris
PS: I couldn't get the jsbach stream veg to work (vegmon in the new namelist.echam in the echam_scenario_PI-CTRL-SPINUP branch).