support additive forcing perturbations

aekiss commented 5 years ago

Branching off from https://github.com/COSIMA/libaccessom2/issues/25 as a separate issue.

Implement this via a new offset_filename key, so that we can have both additive and multiplicative perturbations to any forcing field, which would be a useful extension to our current scaling-only perturbation capability discussed in the wiki.

An example forcing.json entry:

    {
      "filename": "/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/land/day/friver/gr/v20190429/friver/friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_{{year}}0101-{{year}}1231.nc",
      "fieldname": "friver",
      "offset_filename": "river_offset_{{year}}0101-{{year}}1231.nc",
      "cname": "runof_ai"
    },

rmholmes commented 3 years ago

@aekiss @aidanheerdegen I am resurrecting this issue as there are a few experiments that various people would like to run that would be made significantly easier with such an addition.

As an example, Qian is going to be running an experiment with a linearly increasing air temperature modelled on CMIP6 projections, added to the RYF9091 JRA55 forcing. I.e. she would like:

Tair(x,y,z,t) = TairRYF(x,y,z,t) + Tair_CMIP_pattern(x,y) * Tair_CMIP_timeseries(t)

where TairRYF is the RYF9091 field, Tair_CMIP_pattern(x,y) is a spatial pattern of anomalies and Tair_CMIP_timeseries(t) is a time series of the amplitude of that pattern (a linear function in Qian's case).

Such an anomaly is difficult to do in the current system as it requires making a separate copy of the JRA55 Tair files for every year of the simulation. If you're doing a 100-year experiment, then you either have to create 100 copies of the JRA55 data (and they are big files) or you have to manually change the files every year.

I also have some ideas for experiments using simple time-dependent, spatially-independent (Tair_CMIP_pattern = 1) forcing field anomalies that would benefit from a similar approach.

So I wanted to start a discussion on this. I see two possible (high-level) approaches to make this easier:

Implementing an additive scaling offset as described in this issue. However, it would need to be generalized to have separate inputs for the spatial pattern (Tair_CMIP_pattern) and the time series (Tair_CMIP_timeseries) or you're not really doing any better than copying the full forcing files. This may be difficult.
Implementing a post-processing script that is called at the end of every year (for the 1/10th-degree it would only need to be called every 4 3-month-submits when the RYF9091 forcing fields cycle over) that updates the forcing files before the next year is simulated.

Both of these approaches have issues. I am more attracted to no. 2 above because I think it is more general (e.g. it permits more complicated changes, such as keeping the relative humidity constant as the temperature anomalies are applied). I would be interested in everyone's thoughts.

AndyHoggANU commented 3 years ago

Hi @rmholmes - sorry we didn't get a chance to discuss this in today's MOM meeting. I think option 2 is a nice idea, as it's more flexible and more able to be deployed for a wider array of uses. I guess the key issue is whether payu can get enough information out of the run to supply the script making the forcing files. If the information required was only time (i.e. other functions of x,y were hardwired into the script) then I think we could probably do this.

The key is that, with a small addition to payu, we could do just about anything. @aidanheerdegen - I guess we are relying on you here to have a think and let us know if a payu addition is feasible for this. If yes, maybe a quick online discussion on Monday could be in order?

rmholmes commented 3 years ago

I think a zoom chat about this would be a good idea, perhaps after @aidanheerdegen has had a chance to think about what might be possible.

One more potential spannar in the works here that is worth considering: After chatting with Qian and Matt yesterday I think Qian's experiment would also benefit from maintaining the atmospheric relative humidity constant as the air temperature anomalies are applied. This is a bit more difficult to fit into the framework as the specific humidity perturbation is then an (x,y,t) perturbation that is not obviously separable into (x,y) and (t). I would want to do the same for the other experiment/s I have in mind.

However, perhaps a completely different way to get around this problem was if we could feed access-om2 relative humidity rather than specific humidity as an input field?

aidanheerdegen commented 3 years ago

@nichannah has assigned himself to this, so I wouldn't suggest spending any time on this until he can let us know the progress on this. He might already have a solution, or perhaps some well thought through plans for implementation.

Nic?

AndyHoggANU commented 3 years ago

I think Nic assigned himself this 18 months ago, then completed the work and the issue was closed -- Ryan then reopened it with his new ideas... But I am of course happy if Nic wants to take this one on?

aidanheerdegen commented 3 years ago

Oh right, my misunderstanding. Soz.

There are no tagged PRs or commits which would make it easier to track related changes to see what has already been done and how new functionality might be incorporated.

nichannah commented 3 years ago

Yes, I'm happy to do it at the libaccessom2 level if that is easier that via Payu. Presently we have the ability to do scaling by either a file with a time dimension or a single constant. In the simplest case we could replicate this but use addition instead of multiplication - using a syntax like @aekiss proposed above. Would that be suitable? On the other hand it might be worth thinking about something more general such as a function of the forcing variables from two files. e.g

    {
      "filename": "/g/data/ua8/JRA55-do/RYF/v1-3/RYF.rsds.1990_1991.nc",
      "pertubation_filename": "../test_data/scaling.RYF.rsds.1990_1991.nc",
      "pertubation_function": "rsds = rsds_A + rsds_B"
      "fieldname": "rsds",
      "cname": "swfld_ai"
    },

This could help: https://github.com/jacobwilliams/fortran_function_parser

rmholmes commented 3 years ago

Thanks @nichannah.

So just to confirm - when you say "scaling by either a file with a time dimension or a single constant" - I think you mean that the scaling file (currently multiplicative - but easy to implement additive) is a single file that contains both spatial and temporal information?

Is it possible to separate the spatial and temporal scaling information? This would really help with applying the perturbations that we commonly want that have a fixed spatial pattern (possibly even spatially-constant) but a long-term temporal variation (e.g. a linear trend in Qian's case, or a sinusoidal oscillation with an interannual to decadal period, as I would like to implement). Ideally we want this temporal variation to be smooth across each JRA55 input time step - meaning that to do it properly without a system such as we are discussing here requires copying the JRA55 forcing files at full temporal resolution for the whole time period of the simulation (50+ years in Qian's case).

If such were possible then it would cover the cases I am thinking of, with the below significant caveat.

The caveat - relative humidity: In both Qian and my experiments it would be best to keep the relative humidity constant while the air temperature is changed. To do this while specific humidity is the input field requires changing the specific humidity through a complicated function of unperturbed and perturbed air temperature, unperturbed specific humidity and air pressure (see this script that Claire put together). This caveat could be easily solved if we could change the coupling system to allow relative humidity as an input field rather than specific humidity. E.g. in atmosphere/forcing.json we would want to replace,

"filename": "INPUT/RYF.q_10.1990_1991.nc",
"fieldname": "huss_10m",
"cname": "qair_ai"

with

"filename": "INPUT/RYF.r_10.1990_1991.nc",
"fieldname": "rhuss_10m",
"cname": "rair_ai"

where rair is the relative humidity calculated just once from the other JRA55 forcing fields. Is this possible?

russfiedler commented 3 years ago

I've had a bit more of a think about using relative humidity as a forcing field. Since this is about physics I'm now of the opinion that it should be treated in CICE and can be easily done in the subroutinefrom_atm. We can flag if rhair_i (or whatever we call it in CICE) is supplied and then after all the fields are read

if(do_rh2q) then where(tair0 /= 0.0) qair0 = rh2q(tair0,rhair0,press0) endif

where rh2q is a simple elemental function. I think I'd put it in cpl_forcing_handler.F90. We don't need to do anything else to the code as the relative humidity has done its job and is never used again.

How's that sound?

rmholmes commented 3 years ago

That sounds pretty good to me @russfiedler. However, I can't say that I know much about CICE.

We would then just need to create new RYF input files that contained relative humidity (rhair0) rather than specific humidity, do the substitutions in atmosphere/forcing.json that I indicated above and activate this flag so that CICE knows to take rhair0 as an input rather than qair0 as an input.

I would have expected that a relative - specific humidity conversion would already occur somewhere in the code. Again, I know little about CICE but I did find that in source/ice_forcing.F90 there is a relative humidity variable already rh, in the subroutine oned_data. I'm not sure if that is relevant?

I can volunteer to do some testing in ACCESS-OM2-1 once these changes are made.

russfiedler commented 3 years ago

@rmholmes We're using the gfdl form for fluxes so I'd use probably use the lookup tables for the saturated vapor pressure in sat_vapor_pres_mod.F90. We should have q=0.622*(rh/100)*e_sat(T)/(p-0.378*(rh/100)*e_sat(T)) if rh is supplied as a percentage.

aekiss commented 3 years ago

@rmholmes - Nic was referring to this: https://github.com/COSIMA/libaccessom2/issues/31 - ie support for spatially uniform and temporally constant scalings specified by a scale_constant as an alternative to the spatiotemporally varying scalings specified by scaling_filename and described here.

@nichannah fortran_function_parser looks interesting but it appears to have been active for just a few weeks weeks 4 years ago and then abandoned, so if we used that we may need to also maintain it ourselves (but first find out why it was abandoned).

Here's a straw-man: If we are happy to support just scaling and offsets of any input field f of the form scaling(x,y,t)*f + offset(x,y,t) we could have entries like this inforcing.json:

    {
      "filename": "/g/data/ua8/JRA55-do/RYF/v1-3/RYF.u_10.1984_1985.nc",
      "scaling_filename": "/whatever/RYF.u_10.1984_1985_scale.nc",
      "offset_filename": "/whatever/RYF.u_10.1984_1985_offset.nc",
      "temporal_scaling_filename": "/whatever/RYF.u_10.1984_1985_temporal_scaling.nc",
      "temporal_offset_filename": "/whatever/RYF.u_10.1984_1985_temporal_offset.nc",
      "fieldname": "uas_10m",
      "cname": "uwnd_ai"
    },

where the 4 scaling and offset entries are all optional. If present, they would all contain a field matching fieldname. Any or none of them could contain a time axis. The temporal_* entries contain no spatial axes, but the others do.

scaling in the formula above would be given by scaling_filename * temporal_scaling_filename; if either is absent, it is replaced by 1; if neither has a time axis then the scaling will be unchanging in time. This way we can specify arbitrary spatiotemporal scaling, or separable spatial*temporal, or spatial-only, or temporal-only.

Similarly offset in the formula above would be given by offset_filename * temporal_offset_filename; if either is absent, it is replaced by 0; if neither has a time axis then the offset will be unchanging in time. This way we can specify arbitrary spatiotemporal offsets, or separable spatial*temporal, or spatial-only, or temporal-only.

If scaling and offset files are both present, they are both applied as in the formula.

Would that cover all the use cases we can think of? (other then the humidity issue, which it looks like we can deal with via a code change in CICE)

@nichannah does this seem straightforward to implement?

@russfiedler this proposal would make scale_constant redundant, as this could be achieved by temporal_scaling_filename with no time axis.

aekiss commented 3 years ago

Also, where time axes are present, the scalings/offsets would only be applied at times covered by those axes (which is how it works currently, allowing a brief perturbation to damp down an individual storm).

aekiss commented 3 years ago

Further possible extensions (probably over the top):

support complex values in any perturbation field, allowing concise representation of propagating perturbations (only the real part of the product scaling_filename * temporal_scaling_filename or offset_filename * temporal_offset_filename would be used in scaling(x,y,t)*f + offset(x,y,t))
allow any of the perturbations to be a list of files rather than just one, and combine them pairwise. This would allow (say) an EOF reconstruction. This would make 1. redundant, since quadrature components could be used in place of complex values.

aekiss commented 3 years ago

Following a discussion on Tuesday, here's an updated straw-man proposal replacing the above. It's long and detailed because I've tried to be explicit about the semantics and corner cases, which in most cases is simply spelling out what you'd expect. Apologies for the boring read!

It's unclear to me how much work would be needed to implement something like this, but if we decide to implement only a subset it would be good to bear this bigger picture in mind to allow for a possibility such as this in the future.

Objective

To support the scaling and offsetting of any input field f of the form

scaling(x,y,t)*f + offset(x,y,t)

and allow efficient representation when scaling and/or offset have:

arbitrary spatiotemporal variation, or
a sum of (arbitrary spatial patterns multiplied by arbitrary temporal variations), e.g. an EOF reconstruction, or
arbitrary spatial variation (temporally constant), or
arbitrary temporal variation (spatially constant), or
constant in both space and time

where the temporal variation may be either

referenced to the experiment calendar, allowing progressive changes spanning multiple RYF years (e.g. a ramp over several RYF years), or
referenced to the forcing calendar, and therefore repeating in each RYF year (e.g. to damp out a storm)

Notation

forcing.json would have a format like this:

{
  "description": "JRA55-do V1.3 RYF 1984-85 forcing",
  "inputs": [
    {
      "filename": "/g/data/ua8/JRA55-do/RYF/v1-3/RYF.u_10.1984_1985.nc",
      "fieldname": "uas_10m",
      "cname": "uwnd_ai",
      "scaling": [
          {"spatial": "/path/to/nc/file", "temporal": "/path/to/nc/file", "calendar": "experiment"},
          {"spatiotemporal": "/path/to/nc/file", "calendar": "forcing"}, 
          ...
          ]  
      "offset": [
          {"spatial": "/path/to/nc/file", "calendar": "forcing"},
          {"temporal": "/path/to/nc/file", "calendar": "experiment"}, 
          {"spatial": "/path/to/nc/file", "temporal": "/path/to/nc/file"},
          ... 
          ]
    },
    ...
  ]
}

Semantics

scaling and offset are optional arrays of any length. Their lengths may differ. Either, both, or neither may be present; missing scaling is treated as 1 in the formula above; missing offset is treated as 0.
Elements in scaling and offset arrays are dictionaries, each dictionary defining a term. These terms are summed to form scaling(x,y,t) and offset(x,y,t) (respectively) in the formula above. Recognized keywords are:
- spatiotemporal: a string, the path to an .nc file containing a field (whose name matches fieldname) with both spatial and time axes, which is the spatiotemporal scaling or offset data
- spatial: a string, the path to an .nc file containing a field (whose name matches fieldname) with spatial axes only, which is the spatial part of the scaling or offset data
- temporal: a string, the path to an .nc file containing a field (whose name matches fieldname) with time axis only, which is the temporal part of the scaling or offset data
- constant: a number, defining a spatially and temporally constant value
- calendar: a string, either "experiment" or "forcing", specifying whether time is referenced to the experiment or forcing calendar for this component. Default is "forcing"
- comment: an arbitrary string, not used by libaccessom2 but intended to be used to document what this element of the array represents.
At least one of spatiotemporal, spatial, temporal and constant must be present in each dictionary.
- If spatiotemporal is present, spatial, temporal and constant must be absent. This defines a spatiotemporally arbitrary scaling or offset (case 1 above).
- If constant is present, spatiotemporal, spatial and temporal must be absent. This defines a spatially and temporally constant scaling or offset (case 5 above).
- Either or both of spatial and temporal may be present if spatiotemporal and constant are absent. If temporal is absent, then spatial defines a spatial pattern that's present at all times (case 3 above). If spatial is absent, then temporal defines the time dependence of a spatially-uniform pattern (case 4 above). If both spatial and temporal are present, they are multiplied together to define a spatiotemporal field which is a component of the sum in case 2 above.
Spatial axes in the spatiotemporal and spatial .nc files must be identical in name, dimension, and order to those of fieldname in filename. The coordinate values are not used or checked.
The time axis in the spatiotemporal and temporal .nc files must have the same name as that for fieldname in filename, but it need not have the same values.
If a component contains a time axis (i.e. if spatiotemporal or temporal are present) the component is omitted from the sum at all forcing time points except
- when the forcing time with the year replaced by the experiment year coincides with a time in the .nc file's time axis, if "calendar": "experiment" - so in this case the time axis typically has the same resolution as the forcing but has dates that may cover any or all experiment year(s)
- when the forcing time coincides with a time in the time axis, if "calendar": "forcing" - so in this case the time axis typically has the same resolution and year range as the forcing (but may differ in duration)
Any of the .nc files may contain complex data, but only the real part of the sum of the components is used in scaling(x,y,t) and offset(x,y,t).
Any {{year}} and {{year+1}} components in file paths are replaced by year and year+1 according to calendar for that component.

Comments

I'm now proposing using arrays of dictionaries, since this avoids the possibility of having differing length arrays of spatial and temporal components.
The 5 cases are now explicit in the forcing.json notation via the presence/absence of spatiotemporal, spatial, temporal and constant, rather than implicit in the presence or otherwise of a time axis in the .nc files. This seems much less error-prone and also provides better documentation of the configuration setup. However there is some redundancy in these (e.g. spatial-only covers the constant case), so perhaps it would be better to streamline the options for ease of implementation?
This forcing.json notation is a bit more general than is needed to cover the objectives listed above - e.g. each component has its own calendar keyword. This allows e.g. damping out a storm in each RYF year while also applying an interannual perturbation.
Although this forcing.json notation is not backward-compatible with the previous one, it is more general, and the use of dictionaries makes it extendable to cover other possibilities in future by implementing new keywords.
This will only support perturbations that are a linear function of the original field f at each grid point and timestep. A more general (i.e. nonlinear) function of f would require something else, e.g. a function parser as Nic suggested.

rmholmes commented 3 years ago

Thanks @aekiss for the comprehensive plan. It looks good to me, and I like the proposed dictionary format for forcing.json.

One question on point 6 of semantics: What do you mean in the "calendar": "experiment" case by "the time axis typically has the same resolution as the model"? Does this mean for the calendar we have to define the time axis in the .nc file at every model time step? I would have thought that we only need to define these for each forcing time step?

aekiss commented 3 years ago

Good catch @rmholmes, that also occurred to me after posting. I agree it would be much better to have it on the forcing timestep in the "calendar": "experiment" case, otherwise it won't work if the model timestep is changed. The difficulty is that the model and forcing calendars can differ by an arbitrary number of years. I'll edit the post above to try to fix it.

aekiss commented 3 years ago

ok how does that look now @rmholmes ?

rmholmes commented 3 years ago

Thanks @aekiss - looks good.

Is there any chance that leap years/days will cause problems with "calendar": "experiment"? I.e., the time vector we use in the "temporal" .nc file may have to be defined for years/days that match the RYF year but actually don't exist in a real calendar? I suspect this is not an issue as things work fine now with the current experiment time. However, it's probably something that we need to be aware of when creating the forcing perturbation files.

aekiss commented 3 years ago

Hmm, good question @rmholmes.

As I understand it, the forcing and experiment calendars ~can only differ in the year~ no - see below (in fact I've assumed this in my description of the "calendar": "experiment" case), so if the forcing dataset has no leap years (as is the case with the standard RYF years), neither will the experiment calendar. For example, despite being a leap year in a normal calendar, 1952 has no 29 Feb in /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/output204/ocean/ocean_daily.nc. So the RYF runs effectively use a noleap calendar.

How do leap years work for your repeat-decades Ryan?

@nichannah if you can clarify or correct me, please do!

mauricehuguenin commented 3 years ago

@aekiss Our repeat decade forcing (RDF) is 1962-1971. Both 1964 and 1968 are leap years. We only output monthly and annual data, so I do not know if the experiment calendar in our RDF is running with a leap year calendar as it should. Is there a way to check that?

rmholmes commented 3 years ago

Thanks @mauricehuguenin. I suspect the RDF experiment time is like the RYF one - the leap years appear when they appear in the forcing (1964 and 1968), regardless of the experiment year. When you run the IAF off the end I think you do it like an ensemble OMIP experiment with a fresh restart.

mauricehuguenin commented 3 years ago

Yes, that is what I do Ryan. I reset the experiment date back to 1972 when I branch off my IAF run over 1972-2017 from the control.

I just checked in ocean_solo.res and can confirm that our RDF experiment runs with a Gregorian calendar (that has leap years).

aekiss commented 3 years ago

So does it look like this? If so, it's not a Gregorian calendar...

forcing 62 63 64 65 66 67 68 69 70 71 62 63 64 65 66 67 68 69 70 71 model 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81

bold are correct leap years bold italic are model leap years that should be non-leap years in the Gregorian calendar italic are model non-leap years that should be leap years in the Gregorian calendar

mauricehuguenin commented 3 years ago

How would I be able to check this? The ocean_solo.res file shows 3 (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)

aekiss commented 3 years ago

grep cur_.*_date output*/atmosphere/log/matmxx.pe00000.log | grep -- -02-29T00 will show you all the forcing and model leap years

mauricehuguenin commented 3 years ago

Thanks, unfortunately the grep command did not give me an output but the log files show this:

{ "cur_exp-datetime" :  "1972-02-28T21:00:00" }
{ "cur_forcing-datetime" : "1962-02-28T21:00:00" }

followed by

{ "cur_exp-datetime" :  "1972-02-29T00:00:00" }
{ "cur_forcing-datetime" : "1962-02-28T00:00:00" }

So the experiment calendar has correct leap years. During these days, the forcing for the 28th of February is applied twice. Our spin-up is 2000 years so this should not be an issue.

aekiss commented 3 years ago

Interesting. libaccessom2 aborts if there is a mismatch like that at the end of the run, but maybe it gets back into sync before then. How long are your runs?

mauricehuguenin commented 3 years ago

I always ran for ten years during the 2000-year long spin-up with the 1° model and for two years in the 1/4° model. That worked fine. I am not sure if at some point we received warnings that we ignored. The model however never crashed with a date out of sync error.

aekiss commented 3 years ago

OK so seems that in your case we actually have

forcing 62 63 64 65 66 67 68 69 70 71 62 63 64 65 66 67 68 69 70 71 model 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81

so the experiment and forcing dates can differ by more than just the year, so @rmholmes' concern was valid and we may need some other approach.

@mauricehuguenin, what does it look like on the following day (expt date 1972-03-01)?

and what about the 3 days surrounding expt date 1974-03-01? What experiment date is the forcing on 1972-02-29 applied to (if any)?

mauricehuguenin commented 3 years ago

For the experiment year 1972 the following forcing is applied:

{ "cur_exp-datetime" :  "1972-02-28T21:00:00" }
{ "cur_forcing-datetime" : "1962-02-28T21:00:00" }
...
{ "cur_exp-datetime" :  "1972-02-29T00:00:00" }
{ "cur_forcing-datetime" : "1962-02-28T00:00:00" }
...
{ "cur_exp-datetime" :  "1972-03-01T00:00:00" }
{ "cur_forcing-datetime" : "1962-03-01T00:00:00" }

For the days surrounding the experiment date 1974-03-01, the log looks like this:

{ "cur_exp-datetime" :  "1974-02-28T21:00:00" }
{ "cur_forcing-datetime" : "1964-02-28T21:00:00" }
... 
{ "cur_exp-datetime" :  "1974-03-01T00:00:00" }
{ "cur_forcing-datetime" : "1964-03-01T00:00:00" }

I understand it that the calendar is set by the experiment. For the first leap day in the experiment calendar, the first leap day forcing from the RDF is applied. If there is no leap day in the experiment calendar (but one in the forcing), that forcing day gets skipped.

aekiss commented 3 years ago

OK so if (for example) an increasing ramp perturbation

  /
 /
/

is applied, it will have a horizontal step

 _/
/

if there's a leap day in the experiment but not the forcing, and will have a vertical jump

 /

/

if there's a leap day in the forcing but not the experiment

russfiedler commented 3 years ago

I've made a first cut at getting Relative humidity as a forcing into here https://github.com/russfiedler/cice5/tree/rel_humid It compiles but I need it to be tested. @rmholmes ?

It's all done in CICE (auscom driver module). Just need to specify the input file and relh_i in the namcouple and fields_from_atm namelist in input_ice.nml rather than qair_i

nichannah commented 3 years ago

Thanks for this @aekiss. It looks good to me. The only thing that jumps out is:

"scaling and offset are optional arrays of any length."

My concern is that having long arrays could quickly become very confusing for anyone other than the original author. It won't be clear what the actual forcing is.

Is this necessary, or can we leave it up to the user to make combined forcings if needed?

aekiss commented 3 years ago

The trouble with making combined forcings is if the perturbation is a sum of spatial patterns multiplied by a scalar timeseries (e.g. an EOF reconstruction) you wouldn't be able to represent it efficiently. I would not anticipate the arrays becoming very long in practice. If a hard-coded maximum is needed for implementation, I think 10 would be plenty as I wouldn't expect anyone to need more than a few.

Did you have any comments about the date handling with leap years in repeat-decade forcing? I'm not sure I completely understand how it works.

nichannah commented 3 years ago

wrt. leap years it works like this:

The model (experiment) data and forcing date are only allowed to differ in the year. This is checked every time the date is progressed so an error or warning will be raised immediately if things go wrong. There is a namelist option to choose between a warning and an error. There are two exceptions:

the forcing date is a leap day (29/02) but the experiment is not (it must be 01/03). In this case the forcing is skipped.
the experiment date is a leap day but the forcing is not. In this case the forcing from 02/28 is replayed.

aekiss commented 3 years ago

ah ok, thanks for the clarification, that matches what we inferred above

aekiss commented 3 years ago

I've added a comment keyword to the specification above, as discussed at the last TWG meeting. We didn't decide whether it should be mandatory.

rmholmes commented 3 years ago

@nichannah I noticed you just closed this issue. Where are you up to with this? Asking because I've got a student who might want to start using this in the next month or two. Thanks!

nichannah commented 3 years ago

Hi @rmholmes, the implementation for this is done.

Tutorial on how to use it: https://github.com/COSIMA/access-om2/wiki/Tutorials#Scaling-the-forcing-fields

nichannah commented 3 years ago

Re-opening - should add further documentation to the libaccessom2 README.md. At the least to point to the tutorial referenced in previous comment.

rmholmes commented 3 years ago

Great, thanks! I'll jump in and try testing it next week.

aekiss commented 3 years ago

Great, thanks @nichannah - when the documentation is ready it would be good to announce this on Slack, as there are plenty of people who would be interested.

rmholmes commented 3 years ago

Thanks for the work on this @nichannah. I've been testing it out a bit and seems to be working well - I can do constant perturbations, spatial perturbations, and experiment-calendar based smooth-time perturbations that span multiple RYF years (very useful!).

One question: As discussed in my comment here, and Andrew's comment here, one thing that would be nice to do would be to separate out the temporal and spatial dependencies on the offset. Is this possible with the current setup? I don't think so (see below), but perhaps I am doing things wrong.

For example, I tried:

  {
      "filename": "INPUT/RYF.tas.1990_1991.nc",
      "fieldname": "tas",
      "cname": "tair_ai",
      "perturbations": [
        {
          "type": "offset",
          "dimension": "temporal",
          "value": "INPUT/RYF.tas.1990_1991_scaling.nc",
          "calendar": "experiment"
        },
        {
          "type": "offset",
          "dimension": "spatial",
          "value": "INPUT/RYF.tas.1990_1991_scalingPAC.nc",
          "calendar": "experiment"
        }
      ]
    },

where I have a time series A(t) in INPUT/RYF.tas.1990_1991_scaling.nc and a spatial structure B(x,y) in INPUT/RYF.tas.1990_1991_scalingPAC.nc. I expected my tas to be:

tas(x,y,t) = tasJRA55(x,y,t) + A(t)*B(x,y)

But I think I actually got:

tas(x,y,t) = tasJRA55(x,y,t) + A(t) + B(x,y)

nichannah commented 3 years ago

Hi @rmholmes, OK I think I missed that in @aekiss comment above. I see it now. My syntax is a bit more rigid than the one he suggested so I'll need to figure out how to fit in or modify mine. One thing that comes to mind is to 'perturb' (or nest) the perturbations as below, however this seems over the top to support the single case you mention.

{
      "filename": "INPUT/RYF.tas.1990_1991.nc",
      "fieldname": "tas",
      "cname": "tair_ai",
      "perturbations": [
        {
          "type": "offset",
          "dimension": "temporal",
          "value": "INPUT/RYF.tas.1990_1991_scaling.nc",
          "calendar": "experiment"
          "perturbations": [
            {
              "type": "scaling",
              "dimension": "spatial",
              "value": "INPUT/RYF.tas.1990_1991_scalingPAC.nc",
              "calendar": "experiment"
             }
          ]
        },
      ]
    },

rmholmes commented 3 years ago

Thanks @nichannah.

This probably requires a bit more of a detailed discussion and I am not sure exactly how the system works now. But one option would be, when there are multiple offsets, multiply them together rather than add them. Personally, I feel this would be more useful than being able to add multiple offsets (I can think of at least 3 current and former projects that would use this). But then again I am only one user.

aekiss commented 3 years ago

The problem is that this implementation supports only four cases, but we need to support a fifth case (number 2 in this list), which is a "separable" offset that is the product of a scalar timeseries and a static spatial pattern (i.e. a combination of Nic's cases 2 and 3).

As discussed with @nichannah today, I think this issue would be resolved if there was a fifth option, "variable": "separable" for which "value" would be an array of two filename strings, the first being the scalar timeseries and the second being the static spatial pattern it would multiply. We would need to support this for "type": "offset", and it would probably be best (i.e. least confusing) to also support this for "type": "scaling", even though it is a bit redundant in that case.

aekiss commented 3 years ago

Also it would be nice to have an optional "comment" entry for each perturbation (see above).

Its value would be a string that would be ignored by the code but would allow the user to document what they're doing (this is needed because there's no support for comments in json).

aekiss commented 3 years ago

@nichannah I think you mentioned yesterday that the terms with "type": "scaling" are multiplied, but the terms with "type": "offset" are added - have I got this right? i.e. are you doing this?

f = ( scaling1(x,y,t) * scaling2(x,y,t) * ... * scalingN(x,y,t) )*f + offset1(x,y,t) + offset2(x,y,t) + ... + offsetM(x,y,t)

This is different from what I suggested above, which was to add the terms in both cases, i.e.

f = ( scaling1(x,y,t) + scaling2(x,y,t) + ... + scalingN(x,y,t) )*f + offset1(x,y,t) + offset2(x,y,t) + ... + offsetM(x,y,t)

I think your version is actually much better, but I wanted to be sure that's what you're doing. It doesn't make much sense to be adding scalings.

aekiss commented 3 years ago

Ah, never mind, I can see from the code that it's

f = ( scaling1(x,y,t) * scaling2(x,y,t) * ... * scalingN(x,y,t) )*f + offset1(x,y,t) + offset2(x,y,t) + ... + offsetM(x,y,t)

so that's good. https://github.com/COSIMA/libaccessom2/blob/5b6c697/libforcing/src/forcing_field.F90#L119-L198

COSIMA / libaccessom2