Ouranosinc / xclim

Library of derived climate variables, ie climate indicators, based on xarray.
https://xclim.readthedocs.io/en/stable/
Apache License 2.0
323 stars 57 forks source link

Time-step difference in hist, ref dataset sdba.EmpiricalQuantileMapping.train #1427

Open floriandb opened 1 year ago

floriandb commented 1 year ago

Setup Information

Context

Hi,

I am starting to use xclim to correct Euro-Cordex forcings (hist [temperature, precip]) with a spanish reanalysis (ref) using quantile mapping.

I have a question regarding time step discrepancies. My hist dataset has 3-hr time-step while my ref has a daily time-step. I figured out that the following line runs without error but it's not clear how it is managing the different time-step.

QM = sdba.EmpiricalQuantileMapping.train( ref, hist, nquantiles=15, group="time", kind="+" )

Otherwise, I thought of resampling both dataset to monthly time-step before calculating the quantile correction. That would save time and some calendar conversion. Then, I forsee some fiddling to apply the correction on the 3-hr dataset. Or is there a tool helping with this approach in xclim?

Thanks in advance for any help or for pointing me at the right part of the doc that I might have missed. Best regards,

César Deschamps-Berger

Code of Conduct

aulemahal commented 1 year ago

Bonjour César,

The fact that xclim works with these mismatched time steps is a "feature" of xarray where arrays are automatically broadcasted together if they have common dimensions (time, lat, lon) but different coordinate values (the time steps). By default, the broadcasting will use the union of coordinate values. This will inflate your daily data to a 3h timestep by injecting NaNs. The good thing is that xarray and xclim will ignore NaNs when computing the quantiles, so it's almost seemless. As long as you are using group='time', it will « work ».

However, I don't think it is a good idea to do that. You will have much more variability in the 3-hourly data and different extremes : the daily temperature cycle is not insignificant compared to the annual one. IMO, this applies awkwardly to the statistical principles behind EQM.

I would suggest a simple resampling of hist and sim to a daily timestep. I don' have fancy tools to suggest here, xarray can do it easily:

hist_tas_D = hist_tas_3H.resample(time='D', keep_attrs=True).mean()

Of course the resampling operation (here mean()) could change depending on the variable: max for tasmax, min for tasmin and sum for pr if you are using mm units (however, I would strongly recommend treating pr as a flux in kg m-2 s-1).

A monthly resampling would use the same code with time='MS'. It depends on what you need to do, but I'm not sure going to a monthly step is worth it. You'll lose variability and it seems worse to me than to drop one day every 4 years (consequence of converting the calendar). Hopefully, xclim performs well enough even with a daily timestep.

Hope that helps!

floriandb commented 1 year ago

Thanks a lot. :)

Here is what I came up with based on your answer. I first calculate and apply the QM correction at daily time step and then propagate it to 3H time step.

hist_pr_D = hist_pr_3H.resample(time='D').mean(keep_attrs=True) # 3H to daily

# QM correction training on homogeneous daily data
QM = sdba.EmpiricalQuantileMapping.train(
    ref_pr_D, hist_pr_D, nquantiles=20, group="time.month", kind="*"
    ) 

# Applies QM to daily
hist_pr_D_adj = QM.adjust(hist_pr_D, interp="linear")

# Calculates the daily correction factor
adj_factor_D = hist_pr_D_adj/hist_pr_D

# Converts to 3H correction factor by keeping the same factor for each day
adj_factor_3H  = adj_factor_D.resample(time='3H',closed="right").ffill().interp(time=hist_pr_3H.time).ffill(dim="time")

# Applies the 3H correction factor
hist_pr_3H_adj = hist_pr_3H*adj_factor_3H

I would do the same with temperature using "+" correction.

aulemahal commented 1 year ago

Oh! You want to correct the 3-hourly series!

Well, that method might work indeed. In theory, QM adjustment factors are specific to a group and a quantile. Using the same factor for the whole day won't correct the diurnal cycle, but you don't have this information anyway. I have never seen the method you suggest here, so I leave to you the scientific argumentation ;).

I believe the D->3H line could be written shorter as:

adj_factor_3H  = adj_factor_D.reindex(time=hist_pr_3H.time).ffill(dim="time")

Xclim's sdba has not yet been used by us for processing sub-daily data. It certainly lacks features that could help, and we are open to contributions !

floriandb commented 1 year ago

Hum, I did not expect to push the boundaries of the theory of quantile-mapping. Let see if does correct my simulations. Eventually I compare simulated snow cover with satellite products.

I have to keep the reindex-ffill line as is. Otherwise the first day turns into nan as hist_pr_3H first time-step is at 3 am and not midnight...

aulemahal commented 1 year ago

Haha. Maybe I'm overthinking it, but in my comprehension, Quantile Mapping's idea is : "The 68th percentile of temperature is 2°C warmer in the simulation than in the reference, we will decrease all temperatures values in this percentile by 2°C."

What I understand of your idea, is that you add the following : "Because this day had a mean temperature in the 68th percentile, all temperatures of this day will be decreased by 2°C as well."

I can imagine a extreme example where this could be an issue:

Let's say that in the hist, Day 1 has a mean temperature of 20°C and Day 2, 22°C. Let's say that when comparing with ref, we see that 20°C is the 45th percentile and should be corrected with -0.5°C. And that 22°C is the 50th percentile and is corrected with + 0.5°C. The scenarion thus has 19.5°C on day 1 and 22.5°C on day 2. While the difference is greater, it's still reasonable. A inter-day temperature change of 3° (instead of 2°) is credible.

What happens if we propagate the corrected values to the hourly timesteps ? Let's say we had this timeseries for the night:

day :  1,  1,  2,  2
hour: 22, 23,  0,  1
temp: 14, 13, 12, 13

Thus in the corrected:

day :    1,    1,    2,    2
hour:   22,   23,    0,    1
temp: 14.5, 13.5, 11.5, 12.5

The temperature difference at midnight when from 1° to 2°. What I see here, is that you are breaking the day-night cycle of your data. Within a day, the cycle is preserved, but at the junction of days you are introducing new "bias". Of course, the daily QM is also not preserving the exact day-to-day variations, as my example showed. But in the daily example, ALL steps are affected, not one every 8 steps. Also, in the daily case, it is hoped that the "breaks" are small, because two adjacent days should usually map to similar percentiles and that the transfer function should be smooth enough (my case is a bit extreme).

If you have any downstream application that depend on the derivative of your corrected variables, this could be an issue...

I am interesting in seeing what results you get with the technique, please keep us updated if possible!