Open sylvainmarchi opened 2 months ago
It seems to be related to the group attribute. I do not face this error when I use
group = sdba.Grouper("time.dayofyear", window=31)
Hi!
The reference and historical datasets are made of a single chunk of size 7305 for the time coordinate.
Are you sure? That doesn't seem to be the case in the code snippet you shared, for your historical dataset you have chunksize != size of time
, 1827 != 7305
.
<xarray.DataArray 'multivariate' (multivar: 2, time: 7305 , lat: 25, lon: 41)> Size: 60MB dask.array<rechunk-merge, shape=(2, 7305, 25, 41), dtype=float32, chunksize=(2, 1827, 25, 41), chunktype=numpy.ndarray>
Try running
group = 'time'
ADJ = sdba.MBCn.train(
dref.chunk(dict(time=-1)),
dhist.chunk(dict(time=-1)),
base_kws={"nquantiles": 20, "group": group},
adj_kws={"interp": "nearest", "extrapolation": "constant"},
n_iter=20, # perform 20 iteration
n_escore=1000, # only send 1000 points to the escore metric
)
Cheers, Éric
Hello Éric, My bad. It now works. Unfortunately, I am stuck at the next step. Maybe I should create a new "issue" or discussion. When executing the next piece of code
scenh, scens = (
ADJ.adjust(
sim=ds,
ref=dref,
hist=dhist,
base=sdba.QuantileDeltaMapping,
base_kws_vars={
"pr": {
"kind": "*",
"jitter_under_thresh_value": "0.01 mm d-1", # Trick to remove zeros, as they can be problematic for bias-adusting (as far as I understood)
"adapt_freq_thresh": "0.1 mm d-1", # Value used to biascorrect the frequency of dry days, that is days with almost no precipitation.
},
"tas": {"kind": "+"},
},
adj_kws={"interp": "nearest", "extrapolation": "constant"},
)
for ds in (dhist, dsim)
)
I get: ValueError: The dimension(s) over which we group, reduce or interpolate cannot be chunked ({'time': (7305, 7305)}).
Below is how my inputs look like. dref
<xarray.DataArray 'multivariate' (multivar: 2, time: 7305, lat: 25, lon: 41)> Size: 120MB
dask.array<rechunk-merge, shape=(2, 7305, 25, 41), dtype=float64, chunksize=(2, 7305, 25, 41), chunktype=numpy.ndarray>
Coordinates:
* time (time) object 58kB 1986-01-01 00:00:00 ... 2005-12-31 00:00:00
* multivar (multivar) <U3 24B 'pr' 'tas'
Dimensions without coordinates: lat, lon
Attributes:
CDI: Climate Data Interface version 1.9.5 (http://mpimet.mpg.d...
Conventions: CF-1.6
history: Thu Jul 11 11:29:46 2024: cdo remapcon,grid_description_0...
creation_date: 25-06-2024
creators: Ghilain N., Van Schaeybroeck B., Vanderkelen I.
contact: inne.vanderkelen@meteo.be
version: 1.1
affiliation: Royal Meteorological Institute of Belgium
projection: +proj=lcc +lat_2=50.569898649999999 +lat_1=50.56989864999...
CDO: Climate Data Operators version 1.9.5 (http://mpimet.mpg.d...
units:
dhist
<xarray.DataArray 'multivariate' (multivar: 2, time: 7305, lat: 25, lon: 41)> Size: 60MB
dask.array<rechunk-merge, shape=(2, 7305, 25, 41), dtype=float32, chunksize=(2, 7305, 25, 41), chunktype=numpy.ndarray>
Coordinates:
* time (time) object 58kB 1986-01-01 12:00:00 ... 2005-12-31 12:00:00
height float64 8B ...
* multivar (multivar) <U3 24B 'pr' 'tas'
Dimensions without coordinates: lat, lon
Attributes: (12/32)
CDI: Climate Data Interface version 1.9.6 (htt...
history: Tue May 19 20:33:10 2020: cdo -z szip rem...
source: CLMcom-CCLM4-8-17
institution: Climate Limited-area Modelling Community ...
Conventions: CF-1.4
institute_id: CLMcom
... ...
table_id: Table day (Sept 2013) 0cf1782745489246c9f...
modeling_realm: atmos
realization: 1
cmor_version: 2.9.1
CDO: Climate Data Operators version 1.9.6 (htt...
units:
dsim
<xarray.DataArray 'multivariate' (multivar: 2, time: 7305, lat: 25, lon: 41)> Size: 60MB
dask.array<rechunk-merge, shape=(2, 7305, 25, 41), dtype=float32, chunksize=(2, 7305, 25, 41), chunktype=numpy.ndarray>
Coordinates:
* time (time) object 58kB 2036-01-01 12:00:00 ... 2055-12-31 12:00:00
height float64 8B ...
* multivar (multivar) <U3 24B 'pr' 'tas'
Dimensions without coordinates: lat, lon
Attributes: (12/32)
CDI: Climate Data Interface version 1.9.6 (htt...
history: Tue May 19 21:32:31 2020: cdo -z szip rem...
source: CLMcom-CCLM4-8-17
institution: Climate Limited-area Modelling Community ...
Conventions: CF-1.4
institute_id: CLMcom
... ...
table_id: Table day (Sept 2013) 0cf1782745489246c9f...
modeling_realm: atmos
realization: 1
cmor_version: 2.9.1
CDO: Climate Data Operators version 1.9.6 (htt...
units:
ADJ.ds
<xarray.Dataset> Size: 3MB
Dimensions: (multivar_prime: 2, win_dim0: 1, lat: 25, lon: 41,
iterations: 20, quantiles: 20, multivar: 2)
Coordinates:
height float64 8B 2.0
* multivar_prime (multivar_prime) <U3 24B 'pr' 'tas'
* quantiles (quantiles) float64 160B 0.025 0.075 0.125 ... 0.925 0.975
* win_dim0 (win_dim0) int64 8B -1
* multivar (multivar) <U3 24B 'pr' 'tas'
Dimensions without coordinates: lat, lon, iterations
Data variables:
af_q (win_dim0, lat, lon, iterations, multivar_prime, quantiles) float32 3MB dask.array<chunksize=(1, 25, 41, 20, 2, 20), meta=np.ndarray>
escores (win_dim0, lat, lon, iterations) float32 82kB dask.array<chunksize=(1, 25, 41, 20), meta=np.ndarray>
rot_matrices (iterations, multivar, multivar_prime) float32 320B 0.995...
Attributes:
_xclim_adjustment: {"py/object": "xclim.sdba.adjustment.MBCn", "py/state...
adj_params: MBCn(quantiles=array([0.025, 0.075, 0.125, 0.175, 0.2...
Is it OK to use proleptic gregorian calendar and 2 dimensions for space?
No worries!
Space dimensions are not an issue. I will look into it, but if you're using group = "time.dayofyear"
, you will need a regular calendar yes. I think this is assumed for other sdba
methods as well. For group="time"
this should not be a constraint.
Can you try:
scenh, scens = (
ADJ.adjust(
sim=ds.convert_calendar("noleap"),
ref=dref.convert_calendar("noleap"),
hist=dhist.convert_calendar("noleap"),
base=sdba.QuantileDeltaMapping,
base_kws_vars={
"pr": {
"kind": "*",
"jitter_under_thresh_value": "0.01 mm d-1", # Trick to remove zeros, as they can be problematic for bias-adusting (as far as I understood)
"adapt_freq_thresh": "0.1 mm d-1", # Value used to biascorrect the frequency of dry days, that is days with almost no precipitation.
},
"tas": {"kind": "+"},
},
adj_kws={"interp": "nearest", "extrapolation": "constant"},
)
for ds in (dhist, dsim)
)
and see if this removes the error messages? In the meantime I will investigate.
Can you share more information on the error? If you could share 3 .nc files for dref, dhist,dsim
with a single lon/lat point it would be helpful so I can troubleshoot.
I still have the same error message after converting calendars. Please find my netCDF files as well as my python script at (.py and .nc files are not supported): https://cloud.meteo.be/s/8c9KMpcwxGb4gE2
The python script in your folder is for the old way of working things with NpdfTransform
? Do you have the script you used for this above?
I have updated the file.
Thanks! The problem is that you have different time coordinates:
dref.time
>>> '1986-01-01T00:00:00.000000000', '1986-01-02T00:00:00.000000000',
'1986-01-03T00:00:00.000000000', ...,
dhist.time
>>> '1986-01-01T12:00:00.000000000', '1986-01-02T12:00:00.000000000',
'1986-01-03T12:00:00.000000000', ...,
for instance if you try sdba.QuantileDeltaMapping.train(ref=dref, hist=dhist)
you will get the same error. If you change the time in dhist
:
dhist["time"] = dref.time
the error disappears.
I think we could control this earlier in the process since our methods expect equal time coordinates and give a clearer error message. Thanks for this feedback.
Let me know if there are other problems after that.
For xclim devs: We could have a similar function to _harmonize_units
which simply checks if times match? This would look more like _check_inputs
, but it could not really fit in _check_inputs
since the input list is fairly general, inputs
could be [ref
, hist
] which would work for our needs of matching times, but it can also be [ref
, hist
, sim
] or [sim
].
Generic Issue
Xclim version: 0.52.0 Python version: 3.10.14 Operating System: Linux
Description
I apply the MBCn algorithm for bias-correcting Euro CORDEX data. I successfully reproduced the example presented in the notebook. Now I follow the same procedure using my data.
What I Did
I loaded my files using open_mfdataset and merged them (one file per variable) to create dref and dhist. The reference and historical datasets are made of a single chunk of size 7305 for the time coordinate. This is my reference file:
This is my historical file:
What I Received
I got the following error message when I executed:
File /.../conda/envs/xclim_metpy/lib/python3.10/site-packages/xarray/core/computation.py:767, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args) 765 for axis, dim in enumerate(core_dims, start=-len(core_dims)): 766 if len(data.chunks[axis]) != 1: --> 767 raise ValueError( 768 f"dimension {dim} on {n}th function argument to " 769 "apply_ufunc with dask='parallelized' consists of " 770 "multiple chunks, but is also a core dimension. To " 771 "fix, either rechunk into a single array chunk along " 772 f"this dimension, i.e.,
.chunk(dict({dim}=-1))
, or " 773 "passallow_rechunk=True
indask_gufunc_kwargs
" 774 "but beware that this may significantly increase memory usage." 775 ) 776 dask_gufunc_kwargs["allow_rechunk"] = True 778 output_sizes = dask_gufunc_kwargs.pop("output_sizes", {}) ValueError: dimension time on 1th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, either rechunk into a single array chunk along this dimension, i.e.,.chunk(dict(time=-1))
, or passallow_rechunk=True
indask_gufunc_kwargs
but beware that this may significantly increase memory usage.What is going wrong here?
Code of Conduct