euroargodev / BlueCloud

Working space for BlueCloud demontrator 3 - Lops Task
0 stars 1 forks source link

time_slice='most_freq_label' raise an error #27

Closed quai20 closed 4 years ago

quai20 commented 4 years ago

I tested the notebook PCMtest_3Dgridded_products.ipynb with CORA dataset (/home5/pharos/REFERENCE_DATA/OCEAN_REP/CORA/CORA5.1/field/2016/*_TEMP.nc). I get this error when trying to plot the most frequent label :

P.spatial_distribution(time_slice='most_freq_label')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-177c4c40329a> in <module>
----> 1 P.spatial_distribution(time_slice='most_freq_label')
      2 #P.save_BlueCloud('figures/spatial_distr_freq_EX.png')

/export/home1/PROJECTS/BlueCloud/Plotter.py in spatial_distribution(self, proj, extent, time_slice)
    423 
    424         if isinstance(time_slice, str):
--> 425             dsp = get_most_freq_labels(self.ds)
    426             var_name = 'PCM_MOST_FREQ_LABELS'
    427             title_str = r"$\bf{"'Spatial'"}$"+' ' + r"$\bf{"'ditribution'"}$"+' ' + \

/export/home1/PROJECTS/BlueCloud/Plotter.py in get_most_freq_labels(this_ds)
    415                 return xr.DataArray(mpblab, dims='N_OBS', coords={'N_OBS': this['N_OBS']}, name='PCM_MOST_FREQ_LABELS').to_dataset()
    416             this_ds['PCM_MOST_FREQ_LABELS'] = this_ds.map_blocks(
--> 417                 fct)['PCM_MOST_FREQ_LABELS'].load()
    418             return this_ds.unstack('N_OBS')
    419 

/export/home/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py in map_blocks(self, func, args, kwargs)
   5747         from .parallel import map_blocks
   5748 
-> 5749         return map_blocks(func, self, args, kwargs)
   5750 
   5751     def pad(

/export/home/anaconda3/lib/python3.7/site-packages/xarray/core/parallel.py in map_blocks(func, obj, args, kwargs)
    318                         which_chunk = chunk_index_dict[dim]
    319                         subsetter[dim] = slice(
--> 320                             chunk_index_bounds[dim][which_chunk],
    321                             chunk_index_bounds[dim][which_chunk + 1],
    322                         )

KeyError: 'pcm_class'

My dataset has the 'pcm_class' dimension :

<xarray.Dataset>
Dimensions:      (depth: 152, latitude: 68, longitude: 89, pcm_class: 8, quantile: 3, time: 12)
Coordinates:
  * pcm_class    (pcm_class) int64 0 1 2 3 4 5 6 7
  * depth        (depth) float32 -1.0 -3.0 -5.0 ... -1960.0 -1980.0 -2000.0
  * longitude    (longitude) float64 120.5 121.0 121.5 ... 163.5 164.0 164.5
  * latitude     (latitude) float64 23.33 23.79 24.24 24.7 ... 49.09 49.41 49.74
  * time         (time) datetime64[ns] 2016-01-15 2016-02-15 ... 2016-12-15
  * quantile     (quantile) float64 0.05 0.5 0.95
Data variables:
    TEMP         (time, depth, latitude, longitude) float32 dask.array<chunksize=(1, 152, 68, 89), meta=np.ndarray>
    TEMP_ERR     (time, depth, latitude, longitude) float32 dask.array<chunksize=(1, 152, 68, 89), meta=np.ndarray>
    TEMP_PCTVAR  (time, depth, latitude, longitude) float32 dask.array<chunksize=(1, 152, 68, 89), meta=np.ndarray>
    PCM_LABELS   (time, latitude, longitude) int64 1 1 2 2 2 2 2 ... 2 2 2 2 2 2
    PCM_POST     (pcm_class, time, latitude, longitude) float64 8.424e-29 ... 1.838e-08
    TEMP_Q       (pcm_class, quantile, depth) float64 4.896 5.203 ... 52.77
Attributes:
[...]
AndreaGarciaJuan commented 4 years ago

I tried CORA dataset in mediterranean sea and it is working for me: Capture d'écran de 2020-09-22 15-27-55

Which spatial extent are you using?

AndreaGarciaJuan commented 4 years ago

I am using 0.16.0 xarray version

quai20 commented 4 years ago

It works for me if I load only 1 file from CORA. The error seems to be raised when the dataset is dask-chunked (xarray_mfopen) @gmaze : Does the function get_most_freq_labels(this_ds) is compatible with chunked dataset ?

AndreaGarciaJuan commented 4 years ago

I am using xarray.open_mfdataset:

file_path = '/home5/pharos/REFERENCE_DATA/OCEAN_REP/CORA/CORA5.1/field/2016/*_TEMP.nc'
lon_extent = [-6,35]
lat_extent = [30,45]
ds = xr.open_mfdataset(file_path,combine='by_coords', concat_dim='time')
ds = ds.sel(latitude=slice(lat_extent[0],lat_extent[1]), longitude=slice(lon_extent[0], lon_extent[1])) 
quai20 commented 4 years ago

Ok, I'll update my xarray and co to see if it comes from the env

quai20 commented 4 years ago

Ok after an update it's working.