ecmwf / earthkit

Apache License 2.0
23 stars 3 forks source link

C3S Fire Burned Area Pixel product not correctly readable #36

Open gritk opened 2 months ago

gritk commented 2 months ago

What happened?

The function should also have the option of specifying arguments such as decode_cf, decode_times:

ba_pixel_data = ek.data.from_source("file", f"{LOCAL_DATA_DIR}/ba_pixel_04_2022_europe.zip", decode_cf=False, decode_times=False)

I am not able to read the data (C3S Fire Burned Area Pixel Products) correctly with this function. The image shows the content (opened in PANOPLY). With EARTHKIT the data has been interpreted as datetime[ns], but in this case it is not correct because the variable also contains values of 0,-1,-2.

Do you have any idea how to solve the problem without the arguments?

Translated with DeepL.com (free version)

grafik

Data request: c = cdsapi.Client() if DOWNLOAD_FROM_CDS: c.retrieve( 'satellite-fire-burned-area', { 'format': 'zip', 'origin': 'c3s', 'sensor': 'olci', 'version': '1_1', 'year': '2022', 'month': '04', 'nominal_day': '01', 'variable': 'pixel_variables', 'region': 'europe', }, f"{LOCAL_DATA_DIR}/ba_pixel_04_2022_europe.zip")

What are the steps to reproduce the bug?

If you are using the pre-downloaded data then please set DOWNLOAD_FROM_CDS to False

and set the LOCAL_DATA_DIR to where you stored the data.

import cdsapi DOWNLOAD_FROM_CDS = False #True

LOCAL_DATA_DIR = "../data_ba/data/"

Downloading ba-pixel product over Europe

c = cdsapi.Client() if DOWNLOAD_FROM_CDS: c.retrieve( 'satellite-fire-burned-area', { 'format': 'zip', 'origin': 'c3s', 'sensor': 'olci', 'version': '1_1', 'year': '2022', 'month': '04', 'nominal_day': '01', 'variable': 'pixel_variables', 'region': 'europe', }, f"{LOCAL_DATA_DIR}/ba_pixel_04_2022_europe.zip")

This command was used to save the data files in our managed storage,

# #  they are not required for the notebook to run, and your computer will cache the 
# #  results so you don't have to download again
ba_pixel_data = ek.data.from_source("file", f"{LOCAL_DATA_DIR}/ba_pixel_04_2022_europe.zip", decode_cf=False, decode_times=False)

else: ba_pixel_data = ek.data.from_source("file", f"{LOCAL_DATA_DIR}/ba_pixel_04_2022_europe.zip", decode_cf=False, decode_times=False) ba_pixel_data

Version

earthkit ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/earthkit'] 0.4.2 numpy ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/numpy'] 1.26.4 xarray ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/xarray'] 2024.2.0 pandas ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/pandas'] 2.2.1 geopandas ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/geopandas'] 0.14.3 matplotlib ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/matplotlib'] 3.7.1 cartopy ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/cartopy'] 0.22.0 rasterio ['/home/jovyan/.conda-libs/earthkit/lib/python3.10/site-packages/rasterio'] 1.3.9 cdsapi 0.6.1

Platform (OS and architecture)

Windows 11 Pro

Relevant log output

<xarray.Dataset> Size: 6GB
Dimensions:      (time: 1, lat: 20880, lon: 28440, bounds: 2)
Coordinates:
  * lon          (lon) float64 228kB -26.0 -26.0 -25.99 ... 52.99 53.0 53.0
  * lat          (lat) float64 167kB 83.0 83.0 82.99 82.99 ... 25.01 25.0 25.0
  * time         (time) datetime64[ns] 8B 2022-04-01
Dimensions without coordinates: bounds
Data variables:
    JD           (time, lat, lon) datetime64[ns] 5GB dask.array<chunksize=(1, 1200, 1200), meta=np.ndarray>
    CL           (time, lat, lon) int8 594MB dask.array<chunksize=(1, 1200, 1200), meta=np.ndarray>
    LC           (time, lat, lon) uint8 594MB dask.array<chunksize=(1, 1200, 1200), meta=np.ndarray>
    lon_bounds   (lon, bounds) float64 455kB dask.array<chunksize=(16384, 2), meta=np.ndarray>
    lat_bounds   (lat, bounds) float64 334kB dask.array<chunksize=(16384, 2), meta=np.ndarray>
    time_bounds  (time, bounds) datetime64[ns] 16B dask.array<chunksize=(1, 2), meta=np.ndarray>
    crs          int32 4B ...
Attributes: (12/38)
    title:                      ECMWF C3S Pixel OLCI Burned Area product
    institution:                University of Alcala
    source:                     Sentinel-3 A+B OLCI FR, MODIS MCD14ML Collect...
    history:                    Created on 2022-11-12 06:35:08
    references:                 https://climate.copernicus.eu/
    tracking_id:                cfc08ed4-b87d-4e95-bb14-960378a53ccb
    ...                         ...
    geospatial_lon_units:       degrees_east
    geospatial_lat_units:       degrees_north
    geospatial_lon_resolution:  0.00277778
    geospatial_lat_resolution:  0.00277778
    product_version:            v1.1
    id:                         20220401-C3S-L3S_FIRE-BA-OLCI-AREA_3-fv1.1.nc

Accompanying data

No response

Organisation

No response

gritk commented 2 months ago

variable:

short JD(time=1, lat=20880, lon=28440); :units = "days since 2022-01-01"; :comment = "Possible values: 0 when the pixel is not burned; 1 to 366 day of the first detection when the pixel is burned; -1 when the pixel is not observed in the month; -2 when pixel is not burnable: water bodies, bare areas, urban areas and permanent snow and ice."; :long_name = "Date of the first detection"; :_ChunkSizes = 1U, 1200U, 1200U; // uint

This workoraound seems to be working:

str_first_day_of_obs_year= '2022-01-01T00:00:00.000' data_subset_ba_pixel_2D = (data_subset_ba_pixel['JD']-np.datetime64(str_first_day_of_obs_year)).dt.days

sandorkertesz commented 2 months ago

Thank you for reporting this issue. How to do you try to process this data with earthkit? I mean after:

ba_pixel_data = ek.data.from_source("file", f"{LOCAL_DATA_DIR}/ba_pixel_04_2022_europe.zip")

what do you do with ba_pixel_data? What is data_subset_ba_pixel? I am sorry but this is not explained in you description.

gritk commented 2 months ago

import matplotlib as mpl

cmap = mpl.colors.ListedColormap([ ( 49/256., 57/256., 149/256.), # <1 # <5 ( 57/256., 94/256., 196/256.), # 1-5 # 5-10 ( 96/256., 143/256., 204/256.), # 5-10 # 10-25 (171/256., 217/256., 233/256.), # 10-25 # 25-50 (255/256., 255/256., 191/256.), # 25-50 # 50-100 (253/256., 174/256., 97/256.), # 50-100 # 100-250 (215/256., 48/256., 39/256.), # 100-200 # 250-500 (165/256., 0/256., 38/256.), # 200-300 # 500-750 (135/256., 0/256., 0/256.), # >300 # >750 ])

bounds = [-2, -1, 0, 1, 25, 50, 100, 200, 300, 365] norm = mpl.colors.BoundaryNorm(bounds, cmap.N)

Spatial subset selection

min_lon = 14 min_lat = 46 max_lon = 16 max_lat = 44

data_ba_pixel=ba_pixel_data.to_xarray(decode_cf=False, decode_times=False) data_subset_ba_pixel = data_ba_pixel.sel(lat=slice(min_lat,max_lat), lon=slice(min_lon,max_lon))

str_first_day_of_obs_year= str(data_subset_ba_pixel.coords['time'].values[0]).split('-')[-3] + '-01-01T00:00:00.000' data_subset_ba_pixel_2D = (data_subset_ba_pixel['JD']-np.datetime64(str_first_day_of_obs_year)).dt.days

Set plot title

time_str= str(data_subset_ba_pixel_2D.coords['time'].values[0]).split('-')[-3] + '/' + str(data_subset_ba_pixel_2D.coords['time'].values[0]).split('-')[-2] title = 'Fire Burned Area ' + time_str

Create the figure

ax = plt.axes(projection=ccrs.PlateCarree()) gl=ax.gridlines(draw_labels=True, linewidth=1, color='gray', alpha=0.5, linestyle='--') gl.ylabels_right = False gl.xlabels_top=False ax.coastlines() fig=data_subset_ba_pixel_2D[0].plot(cmap=cmap, norm=norm) plt.title(title)

show figure

fig

save figure

file_name_figure= "../data_ba/imgs/ba_pixel_042022_SE-Europe.png"

sandorkertesz commented 2 months ago

Thank you for sending the details.

When calling to_xarray() on the earthkit NetCDF object you can pass kwargs to xarray using the xarray_open_mfdataset_kwargs option.

data_ba_pixel=ba_pixel_data.to_xarray(xarray_open_mfdataset_kwargs=dict(decode_cf=False, decode_times=False))

Then data_ba_pixel contains JD as int16:

image