JuliaGeo / GRIBDatasets.jl

A high level interface to GRIB encoded files.
MIT License
12 stars 5 forks source link

Multiple latitude, longitude grids not supported. #27

Open lupemba opened 7 months ago

lupemba commented 7 months ago

I have a tried to read a .grib file that has variables defined on two different latitude on longitude grid. (link to file https://we.tl/t-varf2JJHwN ) This file has sea surface temperatures on a 0.25 degree X 0.25 degree grid and Mean wave period on a 0.5 degree X 0.5 degree grid.

This causes an error in the getone function. I am able to read the file if I simply remove this error using type piracy but this results in some warnings.

Example code

using GRIBDatasets, CairoMakie
import GRIBDatasets: getone, from_grib_date_time # type piracy to test

file_path = "multi_grid_data.grib";

# see issue https://github.com/JuliaGeo/GRIBDatasets.jl/issues/25
from_grib_date_time(date::Int32, time::Int32; epoch) = from_grib_date_time(Int(date), Int(time); epoch= epoch)

# error in getone function
ds = GRIBDataset(file_path);

# change error to warning for testing
function getone(index::FileIndex, key::AbstractString) 
    val = GRIBDatasets.getheaders(index)[key]
    if length(val) !== 1
        @warn("Expected 1 value for $key, found $(val)")
    end
    return first(val)
end

# The hack works
ds = GRIBDataset(file_path)

# plot results to illustrate
fig= let
    fig = Figure(size=(1200,600))

    lat2 = 90:-0.25:-90
    lon2 = 0:0.25:359.75
    @assert (length(lon2), length(lat2)) == size(ds["sst"])[1:2]
    ax1 = Axis(fig[1, 1], title = "SST high resolution grid")
    heatmap!(ax1, 0:0.25:359.75, 90:-0.25:-90, ds["sst"][:,:,1])

    lon = ds["lon"][:]
    lat = ds["lat"][:]
    @assert (length(lon), length(lat)) == size(ds["mwp"])[1:2]
    ax2 = Axis(fig[1, 2], title = "MWP low resolution grid")
    heatmap!(ax2, lon, lat, ds["mwp"][:,:,1] )

    fig
end

image

Proposal

Can we allow there to be multiple lat and lon dimensions so that this case would result in.

Dimensions
   lon = 720
   lat = 361
   lon2 = 1440
   lat2 = 721
   valid_time = 3

Then lon2 and lat2 should be added as variables and the dimension of "sst" should be using the second grid.

 sst   (1440 × 721 × 3)
    Datatype:    Union{Missing, Float64} (Float64)
    Dimensions:  lon2 × lat2 × valid_time
lupemba commented 7 months ago

@tcarion I have made this issue following the discussion on https://github.com/JuliaGeo/GRIBDatasets.jl/pull/26# I have used windows for the example code so it is definitely possible to read .grib files on windows with this package.

tcarion commented 7 months ago

Hi @lupemba,

Thank you very much for this thorough report! That's surprisingly not a straightforward issue to solve. I tried to open your file with the python cfgrib, and it appears to skip the variable in such case:

>>> import xarray as xr
>>> ds = xr.load_dataset("multi_grid_data.grib", engine="cfgrib")
skipping variable: paramId==34 shortName='sst'
Traceback (most recent call last):
  File "/home/tcarion/miniconda3/lib/python3.9/site-packages/cfgrib/dataset.py", line 660, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/home/tcarion/miniconda3/lib/python3.9/site-packages/cfgrib/dataset.py", line 591, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='latitude' value=Variable(dimensions=('latitude',), data=array([ 90. ,  89.5,  89. ,  88.5,  88. ,  87.5,  87. ,  86.5,  86. 

But I think the behaviour you propose is more advisable. I will try to implement it.

In the meantime, a workaround is to filter on the variables:

using GRIBDatasets

file_path = "multi_grid_data.grib"

ds_sst = GRIBDataset(file_path; filter_by_values=Dict("cfVarName" => "sst"))
ds_mwp = GRIBDataset(file_path; filter_by_values=Dict("cfVarName" => "mwp"))

By doing so, you get 2 properly built datasets.

In case you have more than 2 variables, you can also regroup them by filtering on the resolution, for example:

ds_corse = GRIBDataset(file_path; filter_by_values=Dict("Nx" => "720"))
ds_high = GRIBDataset(file_path; filter_by_values=Dict("Nx" => "1440"))