JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/
Other
99 stars 16 forks source link

Reading NetCDF throws error #296

Closed alex-s-gardner closed 1 year ago

alex-s-gardner commented 1 year ago

Following the example in the docs for reading a netcdf throws error for YAXArrays but not NCDatasets or NetCDF:

using Downloads
import YAXArrays as YAXA
using NetCDF
path2file = "https://its-live-data.s3.amazonaws.com/Test/N52W175.nc"
download(path2file, "N52W175.nc")
YAXA.Cube("N52W175.nc")

throws this error at a place that in the code that is flagged a being a bit hackey:

ERROR: ArgumentError: Could not promote element types of cubes in dataset to a common concrete type, because of Variable uncert
Stacktrace:
 [1] Cube(ds::YAXArrays.Datasets.Dataset; joinname::String, target_type::Nothing)
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/lvvMa/src/DatasetAPI/Datasets.jl:356
 [2] Cube
   @ ~/.julia/packages/YAXArrays/lvvMa/src/DatasetAPI/Datasets.jl:341 [inlined]
 [3] #Cube#117
   @ ~/.julia/packages/YAXArrays/lvvMa/src/DatasetAPI/Datasets.jl:815 [inlined]
 [4] Cube(s::String)
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/lvvMa/src/DatasetAPI/Datasets.jl:815
 [5] top-level scope
   @ ~/Documents/GitHub/Altim/src/hugonnet.jl:73

Check reading a single variable using NetCDF

ncread("N52W175.nc", "ref_z")
723×1141 Matrix{Float32}:
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  …  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
   ⋮                        ⋮                        ⋮                 ⋱                   ⋮                        ⋮                        ⋮
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

Check using NCDatasets

using NCDatasets
NCDataset("N52W175.nc")

Dataset: N52W175.nc
Group: /

Dimensions
   x = 723
   y = 1141
   time = 241

Variables
  time   (241)
    Datatype:    Dates.DateTime (Float32)
    Dimensions:  time
    Attributes:
     units                = days since 2000-01-01
     calendar             = standard
     standard_name        = date

  x   (723)
    Datatype:    Float32 (Float32)
    Dimensions:  x
    Attributes:
     units                = m
     standard_name        = projection_x_coordinate
     axis                 = X

  y   (1141)
    Datatype:    Float32 (Float32)
    Dimensions:  y
    Attributes:
     units                = m
     standard_name        = projection_y_coordinate
     axis                 = Y

  crs  
    Attributes:
     long_name            = WGS 84 UTM zone 1N
     grid_mapping_name    = transverse_mercator
     units                = metre
     spheroid             = WGS84
     semi_major_axis      = 6378137.0
     inverse_flattening   = 298.257223563
     datum                = WGS_1984
     longitude_of_prime_meridian = 0
     spatial_ref          = PROJCS["WGS 84 / UTM zone 1N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-177],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32601"]]
     latitude_of_origin   = 0.0
     central_meridian     = -177.0
     proj_scale_factor    = 0.9996
     false_easting        = 500000.0
     false_northing       = 0.0

  dem_names   (241)
    Datatype:    String (String)
    Dimensions:  time
    Attributes:
     long_name            = Source DEM Filename

  z   (723 × 1141 × 241)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  x × y × time
    Attributes:
     _FillValue           = -9999.0
     units                = meters
     long_name            = Height above WGS84 ellipsoid
     grid_mapping         = crs
     coordinates          = x y

  uncert   (241)
    Datatype:    Float32 (Float32)
    Dimensions:  time
    Attributes:
     long_name            = RMSE of stable terrain differences.
     units                = meters

  ref_z   (723 × 1141)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  x × y
    Attributes:
     _FillValue           = -9999.0
     units                = meters
     long_name            = Height above WGS84 ellipsoid
     grid_mapping         = crs
     coordinates          = x y

  corr   (723 × 1141 × 241)
    Datatype:    Union{Missing, Int8} (Int8)
    Dimensions:  x × y × time
    Attributes:
     _FillValue           = -1
     units                = percent
     long_name            = MMASTER correlation
     grid_mapping         = crs
     coordinates          = x y

Global attributes
  Conventions          = CF-1.7
  description          = Stack of co-registered DEMs produced using MMASTER (+ other sources). 
MMASTER scripts and documentation: https://github.com/luc-girod/MMASTER-workflows 
pybob source and documentation: https://github.com/iamdonovan/pybob
  history              = Created Sun Jan 12 09:44:17 2020
  source               = Robert McNabb (robertmcnabb@gmail.com)
felixcremer commented 1 year ago

You should be able to open it with open_dataset as a dataset which is a collection of cubes. We should add this to the docs and give a hint in the error.

alex-s-gardner commented 1 year ago

Ooopps... I think got guidance on this same issue last year. Guidance in the documentation would be greatly appreciated. I can add this in a PR if you tell me where you would like it added.

felixcremer commented 1 year ago

I think it would be good to have examples for a cube and open_dataset in the "Open NetCDF" and "Open Zarr" pages of the docs and then we would also need an explanation page for the differences between datasets and Cubes but this is something I could write after my vacation..

alex-s-gardner commented 1 year ago

why not overload cube to handle both cases?

felixcremer commented 1 year ago

The Cube function is using the open_dataset function to load a netcdf file and then concatenates all cubes with the most dimensions into a single data cube with an additional Variable dimension.

I don't understand how we could overload cube to handle both cases.

alex-s-gardner commented 1 year ago

Playing around a bit with YAXArray I'm really excited by what you've done and I also I think there is room for improving the API. It seems that most users should simply interact with YAXArray using datasets. Cubes would only be for expert users... netcdfs and zarrs are nearly or always going to be datasets and this is how the user is going to expect to interact with the data. 80% of users will be read only. High level functionality similar to NCDatasets would make YAXarray more intuitive to use.

For example:

Maybe an exported function Dataset instead of open_dataset would be better and if Dataset was used to read a Cube it could return a Dataset. This would be a bit more intuitive

and something like: open_dataset(zopen(store, consolidated=true)) could be Dataset(pat2file; kwg)

I'm still wrapping my head around the API, but I find the best time to give feedback is the first time you're working with the package as this is were new users will come from.

alex-s-gardner commented 1 year ago

Some more notes as I work with the package: [1] Time is displayed in the Repl as:

  Dim{:x} Sampled{Float32} 633950.0f0:100.0f0:706150.0f0 ForwardOrdered Regular Points,
  Dim{:y} Sampled{Float32} 5.87665f6:-100.0f0:5.76265f6 ReverseOrdered Regular Points,
  Ti Sampled{DateTime} DateTime[2000-07-28T00:00:00, …, 2019-08-18T00:00:00] ForwardOrdered Irregular Points

maybe Dim{:time} would be more clear?

[2] a function save could replace savedataset & savecube. If save is passed a Cube it would call savecube, if it was passed Dataset it could call savedataset

save would also check for a file name extension (e.g. .zarr or .nc) to determine which driver to use... or the user could pass driver as a kwarg.

lazarusA commented 1 year ago

@alex-s-gardner could you provide a smaller file so that we can include this case directly into the docs already. Now is ~200MB which is not good for CI, having something smaller would help a lot. Thanks.

alex-s-gardner commented 1 year ago

@lazarusA what about one of the files hosted here: https://www.unidata.ucar.edu/software/netcdf/examples/files.html

maybe this one: sresa1b_ncar_ccsm3-example.nc

which is 2.8 MB

alex-s-gardner commented 1 year ago

@lazarusA look like you added an example thanks... closing this issue and starting new ones for some of the random comments I added to the thread.