JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/
Other
89 stars 14 forks source link

Converting calendar of an Array or Dataset? #357

Closed Balinus closed 6 months ago

Balinus commented 6 months ago

Hello!

I'd like to convert a given dataset from a "regular" calendar to a DateTimeNoLeap calendar (and dropping all 29th Feb data along the way).

Is there a way to do it? I tried to find a solution, but they all involved loading the data in-memory, checking datetime vector, dropping the indexes associated with Feb 29th and then rebuilding a Dataset. I'd like it to be lazy because most of the datasets are in the 20GB range.

Thanks for any pointer!

felixcremer commented 6 months ago

You can use the Not selector with a vector of the leap days either with an inner At or Near selector.

julia> winter = r1[Ti=Not(At([DateTime(1995, 4,16),DateTime(1995,10,16,12)]))]
julia> typeof(winter.data)
DiskArrays.SubDiskArray{Union{Missing, Float32}, 3}

You have to be careful to match the Date or DateTime type with the type of your time axis. This subsetting will give you a lazy SubDiskArray which handles the subsetting via the metadata without having to load the data to memory.

Balinus commented 6 months ago

Thanks! I was about to write a potentiel solution that seems to work no matter the type of Date/DateTime

ref_tasmax[Time=Dates.monthday.(lookup(ref_tasmax, :Ti)) .!= [(2,29)]]

Seems to be lazy and worked with dataset with regular Datetime and DateTimeNoLeap calendars. Then, I will need to replace the axis with a new type (e.g. from DateTime to DateTimeNoLeap). I guess using the rebuild function?

Does this code make sense?

EDIT Here's what I got, seems to work, not sure about lazyness, but it seems right when I look at @time ouputs.


using YAXArrays
using CFTime

function drop29thfeb(ds)

    ds_subset = ds[Time=Dates.monthday.(lookup(ds, :Ti)) .!= [(2,29)]]

    date_vec = lookup(ds_subset, :Ti)
    # New time vector
    datevec_noleap = CFTime.reinterpret.(DateTimeNoLeap, date_vec)

    axlist = (    
        Dim{:lon}(lookup(ds_subset,:lon)),
        Dim{:lat}(lookup(ds_subset,:lat)),  
        Dim{:Time}(datevec_noleap),
        )

    return YAXArray(axlist, ds_subset.data)

end
felixcremer commented 6 months ago

That looks good. You could replace your subsetting with a Where selector like this:

ds[Ti=Where(x-> Dates.monthday(x) != (2,29))]

but I am not sure whether there is much of a difference. I think there might also a way to only rebuild the time axis and leave the other axes as is. But this is some DimensionalData function that I would also need to lookup. You might want to not rebuild the lon and lat axis like that but use `dims(ds_subset, :lon) instead this way you get the whole axis with all of the correct reversings and so on.

Could you add this example to the docs. That would be really nice.

Balinus commented 6 months ago

That looks good. You could replace your subsetting with a Where selector like this:

ds[Ti=Where(x-> Dates.monthday(x) != (2,29))]

but I am not sure whether there is much of a difference. I think there might also a way to only rebuild the time axis and leave the other axes as is. But this is some DimensionalData function that I would also need to lookup. You might want to not rebuild the lon and lat axis like that but use `dims(ds_subset, :lon) instead this way you get the whole axis with all of the correct reversings and so on.

Could you add this example to the docs. That would be really nice.

Thanks! Yes, I will include it. Somewhere specific in the documentation?

Balinus commented 6 months ago

Also, I'm unsure how the "Time" dimension (Ti?) is handled. It seems to be a "special" dimensions and when I rebuild the time dimension now behaves as a regular dimension. Sometimes, the time dimension is accessible with Ti, sometimes not.

For example:

using Zarr, YAXArrays, Dates, DimensionalData

store = "gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/"
g = open_dataset(zopen(store, consolidated=true))

YAXArray Dataset
Shared Axes: 
()
Variables: 
height, 
tas
 with dimensions: 
  Dim{:lon} Sampled{Float64} 0.0:0.9375:359.0625 ForwardOrdered Regular Points,
  Dim{:lat} Sampled{Float64} Float64[-89.28422753251364, -88.35700351866494, …, 88.35700351866494, 89.28422753251364] ForwardOrdered Irregular Points,
  Ti Sampled{DateTime} DateTime[2015-01-01T03:00:00, …, 2101-01-01T00:00:00] ForwardOrdered Irregular Points
Properties: Dict{String, Any}("initialization_index" => 1, "realm" => "atmos", "variable_id" => "tas", "external_variables" => "areacella", "branch_time_in_child" => 60265.0, "data_specs_version" => "01.00.30", "history" => "2019-07-21T06:26:13Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards.", "forcing_index" => 1, "parent_variant_label" => "r1i1p1f1", "table_id" => "3hr"…)

The printed dataset suggest using Ti. But:

g.Ti
ArgumentError: Ti not found in Dataset

Stacktrace:
 [1] getindex(x::Dataset, i::Symbol)
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/no0he/src/DatasetAPI/Datasets.jl:144
 [2] getproperty(x::Dataset, k::Symbol)
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/no0he/src/DatasetAPI/Datasets.jl:141
 [3] top-level scope
   @ In[6]:1

while using time works:

g.time
Ti DateTime[2015-01-01T03:00:00, …, 2101-01-01T00:00:00]

However, using the lookup method, the inverse is found:

lookup(Cube(g), :Ti)
Sampled{DateTime} ForwardOrdered Irregular DimensionalData.Dimensions.LookupArrays.Points
wrapping: 251288-element Vector{DateTime}:
 2015-01-01T03:00:00
 2015-01-01T06:00:00
 2015-01-01T09:00:00
 2015-01-01T12:00:00
 ⋮ 
 2100-12-31T15:00:00
 2100-12-31T18:00:00
 2100-12-31T21:00:00
 2101-01-01T00:00:00

lookup(Cube(g), :time)
MethodError: no method matching lookup(::Nothing)

Closest candidates are:
  lookup(::Union{Val{<:DimensionalData.Dimensions.Dimension}, Type{<:DimensionalData.Dimensions.Dimension}})
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:176
  lookup(::DimensionalData.Dimensions.AnonDim)
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:383
  lookup(::DimensionalData.Dimensions.Dimension{<:AbstractArray})
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:175
  ...

Stacktrace:
 [1] lookup(ds::Tuple{Dim{:lon, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, I::Symbol)
   @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:233
 [2] lookup(A::YAXArray{Float32, 3, ZArray{Float32, 3, Zarr.BloscCompressor, GCStore}, Tuple{Dim{:lon, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}}, args::Symbol)
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/array/array.jl:62
 [3] top-level scope
   @ In[16]:1