JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Adding a Cube to existing Cube #246

Closed Balinus closed 1 year ago

Balinus commented 1 year ago

Hello!

I am trying the following, but I have not yet succeeded in adding a CubeB, which results from a computation, to an existing CubeA.

The resulting CubeB is shown below. It represents a transformation of one of the variables in CubeA.

YAXArray with the following dimensions
time                Axis with 721 Elements from 2019-05-01T00:00:00 to 2019-10-28T00:00:00
longitude           Axis with 101 Elements from -80.0 to -55.0
latitude            Axis with 101 Elements from 65.0 to 40.0
number              Axis with 50 Elements from 1 to 50
Total size: 2.74 GB

The original cube has all the variables and hence has an additional dimension named Variable:

YAXArray with the following dimensions
longitude           Axis with 101 Elements from -80.0 to -55.0
latitude            Axis with 101 Elements from 65.0 to 40.0
number              Axis with 50 Elements from 1 to 50
time                Axis with 721 Elements from 2019-05-01T00:00:00 to 2019-10-28T00:00:00
Variable            Axis with 9 elements: fg10 t2m tp u10 v10 cp mx2t24 mn2t24 mean2t24 
units: K
Total size: 24.66 GB

I tried concatenatecubes, but I wasn't able to use the function.

I also tried to create a kind of dummy Variable dimension for CubeB, but since the data is 4D, it does not work with a list of 5 axis names.

Balinus commented 1 year ago

I had some success by using one of the CubeA variables, setting the chunks and permuting the dims as described in some issues here.

CubeB = permutedims(CubeB, YAXArrays.Axes.findAxis.(caxes(CubeA[Variable="t2m"]), (CubeB,)))

dsfinal = concatenatecubes([setchunks(CubeA[Variable="t2m"],Dict("longitude"=>5, "latitude"=>10, "time"=>2, "number"=>1)), setchunks(CubeB, Dict("longitude"=>5, "latitude"=>10, "time"=>2, "number"=>1))], CategoricalAxis("Variables", ["Cube1", "Cube2"]))

But... this seems rather "heavy" if I want to rebuild a cube with the 10+ variables from CubeA.

Balinus commented 1 year ago

OK! I had some success here. Not pretty, but it works! By using a list comprehension, concatenatecubes is able to merge all the cubes into a final cube.

CubeB = permutedims(CubeB, YAXArrays.Axes.findAxis.(caxes(CubeA[Variable="t2m"]), (CubeB,)))

dsfinal = concatenatecubes(vcat([setchunks(CubeA[Variable=ivar],
                                           Dict("longitude"=>5, "latitude"=>10, "time"=>2, "number"=>1)) for ivar in CubeA.Variable],
                                setchunks(CubeB, Dict("longitude"=>5, "latitude"=>10, "time"=>2, "number"=>1))),
                           CategoricalAxis("Variables", vcat(CubeA.Variable, "tp_mm")))
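
The list-building pattern in the call above (a comprehension over the existing variables, plus one extra item, with a matching list of labels) can be sketched in plain Julia. This is only an illustration under stated assumptions: the names `varnames`, `slices`, and `extra` are hypothetical, small matrices stand in for the cubes, and nothing here calls YAXArrays.

```julia
# Placeholder data: small matrices stand in for the per-variable cubes.
varnames = ["fg10", "t2m", "tp"]
slices = [fill(Float64(i), 2, 2) for i in eachindex(varnames)]
extra = fill(99.0, 2, 2)   # stands in for the newly computed cube

# Wrapping `extra` in a one-element vector appends it as a single item,
# even when the items are themselves array-like.
all_slices = vcat(slices, [extra])

# Appending a single String to a Vector{String} works directly.
all_labels = vcat(varnames, "tp_mm")

# One label per slice, in the same order.
@assert length(all_slices) == length(all_labels) == 4
```

The point of the sketch is that the list of items and the list of labels must end up with the same length and order before they are handed to a concatenation function.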

Balinus commented 1 year ago

Also, I just found https://github.com/JuliaDataCubes/YAXArrays.jl/issues/65

lazarusA commented 1 year ago

Why not just use append? Maybe with an MWE we could find a good solution for your situation.

Balinus commented 1 year ago

Thanks! Can append be used without explicitly saving the data to disk? (I understand that there is some saving in the background for a ~25 GB dataset.)

I'll post an MWE tomorrow. Cheers!

Balinus commented 1 year ago

MWE for concatenating cubes

In this MWE, unlike with my real-world data, I wasn't actually able to concatenate the cubes because of a type problem. Here are my attempts!

using YAXArrays
using Dates

function diff(xout, xin)
    # Pad the first element with 0 so the output keeps the input length
    xout .= vcat(0.0, Base.diff(xin))
end

function diff(cube::YAXArray; kwargs...)
    # Apply the inner function along the time axis
    indims = InDims("time")
    outdims = OutDims("time")
    mapCube(diff, cube, indims=indims, outdims=outdims)
end

axlist = [
    RangeAxis("time", Date("2022-01-01"):Day(1):Date("2022-01-30")),
    RangeAxis("lon", range(1, 10, length=10)),
    RangeAxis("lat", range(1, 5, length=15)),
    CategoricalAxis("Variable", ["var1", "var2"])
    ]

data = rand(30, 10, 15, 2)
CubeA = YAXArray(axlist, data)

CubeA

YAXArray with the following dimensions
time                Axis with 30 Elements from 2022-01-01 to 2022-01-30
lon                 Axis with 10 Elements from 1.0 to 10.0
lat                 Axis with 15 Elements from 1.0 to 5.0
Variable            Axis with 2 elements: var1 var2 
Total size: 70.31 KB

# We need to calculate the difference for a specific variable
CubeB = diff(CubeA[Variable="var1"])

CubeB

YAXArray with the following dimensions
time                Axis with 30 Elements from 2022-01-01 to 2022-01-30
lon                 Axis with 10 Elements from 1.0 to 10.0
lat                 Axis with 15 Elements from 1.0 to 5.0
Total size: 35.16 KB
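
CubeB keeps all 30 time steps because the inner function pads the first differences with a leading zero. A minimal check of that operation on a plain vector, independent of YAXArrays (the values in `xin` are made up for illustration):

```julia
# Stand-in for one time series of the cube (values chosen for illustration).
xin = [1.0, 3.0, 6.0, 10.0]
xout = similar(xin)

# Same operation as the inner function of the MWE: first differences,
# padded with a leading 0.0 so the output length matches the input.
xout .= vcat(0.0, Base.diff(xin))

@assert length(xout) == length(xin)
@assert xout == [0.0, 2.0, 3.0, 4.0]
```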

Concatenating the cubes

# 1st try by hardcoding the variable to concatenate
dsfinal = concatenatecubes([CubeA[Variable="var1"], CubeB], CategoricalAxis("Variables", vcat("var1", "var3")))
All cubes must have the same element type, cube number 2 does not match

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] concatenatecubes(cl::Vector{YAXArray{T, 3, A, Vector{CubeAxis}} where {T, A<:AbstractArray{T, 3}}}, cataxis::CategoricalAxis{String, :Variables, Vector{String}})
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/au5n4/src/Cubes/TransformedCubes.jl:40
 [3] top-level scope
   @ In[28]:1
 [4] eval
   @ ./boot.jl:373 [inlined]
 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

Type problem

typeof(CubeA[Variable="var1"][1:10,10,10])
Vector{Float64} (alias for Array{Float64, 1})

typeof(CubeB[1:10,10,10])
Vector{Union{Missing, Float64}} (alias for Array{Union{Missing, Float64}, 1})
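
The two element types above differ even when no value is actually missing, which is what the error message reports. The sketch below reproduces the mismatch with plain arrays and shows two ways to bring both arrays to a common element type in Base Julia. It is only an illustration: `a` and `b` are hypothetical stand-ins for the data of the two cubes, and whether the same conversion can be applied directly to a YAXArray is not confirmed in this thread.

```julia
# Plain arrays standing in for the data of the two cubes (illustrative only).
a = rand(5)                                   # Vector{Float64}
b = Vector{Union{Missing, Float64}}(rand(5))  # no missing values, but a wider eltype

# The element types differ, which is the condition the error complains about.
@assert eltype(a) != eltype(b)

# Option 1: widen `a` so that both arrays share one element type.
a_wide = convert(Vector{Union{Missing, Float64}}, a)
@assert eltype(a_wide) == eltype(b)

# Option 2: narrow `b`; this throws if `b` actually contains `missing`.
b_narrow = convert(Vector{Float64}, b)
@assert eltype(b_narrow) == eltype(a)
```

Widening is the safer direction, since it never fails; narrowing only works when the data is known to contain no missing values.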

Hope this helps a little bit!

Balinus commented 1 year ago

I'm closing this since I am now able to concatenate the cubes, albeit with a long syntax (it is now hidden in a function 😀).