JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Indexing does not work for Disk-based stores #416

Open ConnectedSystems opened 3 months ago

ConnectedSystems commented 3 months ago

I'm running some code on two separate computers and have experienced an issue where after an update, most approaches to indexing stopped working on one computer, but not the other.

I've narrowed it down to YAXArray datasets that are disk-based. To reiterate, indexing fails on only one of the two computers.

The thing is, running `status YAXArrays` on both machines reports the same version: v0.5.8

# ]add NetCDF YAXArrays

using NetCDF
using YAXArrays

axlist = (
    Dim{:v1}(range(1, 3, length=3)),
    Dim{:v2}(["x1", "x2", "x3"])
)
test_arr = YAXArray(axlist, rand(3,3))

# All of these work
test_arr[v1=BitVector([true, false, true])]
test_arr[v1=[1, 3]]

test_arr[v2=BitVector([true, false, true])]
test_arr[v2=[1, 3]]

test_arr[v1=At([1, 2])]
test_arr[v2=At(["x1", "x2"])]

test_arr[v1=1:2]
test_arr[v2=2:3]

savecube(test_arr, "test_cube.nc", driver=:netcdf)

# Errors out as `test_arr` is not a dataset, but maybe it should save it as
# a dataset with a single entry?
# savedataset(test_arr, path="test_dataset.nc", driver=:netcdf)

# Open file as disk-based store
ds = open_dataset("test_cube.nc")
disk_arr = ds.layer

# None of these work
disk_arr[v1=BitVector([true, false, true])]
disk_arr[v1=[1, 3]]
disk_arr[v1=At([1, 2])]
disk_arr[v2=At(["x1", "x2"])]

# But ranges do work for some reason?
disk_arr[v1=1:2]
disk_arr[v2=2:3]

The error is:

ERROR: ArgumentError: Unable to determine chunksize of non-range views.
Stacktrace:
  [1] eachchunk_view(::DiskArrays.Chunked{…}, vv::SubArray{…})
    @ DiskArrays C:\Users\tiwanaga\.julia\packages\DiskArrays\MpOpv\src\subarray.jl:29
  [2] eachchunk
    @ C:\Users\tiwanaga\.julia\packages\DiskArrays\MpOpv\src\subarray.jl:25 [inlined]
  [3] YAXArray
    @ C:\Users\tiwanaga\.julia\packages\YAXArrays\zyFvF\src\Cubes\Cubes.jl:136 [inlined]
  [4] rebuild(A::YAXArray{…}, data::DiskArrays.SubDiskArray{…}, dims::Tuple{…}, refdims::Tuple{}, name::DimensionalData.NoName, metadata::Dict{…})
    @ YAXArrays.Cubes C:\Users\tiwanaga\.julia\packages\YAXArrays\zyFvF\src\Cubes\Cubes.jl:200
  [5] rebuild
    @ C:\Users\tiwanaga\.julia\packages\DimensionalData\BZbYQ\src\array\array.jl:85 [inlined]
  [6] rebuildsliced
    @ C:\Users\tiwanaga\.julia\packages\DimensionalData\BZbYQ\src\array\array.jl:100 [inlined]
  [7] rebuildsliced
    @ C:\Users\tiwanaga\.julia\packages\DimensionalData\BZbYQ\src\array\array.jl:99 [inlined]
  [8] view
    @ C:\Users\tiwanaga\.julia\packages\DimensionalData\BZbYQ\src\array\indexing.jl:125 [inlined]
  [9] _dim_view
    @ C:\Users\tiwanaga\.julia\packages\DimensionalData\BZbYQ\src\array\indexing.jl:110 [inlined]
 [10] #view#110
    @ C:\Users\tiwanaga\.julia\packages\DimensionalData\BZbYQ\src\array\indexing.jl:81 [inlined]
 [11] getindex(::YAXArray{Float64, 2, YAXArrayBase.NetCDFVariable{…}, Tuple{…}, Dict{…}}; kwargs::@Kwargs{v1::BitVector})
    @ YAXArrays.Cubes C:\Users\tiwanaga\.julia\packages\YAXArrays\zyFvF\src\Cubes\Cubes.jl:487
 [12] top-level scope
    @ c:\Users\tiwanaga\projects\ADRIA.jl\sandbox\yaxarray_issue\main.jl:30
Some type information was truncated. Use `show(err)` to see complete types.

ConnectedSystems commented 3 months ago

I suspect it is the DiskArrays.jl dependency.
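One way to check this on each machine is to print the resolved versions of the transitive dependencies, since `status YAXArrays` only reports the direct dependency. A minimal sketch using `Pkg.dependencies()`; the helper name `resolved_versions` is made up for illustration and is not part of Pkg:

```julia
using Pkg

# Hypothetical helper (not part of Pkg): look up the versions that the active
# environment's manifest resolved for the given package names.
function resolved_versions(names::Vector{String})
    found = Dict{String,Union{VersionNumber,Nothing}}(n => nothing for n in names)
    # Pkg.dependencies() covers direct and transitive dependencies.
    for info in values(Pkg.dependencies())
        if info.name in names
            found[info.name] = info.version
        end
    end
    return found
end

versions = resolved_versions(["YAXArrays", "DiskArrays", "DimensionalData"])
for (name, v) in versions
    println(rpad(name, 18), something(v, "not in this environment"))
end
```

Running this in the same project on both machines and comparing the output should show whether a transitive dependency differs. `]status -m` in the Pkg REPL gives the full manifest listing.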

On both machines YAXArrays.jl is at v0.5.8 but:

ConnectedSystems commented 3 months ago

Confirming that the example above works if I revert to DiskArrays.jl v0.3.23.
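For anyone hitting the same error, a rough sketch of that workaround is below. It assumes the affected project is the active environment and that the registry is reachable; `pin_diskarrays` is a hypothetical helper name, and the call itself is left commented out because it modifies the environment:

```julia
using Pkg

# Hypothetical helper: install one exact DiskArrays.jl version in the active
# environment and pin it so later updates do not move it.
function pin_diskarrays(version::String = "0.3.23")
    Pkg.add(name = "DiskArrays", version = version)  # resolve to this version
    Pkg.pin("DiskArrays")                            # hold it there on `Pkg.update()`
    return nothing
end

# Run manually in the affected project (needs registry access):
# pin_diskarrays()
#
# Once a fixed DiskArrays.jl release is available, undo the pin with:
# Pkg.free("DiskArrays")
```

Note that this adds DiskArrays as a direct dependency of the project, which may need to be removed again after unpinning.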

ConnectedSystems commented 3 months ago

@meggart @felixcremer I just submitted a potential fix to DiskArrays.jl.

I'm happy to submit a PR adding the above code as a test case to YAXArrays.jl.
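A possible shape for that test case, as a sketch only: it reuses the calls from the example above, assumes the saved cube is exposed as `ds.layer` as it is there, and checks only the size of each selection. The testset name and the expected sizes are my own additions, not existing YAXArrays.jl tests.

```julia
using Test
using NetCDF
using YAXArrays

@testset "indexing disk-based cubes" begin
    axlist = (
        Dim{:v1}(range(1, 3, length=3)),
        Dim{:v2}(["x1", "x2", "x3"]),
    )
    test_arr = YAXArray(axlist, rand(3, 3))

    # Write to a temporary directory so the test leaves no files behind.
    mktempdir() do dir
        path = joinpath(dir, "test_cube.nc")
        savecube(test_arr, path, driver=:netcdf)
        disk_arr = open_dataset(path).layer

        # Non-range selections: these raised the ArgumentError reported above.
        @test size(disk_arr[v1=BitVector([true, false, true])]) == (2, 3)
        @test size(disk_arr[v1=[1, 3]]) == (2, 3)
        @test size(disk_arr[v1=At([1, 2])]) == (2, 3)
        @test size(disk_arr[v2=At(["x1", "x2"])]) == (3, 2)

        # Range selections: these already worked.
        @test size(disk_arr[v1=1:2]) == (2, 3)
        @test size(disk_arr[v2=2:3]) == (3, 2)
    end
end
```

Comparing values against `test_arr` as well would make the test stricter; sizes alone are enough to catch the error at view construction.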