JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

update dataset values #229

Closed lazarusA closed 1 year ago

lazarusA commented 1 year ago

@meggart Following up on our discussion: for some reason, the example below does not work at the writing step. The updated values never reach the file.

using YAXArrays, Zarr
using Dates

# ## Create `Dataset` with NaN values
axlist = [
    RangeAxis("time", Date("2022-01-01"):Day(1):Date("2022-01-07")),
    RangeAxis("lon", range(1, 10, length=4)),
    RangeAxis("lat", range(1, 5, length=2)),
    ]
data = rand(7, 4, 2)
data[6:7,:,:] .= NaN

yax_ar = YAXArray(axlist, data)
ds = Dataset(a=yax_ar, c=YAXArray(rand(2,2)))

# Now, save it.
f = tempname();
savedataset(ds, path=f, driver=:zarr)

# ## Update NaN entries with new data

# For now, we use DimensionalData to do the subsetting.
using DimensionalData, YAXArrayBase
open_ds = open_dataset(zopen(f, "w"))

# Select variable
d = open_ds["a"]

# Subsetting
dim_sub = yaxconvert(DimArray, d)
subset = dim_sub[time = Between( Date("2022-01-05"),  Date("2022-01-07"))]
csub = yaxconvert(YAXArray, subset)

# Get new data
new_data = fill(1.0, size(csub.data))
# update interval with the new data
csub.data .= new_data

# Open the file again (the output is still the same, i.e. the NaN values are still there)
open_ds_up = open_dataset(f)
d = open_ds_up["a"]
d.data[:,:,:]

meggart commented 1 year ago

Hi, I just saw this issue. Since you do the subsetting using DimensionalData, my guess would be that your subset is not a view into the data on disk but an in-memory array holding a copy of the data, so writing to it will of course not modify the original data. What is wrong with just doing subset = d[time = Date("2022-01-05")..Date("2022-01-07")] and skipping the conversion?
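The copy-versus-view distinction can be illustrated with plain Julia arrays. This is only a sketch using Base: the in-memory array `store` is a made-up stand-in for the on-disk variable, and it does not reproduce what DimensionalData or Zarr do internally.

```julia
# In-memory stand-in for the on-disk variable (hypothetical data, not a Zarr array)
store = zeros(7, 4, 2)

# Plain indexing with a range returns a copy
copied = store[5:7, :, :]
copied .= 1.0
unchanged = all(store .== 0.0)           # writes to the copy do not reach `store`

# A view shares memory with its parent
viewed = @view store[5:7, :, :]
viewed .= 1.0
updated = all(store[5:7, :, :] .== 1.0)  # writes through the view do reach `store`
```

If the subset in the original example behaves like `copied` rather than `viewed`, the file on disk would stay untouched, which matches the reported behavior.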

lazarusA commented 1 year ago

Doing the subsetting with subset = d[time = Date("2022-01-05")..Date("2022-01-07")] didn't work for me the last time I tried, hence the use of DD. Could you please try your approach? Make sure it is not done in one of your local working branches 😄.

TabeaW commented 1 year ago

Well, I am sorry, this approach works for me, too. I had tried it with [1,:,:] and hadn't read the docs carefully enough.

lazarusA commented 1 year ago

happy to read.

meggart commented 1 year ago

Ah @lazarusA, I see. When I try

csub = d[time = Date("2022-01-05")..Date("2022-01-07")]

I get

YAXArray with the following dimensions
time                Axis with 2 Elements from 2022-01-05T00:00:00 to 2022-01-06T00:00:00
lon                 Axis with 4 Elements from 1.0 to 10.0
lat                 Axis with 2 Elements from 1.0 to 5.0
name: a
Total size: 128.0 bytes

which is a remnant of legacy behavior where, for some reason, we wanted to match Python behavior and exclude the upper bound of the range. That is why only 2 of the 3 requested time steps are selected. As @TabeaW mentioned, for now you can index directly into the .data field of the YAXArray and write data there. We should really work on the DimArray branch that @felixcremer started, which makes a YAXArray an AbstractDimArray, so we don't have to use these workarounds anymore.
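The effect of excluding the upper bound can be reproduced with plain Dates, independent of YAXArrays. This is a sketch of the two interval conventions only, not of the package's selection code; the variable names are made up for illustration.

```julia
using Dates

# Same time axis as in the example above
times = Date("2022-01-01"):Day(1):Date("2022-01-07")
lo, hi = Date("2022-01-05"), Date("2022-01-07")

# Half-open selection [lo, hi): the upper bound is excluded
half_open = [t for t in times if lo <= t < hi]

# Closed selection [lo, hi]: the upper bound is included
closed = [t for t in times if lo <= t <= hi]

println(length(half_open), " vs ", length(closed), " time steps")
```

With the half-open convention the selection stops at 2022-01-06, which is consistent with the 2-element time axis and the 128 bytes (2 × 4 × 2 Float64 values) reported in the output above.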