CNES / zcollection

Python library for manipulating data split into a collection of groups stored in Zarr format.
https://zcollection.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

selected_variables must include output fields of update() with depth > 0 #7

Closed robin-cls closed 1 year ago

robin-cls commented 1 year ago

Hi,

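There is a discrepancy in how the selected_variables parameter of the update() function behaves: with depth=0, the fields returned by the callback do not need to be listed in selected_variables, whereas with depth > 0 they must be, otherwise the update fails with a KeyError.

I think it would be better to have the same behavior in both cases. The best option would be not to require the output fields in selected_variables, as this can reduce the number of read operations on the collection.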

Anyway, I have set up a simple case to illustrate the problem:

### IMPORTS
from __future__ import annotations

import datetime
import pprint

import dask.distributed
import fsspec
import numpy

import zcollection
import zcollection.tests.data

print(zcollection.__version__)
>> 2023.3.2

### ZCOLLECTION CREATION
def create_dataset():
    """Create a dataset to record."""
    generator = zcollection.tests.data.create_test_dataset_with_fillvalue()
    return next(generator)

ds = create_dataset()
ds.to_xarray()

fs = fsspec.filesystem('memory')

cluster = dask.distributed.LocalCluster(processes=False)
client = dask.distributed.Client(cluster)

partition_handler = zcollection.partitioning.Date(('time', ), resolution='M')

collection = zcollection.create_collection('time',
                                           ds,
                                           partition_handler,
                                           '/my_collection',
                                           filesystem=fs)

collection.insert(ds)

### UPDATE
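def callback(zds):
    # Reads var2 only; var1 is written but never read.
    new = 2 * zds["var2"].values
    # The returned keys name the fields that update() writes back.
    return dict(var1=new)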

print(collection.update(callback, selected_variables=["var2"]))
>> None

collection.update(callback, selected_variables=["var2"], depth=1)
>> 2023-05-03 09:57:11,122 - distributed.worker - WARNING - Compute Failed
>> Key:       callback-6791e733b811dfee47c879a54d34aad9
>> Function:  wrap_function
>> args:      (('/my_collection/year=2000/month=01', '/my_collection/year=2000/month=02', '/my_collection/year=2000/month=03', '/my_collection/year=2000/month=04', '/my_collection/year=2000/month=05', '/my_collection/year=2000/month=06'))
>> kwargs:    {}
>> Exception: "KeyError('var1')"
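
A possible workaround until a fix is available, going by the issue title, is to pass the output fields along with the inputs in selected_variables whenever depth > 0. The sketch below only builds that list; variables_to_select is a hypothetical helper and not part of zcollection, and the final update() call is left commented out because it has not been verified here.

```python
from __future__ import annotations

from typing import Iterable


def variables_to_select(inputs: Iterable[str],
                        outputs: Iterable[str]) -> list[str]:
    """Return the input names followed by any output names not already listed.

    Hypothetical helper (not part of zcollection).
    """
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return list(dict.fromkeys([*inputs, *outputs]))


# The callback above reads "var2" and writes "var1".
selected = variables_to_select(["var2"], ["var1"])
print(selected)

# Assumed usage, mirroring the failing call above with var1 added:
# collection.update(callback, selected_variables=selected, depth=1)
```

This costs an extra read of var1 for each partition, which is the overhead the issue would like to avoid.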

fbriol commented 1 year ago

Fixed in the development branch.