Hi!
It seems that the update() function with depth > 0 (with overlaps) behaves differently depending on whether the computation is parallelized over one worker or more than one worker.
To be more precise, some partitions that should have overlap values on both the left and the right of the partitioning axis are missing one of the sides. I will try to illustrate this with a simple test case.
Test case
I am using the simple use case presented in the gallery. It creates a zcollection in memory with monthly partitioning.
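A minimal sketch of that setup, loosely following the gallery example (the test-dataset helper, the in-memory fsspec filesystem and the /my_collection path are assumptions here, not copied verbatim from my script):

```python
import fsspec
import zcollection
import zcollection.tests.data

# Sample dataset from the gallery (assumed helper); any dataset with a
# "time" axis would behave the same way.
zds = next(zcollection.tests.data.create_test_dataset_with_fillvalue())

# Store the collection in an in-memory filesystem, partitioned by month.
fs = fsspec.filesystem("memory")
partitioning = zcollection.partitioning.Date(("time",), "M")
collection = zcollection.create_collection(
    "time", zds, partitioning, "/my_collection", filesystem=fs)
collection.insert(zds)
```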
I am then defining a callback that simply prints the details about the partition currently being updated (the partition_info argument) and the extent of the dataset that is available for the update.
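Roughly, the callback looks like this; my understanding is that with depth > 0 the partition_info argument is a (dimension, slice) tuple locating the partition inside the loaded dataset, and the returned variable is only a placeholder so that update() has something to write back:

```python
def callback(zds, partition_info):
    # partition_info locates the partition being updated inside zds; the
    # rest of zds is the overlap contributed by neighbouring partitions.
    dim, indices = partition_info
    time = zds.variables["time"].values
    print(f"partition {dim}[{indices}], dataset extent "
          f"{time.min()} .. {time.max()}")
    # Dummy result: rewrite var2 with zeros of the right shape.
    return {"var2": zds.variables["var1"].values * 0}

collection.update(callback, depth=1)
```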
When using only one worker in the local cluster, all the partitions have the proper dataset extent available. Some overlaps are missing for the partitions covering the oldest and most recent periods respectively, but this is expected. Also note that the first partition appears twice because it is used by zcollection to infer which fields are updated by our callback.
However, when scaling the cluster to 2 workers, some central partitions are missing overlap data: e.g. the March period should use February to April, but only uses February to March.
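For reference, the two runs differ only in the size of the dask LocalCluster; I am assuming here that zcollection simply picks up the active distributed client:

```python
from dask.distributed import Client, LocalCluster

# One worker: every partition sees the expected overlap.
cluster = LocalCluster(n_workers=1, threads_per_worker=1)
client = Client(cluster)
collection.update(callback, depth=1)

# Two workers: some central partitions miss one side of their overlap.
cluster.scale(2)
client.wait_for_workers(2)
collection.update(callback, depth=1)
```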
It feels like the entire update has been split between the workers and that they do not share their partitions. I did not test with more workers, but I expect more errors as the number of workers grows.
zcollection_version: 2023.3.2