We convert this MSv2 data to a processing set using the default partition scheme and specifying main_chunksize={"frequency": 1}.
We read the converted data using read_processing_set and store it in the ps variable.
(We also run ps = ps.get(0) to read the partition.)
In ps, the output of the VISIBILITY DataArray is as follows:
In the above output, the chunks on the VISIBILITY data are as expected, but the coordinates baseline_antenna1_id and baseline_antenna2_id are also chunked along the baseline_id dimension, which was not specified during the conversion.
Because of this inconsistency, reading xarray's chunksizes attribute on any of the DataArrays inside ps fails:
In [85]: ps.VISIBILITY.chunksizes
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[85], line 1
----> 1 ps.VISIBILITY.chunksizes
File /opt/miniconda3/envs/xradio/lib/python3.11/site-packages/xarray/core/dataarray.py:1335, in DataArray.chunksizes(self)
1320 """
1321 Mapping from dimension names to block lengths for this dataarray's data, or None if
1322 the underlying data is not a dask array.
(...)
1332 xarray.unify_chunks
1333 """
1334 all_variables = [self.variable] + [c.variable for c in self.coords.values()]
-> 1335 return get_chunksizes(all_variables)
File /opt/miniconda3/envs/xradio/lib/python3.11/site-packages/xarray/core/common.py:2055, in get_chunksizes(variables)
2053 for dim, c in v.chunksizes.items():
2054 if dim in chunks and c != chunks[dim]:
-> 2055 raise ValueError(
2056 f"Object has inconsistent chunks along dimension {dim}. "
2057 "This can be fixed by calling unify_chunks()."
2058 )
2059 chunks[dim] = c
2060 return Frozen(chunks)
ValueError: Object has inconsistent chunks along dimension baseline_id. This can be fixed by calling unify_chunks().
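The same error can be reproduced without xradio or the 16 GB dataset. The following is a minimal sketch that assumes only xarray and dask are installed. The array sizes, the chunk size of 2 on the coordinate, and the variable names are made up for illustration and are not taken from the converted processing set.

```python
import dask.array as da
import xarray as xr

# Small synthetic stand-in for the partition; sizes are illustrative only.
n_time, n_baseline, n_freq = 4, 6, 3

# VISIBILITY chunked only along frequency, as requested at conversion time.
vis = da.zeros(
    (n_time, n_baseline, n_freq),
    chunks=(n_time, n_baseline, 1),
    dtype="complex64",
)

# A non-index coordinate that is independently chunked along baseline_id
# (assumed chunk size of 2, chosen only to create the mismatch).
ant1 = da.arange(n_baseline, chunks=2)

visibility = xr.DataArray(
    vis,
    dims=("time", "baseline_id", "frequency"),
    coords={"baseline_antenna1_id": ("baseline_id", ant1)},
    name="VISIBILITY",
)

# chunksizes compares the data variable with every dask-backed coordinate,
# so the mismatch along baseline_id raises a ValueError.
try:
    visibility.chunksizes
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The data variable itself is chunked as requested. The error comes only from the coordinate carrying a different chunking along the shared dimension.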
If unify_chunks() is called on the VISIBILITY data, the final chunks are not as expected (see the baseline_id dimension below).
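The effect of unify_chunks() can be seen on a small synthetic array. This is a sketch that assumes xarray and dask, with made-up sizes and chunking. It also shows one possible workaround on the reading side, loading the dask-backed coordinates into memory. That is an assumption about what may be acceptable for small coordinates, not a fix in xradio.

```python
import dask.array as da
import xarray as xr

# Illustrative sizes and chunking; not taken from the real processing set.
n_time, n_baseline, n_freq = 4, 6, 3
vis = da.zeros(
    (n_time, n_baseline, n_freq),
    chunks=(n_time, n_baseline, 1),
    dtype="complex64",
)
ant1 = da.arange(n_baseline, chunks=2)

visibility = xr.DataArray(
    vis,
    dims=("time", "baseline_id", "frequency"),
    coords={"baseline_antenna1_id": ("baseline_id", ant1)},
    name="VISIBILITY",
)

# unify_chunks() rechunks the data to a chunking compatible with the
# coordinate, so baseline_id gets split even though only frequency
# chunking was requested.
unified = visibility.unify_chunks()
print(unified.chunksizes)

# Possible workaround: replace every dask-backed coordinate with an
# in-memory copy, leaving the chunking of the data untouched.
fixed = visibility
for name, coord in visibility.coords.items():
    if coord.chunks is not None:
        fixed = fixed.assign_coords(
            {name: (coord.dims, coord.values, coord.attrs)}
        )
print(fixed.chunksizes)
```

Loading the coordinates keeps baseline_id in a single chunk on the data, at the cost of holding the coordinate values in memory.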
We have been facing this issue since we started experimenting with xradio (v0.0.28), and it still persists in v0.0.31.
We cannot use v0.0.33 or later because of the conversion issue that I raised in #214.
We have simulated MSv2 data that we use for testing on our workstations. The data has the following dimensions:
Time: 120, Baseline: 130,816, Channels: 150, Polarizations: 1 (XX)
The overall size of the data is around 16 GB.