casangi / xradio

Xarray Radio Astronomy Data IO
https://xradio.readthedocs.io/en/latest/

Lofar performance improvements #218

Open sstansill opened 1 month ago

sstansill commented 1 month ago

Adds a new method, read_col_conversion_dask, that allows larger-than-memory columns to be converted. Changes:

  1. The xarray Dataset encoding has been cleaned up and adjusted to ignore DataArrays that are backed by dask arrays
  2. The lofar and lofar_read_size arguments have been added to convert_msv2_to_processing_set
  3. A TableManager class has been added so that multi-threaded or multi-process conversion can happen without having to serialize casacore table objects. It replaces open_table_ro and open_query in convert_and_write_partition
  4. read_col_conversion_dask uses dask's map_blocks to create one task per chunk of a DataArray; each task reads its data from an MSv2 column and reshapes it
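
The idea behind item 3 can be sketched roughly as follows. This is not the TableManager from this PR: PathTableManager and FakeTable are hypothetical names, and FakeTable only stands in for a casacore table handle that cannot be pickled. The sketch assumes the manager carries just the path and reopens the table wherever it ends up running.

```python
import pickle


class FakeTable:
    """Hypothetical stand-in for a table handle that cannot be serialized."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        # Mimic an object that refuses to be pickled.
        raise TypeError("table handles cannot be serialized")


class PathTableManager:
    """Hypothetical sketch: ship the path between workers, not the open handle."""

    def __init__(self, path):
        self.path = path
        self._table = None  # opened lazily, once per process

    def table(self):
        if self._table is None:
            self._table = FakeTable(self.path)
        return self._table

    def __getstate__(self):
        # Drop the open handle so only the path is serialized.
        return {"path": self.path, "_table": None}


manager = PathTableManager("example.ms")
manager.table()  # open a handle locally

# Pickling succeeds because the handle is excluded from the state.
restored = pickle.loads(pickle.dumps(manager))
reopened = restored.table()  # a fresh handle, opened on the receiving side
```

Each worker pays the cost of opening the table once, but the scheduler only ever moves a short string between processes.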

This has been used to convert 9 TB of LOFAR data in ~4.5 hours, which was previously impossible without a compute node with more than 9 TB of memory.
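
The memory saving comes from each task reading only its own chunk. Below is a minimal sketch of that pattern with dask's map_blocks; it is not the read_col_conversion_dask implementation from this PR. read_rows is a hypothetical stand-in for a row-range read from an MSv2 column (a real version would presumably pass a start row and row count to the table reader), and the shapes and chunk sizes are made up for illustration.

```python
import dask.array as da
import numpy as np

N_ROWS, N_CHAN, N_POL = 12, 4, 2


def read_rows(start_row, n_rows):
    """Hypothetical stand-in for reading a row range of a column from disk."""
    rows = np.arange(start_row, start_row + n_rows, dtype=np.float64)
    # Every element of a row holds that row's index, so results are easy to check.
    return np.broadcast_to(rows[:, None, None], (n_rows, N_CHAN, N_POL)).copy()


def load_block(block, block_info=None):
    # block_info[None] describes the output block; "array-location" holds the
    # (start, stop) index range of this block along each axis.
    (row_start, row_stop), *_ = block_info[None]["array-location"]
    return read_rows(row_start, row_stop - row_start)


# Lazy template that only fixes shape, dtype, and chunking; nothing is read yet.
template = da.empty(
    (N_ROWS, N_CHAN, N_POL), chunks=(3, N_CHAN, N_POL), dtype=np.float64
)

# One task per chunk; each task reads only the rows that belong to its chunk.
column = template.map_blocks(
    load_block, dtype=np.float64, meta=np.array((), dtype=np.float64)
)

# Computing a slice only runs the tasks for the chunks it touches.
first_chunk = column[:3].compute()
```

With this layout, peak memory per task is bounded by the chunk size rather than by the size of the whole column.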