casangi / xradio

Xarray Radio Astronomy Data IO
https://xradio.readthedocs.io/en/latest/

read_generic_table not working on columns with empty cells. #197

Closed: Jan-Willem closed this issue 3 months ago.

Jan-Willem commented 3 months ago

An example dataset can be obtained using:

```python
import graphviper

# Download the example ALMA MeasurementSet used in this issue
graphviper.utils.data.download(file="ALMA_uid___A002_X1003af4_X75a3.split.avg.ms")
```

The loaded source_xds does not contain the transition and rest frequency information (the TRANSITION and REST_FREQUENCY columns of the SOURCE table).

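```python
from xradio.vis._vis_utils._ms._tables.read import (
    read_generic_table,
)

# Read the SOURCE subtable of the example MeasurementSet into an xarray Dataset
source_xds = read_generic_table('ALMA_uid___A002_X1003af4_X75a3.split.avg.ms', 'SOURCE')
print(source_xds)
```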
[Screenshots: the SOURCE table shown in tablebrowser, and the output of read_generic_table]

FedeMPouzols commented 3 months ago

I think the problem is that the filter we've traditionally used against unloadable and/or unfilled columns is too strict, and also a weak heuristic: https://github.com/casangi/xradio/blob/a3d37f51dff9d95acc35c16ff35c0cc5804ca224/src/xradio/vis/_vis_utils/_ms/_tables/read.py#L886. If the first cell of a column is empty, the whole column is assumed to be empty, or not worth (or not safe) loading.
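To illustrate why a first-cell check drops columns that do hold data, here is a simplified sketch. It is not xradio's actual code (that is at the linked line); the table is a hypothetical in-memory dict in which `None` models a cell for which casacore's `iscelldefined(...)` returns False.

```python
# Hypothetical stand-in for a SOURCE table: column name -> list of cells.
# None models an undefined cell (iscelldefined(...) == False).
SOURCE = {
    "SOURCE_ID": [0, 1],
    "REST_FREQUENCY": [None, [2.3e11]],  # row 0 undefined, row 1 has a value
    "POSITION": [None, None],            # every cell undefined
}


def first_cell_filter(columns):
    """Simplified first-cell heuristic: keep a column only if its first
    cell is defined."""
    return [name for name, cells in columns.items()
            if cells and cells[0] is not None]


print(first_cell_filter(SOURCE))  # ['SOURCE_ID']
```

REST_FREQUENCY is discarded even though row 1 has a value, which matches the symptom reported above.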

Also, we do not really need the value of that cell (or of any particular cell), only the column type.
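A minimal sketch of deciding from the column description alone, without reading any cell. python-casacore's `table` provides `colnames()` and `coldatatype(name)`; here a small hypothetical stand-in class with the same two method names keeps the example runnable without a MeasurementSet. The set of unsupported types is an assumption for illustration.

```python
class StandInTable:
    """Hypothetical stand-in exposing colnames() and coldatatype(),
    the same method names as python-casacore's table."""

    def __init__(self, dtypes):
        self._dtypes = dtypes

    def colnames(self):
        return list(self._dtypes)

    def coldatatype(self, name):
        return self._dtypes[name]


UNSUPPORTED = {"record"}  # assumption: types the reader refuses to load


def loadable_columns(tab):
    # Only the declared column type is consulted; no cell values are read.
    return [c for c in tab.colnames() if tab.coldatatype(c) not in UNSUPPORTED]


tab = StandInTable({"SOURCE_ID": "int", "REST_FREQUENCY": "double",
                    "TRANSITION": "string", "SOURCE_MODEL": "record"})
print(loadable_columns(tab))  # ['SOURCE_ID', 'REST_FREQUENCY', 'TRANSITION']
```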

The branch for this issue has a fix that should remove this limitation. With this fix, any column that is defined will be loaded, as long as it is not of an unsupported or troublesome type such as 'record'. Cells that are empty in the casacore sense will be loaded as empty arrays. In other words, we now treat "casacore empty cell" as "cell holding an empty array", whereas previously an empty first cell was interpreted as "this column is not safe to load".
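The "empty cell = empty array" rule can be sketched as follows. This is an illustration, not the code on the fix branch; as before, `None` is a hypothetical stand-in for a cell with `iscelldefined(...) == False`.

```python
import numpy as np


def load_column(cells):
    """Map each cell to an array, turning an undefined cell (None)
    into an empty array instead of rejecting the whole column."""
    return [np.empty(0) if c is None else np.asarray(c) for c in cells]


rest_freq = [None, [2.3e11]]      # row 0 undefined, row 1 defined
loaded = load_column(rest_freq)
print([a.size for a in loaded])   # [0, 1]
```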

FedeMPouzols commented 3 months ago

The fix will load columns like those in the example regardless of whether their cells are empty (in the sense of "iscelldefined() == False"). That should prevent missing columns and let other work continue, although there may be additional nuances to discuss. In the ALMA example from the issue description, the variables SYSVEL, DIRECTION and POSITION were also missing. Of these, POSITION is an example of the extreme case where the column is defined but all of its cells are empty (iscelldefined(...) == False). Such columns will produce data variables without any effective values.

[Screenshot taken 2024-07-18]
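One way to flag the extreme case described above is to check whether any cell is defined at all. This is a hypothetical helper, not part of xradio; `None` again stands in for an undefined casacore cell, and the values are made up for illustration.

```python
def has_effective_values(cells):
    # True if at least one cell is defined (None models iscelldefined == False).
    return any(c is not None for c in cells)


columns = {"SYSVEL": [None, [1.5e4]], "POSITION": [None, None]}
print({name: has_effective_values(cells) for name, cells in columns.items()})
# {'SYSVEL': True, 'POSITION': False}
```

A reader could use such a check to decide whether to keep, drop, or mark an all-empty data variable; which of these is preferable is one of the nuances left open in this thread.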