casangi / xradio

Xarray Radio Astronomy Data IO
https://xradio.readthedocs.io/en/latest/

schema_checker issues and conversion errors in datasets from casatestdata/CASA guides #271

Closed: FedeMPouzols closed this issue 1 month ago

FedeMPouzols commented 1 month ago

I'm creating this issue to track, in a branch, some fixes for various schema validation issues caught by the schema checker, plus errors in the converter code, which I found while looking into other issues/errors.

In casatestdata, out of ~305 datasets we get:

Most of the schema checker issues seem to be minor problems with some units and frames. Among the conversion errors, a very common one is dimension issues in extract_feed_info.

The conversion errors are spread as follows:

(Note that these stats include some 5-10 datasets that I think we do not expect to convert without errors, for example crazySourceTable.ms, but I wanted to catch as many validation/conversion issues as possible.)

FedeMPouzols commented 1 month ago

With the fixes in this branch, the number of MSs with schema checker issues is down from 119 to 6: 3 in ALMA and 3 in Other (BIMA) datasets. All of these issues seem to be related to reference frames.

The number of conversion errors is down to 5:

Of the 3 VLA errors, one is a failure not in the conversion but in the summary function (invalid coordinate frame when trying to construct a SkyCoord). It happens with the exotic "b1950_vla" frame, which I think is "FK4" with a custom epoch ("1979.9").
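One possible way for the summary code to handle this is to map CASA frame names to astropy `SkyCoord` keyword arguments before constructing the coordinate. The sketch below assumes that "b1950_vla" can be represented as astropy's FK4 frame with equinox B1950 and a custom Besselian observation epoch (`obstime`) of 1979.9; that interpretation is an assumption, not something confirmed in this issue. The helper `casa_frame_to_skycoord_kwargs` and its mapping table are hypothetical; only the astropy frame names (`icrs`, `fk5`, `fk4`) and the `equinox`/`obstime` frame attributes are real.

```python
# Hypothetical helper: translate a CASA/casacore direction reference frame
# name into keyword arguments that astropy.coordinates.SkyCoord accepts.
_CASA_TO_ASTROPY = {
    "ICRS": {"frame": "icrs"},
    "J2000": {"frame": "fk5", "equinox": "J2000"},
    "B1950": {"frame": "fk4", "equinox": "B1950"},
    # Assumption: B1950_VLA is FK4 with equinox B1950 but a custom
    # (Besselian) observation epoch of 1979.9 instead of the default obstime.
    "B1950_VLA": {"frame": "fk4", "equinox": "B1950", "obstime": "B1979.9"},
}


def casa_frame_to_skycoord_kwargs(casa_frame: str) -> dict:
    """Return SkyCoord kwargs for a CASA frame name (case-insensitive)."""
    key = casa_frame.strip().upper()
    try:
        # Return a copy so callers cannot mutate the shared mapping table.
        return dict(_CASA_TO_ASTROPY[key])
    except KeyError:
        raise ValueError(f"unsupported CASA frame: {casa_frame!r}") from None


print(casa_frame_to_skycoord_kwargs("b1950_vla"))
# With astropy installed, the kwargs could be passed to SkyCoord, e.g.:
#   from astropy.coordinates import SkyCoord
#   SkyCoord(ra=10.0, dec=20.0, unit="deg", **casa_frame_to_skycoord_kwargs("b1950_vla"))
```

Raising a `ValueError` for unknown frames keeps the summary failure explicit instead of letting astropy fail later with a less specific error.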

This set of 305 excludes some MSs that have been troublesome in the past (for example a Dysco LOFAR dataset), which should be re-checked later on.

I think the stakeholder test failure is unrelated to this branch. There is a test failure in main, in test_preconverted_alma, due to an error with inconsistent xds names (MAIN / correlated_xds), which must be related to the ongoing renaming to 'correlated_xds'.

FedeMPouzols commented 1 month ago

The changes in this branch now fix all the schema issues listed above. However, recent changes from last week have introduced a relatively common conversion error, apparently when the DDI table uses different polarization setups/IDs that have different NUM_CORR, as is the case, for example, with channel_average SPWs. This makes 11 of the 83 ALMA test MSs fail, and I suspect that most, if not all, freshly imported ALMA MSs will have similar issues. I'm looking into it.
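To illustrate the failure mode: DDIs whose polarization setups have different NUM_CORR cannot share a single correlation (polarization) axis, so stacking them into one dataset breaks. A minimal sketch of grouping DDIs by the NUM_CORR of their polarization setup follows. The function `group_ddis_by_num_corr` and its plain-dict inputs are hypothetical stand-ins for the DATA_DESCRIPTION and POLARIZATION subtables, not xradio's actual partitioning code, and the example numbers are illustrative only.

```python
from collections import defaultdict


def group_ddis_by_num_corr(ddi_to_pol_id, pol_id_to_num_corr):
    """Group data description IDs by the NUM_CORR of their polarization setup.

    ddi_to_pol_id: {ddi: POLARIZATION_ID}, as in a DATA_DESCRIPTION-like table.
    pol_id_to_num_corr: {POLARIZATION_ID: NUM_CORR}, as in a POLARIZATION-like table.
    Returns {NUM_CORR: sorted list of DDIs}.
    """
    groups = defaultdict(list)
    for ddi, pol_id in ddi_to_pol_id.items():
        groups[pol_id_to_num_corr[pol_id]].append(ddi)
    return {n_corr: sorted(ddis) for n_corr, ddis in groups.items()}


# Illustrative input: DDIs 0 and 1 use a 4-correlation setup, while DDI 2
# (e.g. a channel-averaged SPW) uses a 2-correlation setup, so it should not
# be stacked with DDIs 0 and 1 along one polarization axis.
print(group_ddis_by_num_corr({0: 0, 1: 0, 2: 1}, {0: 4, 1: 2}))
```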

Before this, the remaining errors were either expected:

or related to coordinate frames, which should probably be dealt with in a separate issue:

In CASA guides, one EVN dataset, n14c3.ms, has conversion errors because of multiple gain curves/times per antenna.
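A quick way to spot this kind of dataset could be to count gain-curve rows per antenna. The sketch assumes the ANTENNA_ID value of each row of the GAIN_CURVE subtable has already been read into a Python sequence; the helper name `antennas_with_multiple_gain_curves` is hypothetical and not part of xradio.

```python
from collections import Counter


def antennas_with_multiple_gain_curves(antenna_ids):
    """Return {antenna_id: row_count} for antennas with more than one row.

    antenna_ids: the ANTENNA_ID of each gain-curve row (assumed input).
    Code that expects exactly one gain curve per antenna would have to
    select or combine the rows for the antennas reported here.
    """
    counts = Counter(antenna_ids)
    return {ant: n for ant, n in sorted(counts.items()) if n > 1}


# Illustrative input: antenna 3 has two gain-curve rows (e.g. two time ranges).
print(antennas_with_multiple_gain_curves([0, 1, 2, 3, 3]))
```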