casangi / xradio

Xarray Radio Astronomy Data IO
https://xradio.readthedocs.io/en/latest/
Other
9 stars 5 forks source link

Schema Checker field_and_source_xds #223

Closed Jan-Willem closed 3 weeks ago

Jan-Willem commented 3 weeks ago

Schema: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/edit?usp=sharing

While doing this, figure out ways to check constraints like:

FedeMPouzols commented 3 weeks ago

This branch now passes the stakeholder tests with the checker re-enabled, with caveats (a couple of downloaded datasets that would need an update/discussion)...

@Jan-Willem : I have disabled the test_alma_pre-converted (Antennae_North.cal.lsrk.split.vis.zarr) temporarily in this branch, as it is outdated. It has schema validation issues, starting from:

 * Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['type']: Required attribute type is missing! (expected: <class 'str'> found: None)
 * Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['units']: Required attribute units is missing! (expected: list[str] found: None)
 * Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['scale']: Required attribute scale is missing! (expected: <class 'str'> found: None)
 * Schema issue with data_vars['VISIBILITY'].attrs['field_and_source_xds'].coords['time'].attrs['format']: Required attribute format is missing! (expected: <class 'str'> found: None)

This is one of several issues with missing metadata that are now fixed in this branch. This Antennae_North.cal.lsrk.split.vis.zarr would need to be re-converted using this branch before we can re-enable test_preconverted_alma.

In addition to this, the s3 dataset would also need to be updated to this (xds.attrs["type"] = "visibility") and more changes that have been merged in since the dataset was generated and uploaded: https://github.com/casangi/xradio/pull/200/files#diff-fcb3d24497bd2d070aa1622753c4663726de6147b87de9fa77e76ab90c0fdb68R756 I've disabled that check in the schema checker in this branch for now.

FedeMPouzols commented 3 weeks ago

This branch now has merged the latest main and the last few commits from #200. I'm still doing some checks, as the merge was complicated but so far so good.

The schema docs produced has an updated version of field_and_source_xds, and now also some placeholders for xds's that were missing.

The stk test that were disabled temporarily are now re-enabled. On a second thought, rather than disabling some tests that depend on pre-generated data, or re-generating and re-uploading them, I've introduced an option to base_test, do_check_schema. That option is enabled by default but for now disabled in test_s3 and test_preconverted_alma. This should be enough to avoid having to re-upload datasets every time a PR updates the schema and/or the converter introducing some mismatch wrt pre-generated datasets.

Jan-Willem commented 3 weeks ago

@FedeMPouzols, disabling the schema checker for test_s3 and test_preconverted_alma is fine. After you have done your checks please open a pull request. Thanks.