casangi / xradio

Xarray Radio Astronomy Data IO

https://xradio.readthedocs.io/en/latest/

Other

16 stars 7 forks source link

Review ms_xds.attrs.antenna_xds schema and XRADIO interface. #168

Open Jan-Willem opened 5 months ago

Jan-Willem commented 5 months ago

Review Instructions

The review ipynb can be found here: https://github.com/casangi/xradio/blob/main/reviews/review_antenna_xds.ipynb

Please review the MSv4 antenna_xds schema and the XRADIO interface (ps['MSv4_name'].antenna_xds). Note that the PS (processing set) interface or the main_xds should not be reviewed.

The antenna_xds schema specification: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/edit#gid=257301047

The processing set is a loose collection of MSv4 which might come from multiple MSv2 (or ASDMS). Consequently, arbitrary ids are avoided in favor of descriptive strings.

Two example datasets will be used VLBA_TL016B_split_lsrk.ms (VLBI) and uid___A002_X1015532_X1926f.small.ms (Single Dish).

Preparatory Material

Go over Xarray nomenclature and selection syntax:

MSv2 and CASA documentation:

MSv2 schema: https://casacore.github.io/casacore-notes/229.pdf
MSv3 schema: https://casacore.github.io/casacore-notes/264.pdf

VLBI MSv2 extension: https://casacore.github.io/casacore-notes/265.pdf

`antenna_xds` Schema

The ANTENNA, FEED, GAIN_CURVE, PHASE_CAL, and INTERFEROMETER_MODEL (VLBI, not yet included) tables in the MSv2 contain closely related information that is all related to the antenna. The MSv4 contains a single SPW, consequently, an antenna will only have a single feed associated with it.

Antennaids are no longer used in favor of the antenna name (for antennas that can move the name consists of the name + "" + station). This allows for easy comparison of MSv4s from different observations.

Key Questions to Answer

Schema Questions

1.1) Are there missing use cases?
1.2) Is all the information present needed for offline processing?
1.3) (VLBI) Instead of storing BASELINE_REFERENCE in main_xds can we store it in the antenna_xds? This would assume that the reference antennas remain constant for the duration of the MS v4.
1.4) Is the order of the dims correct?
1.5) Should BEAM_OFFSET be sky_dir_label (Ra, Dec) or local_sky_label (Az, Alt)?
1.6) Should we add prefex to organize data variables? For example PHASE_DELAY -> VLBI_PHASE_DELAY?
1.7) Should we include the POLARIZATION_RESPONSE doesn't seem to be used?
1.8) ANTENNA_FEED_OFFSET is a new data variable that is calculated using (Antenna_Table.offset + Feed_Table.position). Is this any need to store these values separately?

XRADIO

2.1) Please check all the data variables (names,dimensions,measures) and coordinates (names,dimensions,measures) in both the google spreadsheet and ipynb.

Environment instructions

It is recommended to use the conda environment manager to create a clean, self-contained runtime where xradio and all its dependencies can be installed:

conda create --name xradio python=3.11 --no-default-packages
conda activate xradio

Clone the repository, checkout the review branch and do a local install:

git clone https://github.com/casangi/xradio.git
git checkout 168-review-ms_xdsattrsantenna_xds-schema-and-xradio-interface
cd xradio
pip install -e .

On macOS it is required to pre-install python-casacore using bash conda install -c conda-forge python-casacore.

kettenis commented 3 months ago

Here are the the issues that I found during my review. I'll address the explicit review questions in a separate comment.

What is the thought behind making mount a coordinate instead of a data variable?
Mount types are incomplete. Several mount types were introduced after MSv2 was defined:

ALT-AZ+ROTATOR: alt-az mount with feed rotator; introduced for ASKAP dishes.

ALT-AZ+NASMYTH-R: Nasmyth mount with receivers at the right-hand side of the cabin. Many high-frequency antennas used for VLBI have such a mount type.

ALT-AZ+NASMYTH-L: Nasmyth mount with receivers at the left-hand side of the cabin.

There also is a confusion about the X-Y mount type. This mount type is typically used for dishes that are used to track satellites. But it actually matters how the axes are aligned with the earth. I think it is common to align E-W or N-S, but in principle other orientations are possible. The Hobart 26m antenna that is regularly used for VLBI is oriented E-W. And then LOFAR decided for some reason to use X-Y as the mount type for their antennas which are fixed to the earth. So this may need untangling.
The indirection to go from polarization type to receptor_name is inconvenient. This may be unavoidable in order to support multiple receivers? But maybe the split by spectral windows allows this to be eliminated?
The gain_curve_time and phase_cal_time coordinates have type 'time' in the schema, but float64 in the implementation. That makes it impossible to index by time?
The tone_label coordinate isn't present. Just using indices is probably fine, but it doesn't match the schema.
The ANTENNA_POSITION has the attribute "ellipsoid: GRS80", but the antenna coordinates are ITRF Geocentric coordinates. I'm not an expert on geodetic reference frames, but I'm not sure this conveys the correct information about the antenna positions in the VLBI context.
Should ANTENNA_DISH_DIAMETER be optional? What should be done with "dishes" like the Mark II at Jodrell or the GBT that aren't standard parabolic dishes?
What is TELESCOPE_CENTER?
EFFECTIVE_DISH_DIAMETER is optional?
BASELINE_REFERENCE may not be all that useful. Modern (FX) VLBI correlators typically use center-of-earth as their reference so don't need this. The (XF) Mark4 correlator that I'm familiar with was baseline-based but had the annoying property that it used ANTENNA1 as a reference for the parallel hands and one of the cross hands and ANTENNA2 as a reference for the other cross hand. So BASELINE_REFERENCE as defined in MSv2 can't be used.

And the way BASELINE_REFERENCE is defined in MSv4 makes no sense at all. Suppose you have a correlator that calculates the following baselines (omitting the auo-correlations):

0 1 0 2 1 2

Only one antenna on the baseline can be the reference, so let's assume it is the first one. But that would mean antenna 1 has to marked as both reference and not-reference.

Probably best to drop BASELINE_REF altogether.
Having the receptor_name as the last coordinate of the GAIN_CURVE makes it slightly inconvenient to extract the polynomial terms as an array. Swap coordinates? That would make it consistent with PHASE_CAL
Gain curve parametrization type is missing.
PHASE_CAL* still have antenna_id instead of name as their primary coordinate.
PHASE_CAL_TONE_FREQUENCY has the receptor_name and tone_label in the wrong order. As in:

ant_xds.PHASE_CAL_TONE_FREQUENCY[0][0].loc['0']

produces an array with two identical frequencies. Maybe this means that PHASE_CAL is wrong as well? This may be an importfitsidi bug...
The log output from the jupyter lab server produces blinking text:

[2024-08-13 14:00:59,511] WARNING worker_1: Source_id is -1. No source information will be included in the field_and_source_xds.

(The time stamp there is blinking in my terminal; party like it's 1999?)

kettenis commented 3 months ago

1.1) I'm not sure it is possible to describe Phased Array Feeds (PAFs) in sufficient detail for calibration need. Some telescopes that are used for VLBI have one or are getting one, and there is desire to use these to facilitate wide-field VLBI observations. Another (VLBI) use case is antennas that provide multiple beams. The VLBI output of SKA is going to provide this, but in principle any connected-element array could be phased up and provide multiple beams.
1.2) Gain curve parametrization is missing. Also EOPs need to be available somehow. If in the future the correlator model is distributed as part of the antenna_xds, it might make sense to provide EOPs here as they are tied to the correlator model.
1.3) See comment above. I think the proposed solution doesn't work.
1.4) See comment above for GAIN_CURVE. Isn't antenna_id supposed to be replaced with name?
1.5) No idea.
1.6) In principle gain curves and phase cal are not VLBI-specific at all, although phase cal is probably not terribly useful for non-VLBI observations (but they might be useful for ngVLA calibration). It is just that most existing connected-element arrays distribute their gain curves in a different way.
1.7/1.8) Same question? Are these what we call D-terms in calibration? Those would be useful to have, but I guess that for many antennas/receivers these are not stable enough to provide.
1.9) Something like that would be needed to recalculate the correlator model for VLBI.
2.1) See comment above. I think I spotted some mistakes.

kettenis commented 3 months ago

Looking closer at the PHASE_CAL coordinate order issue, I now understand why GAIN_CURVE ended up with receptor_name as its last axis. PHASE_CAL needs that too, or if we want to swap the coordinates like I'm proposing for GAIN_CURVE the data array needs to be transposed accordingly.

Jan-Willem commented 3 months ago

Here are some initial notes. We will discuss things in more detail during the VIPER meeting today.

What is the thought behind making mount a coordinate instead of a data variable? The basic philosophy is that the information that could be plotted are data variables (these are always numbers) and the information that would be used to label these plots are coordinates (these can be numbers or strings). To more closely follow this "rule" I have changed polarization_type to a coordinate. I am sure we will have exceptions to this "rule"/"guideline".
Mount types are incomplete. Several mount types were introduced after MSv2 was defined. I added ALT-AZ variants to the schema, unfortunately the VLBA dataset does not have the variant information.
The indirection to go from polarization type to receptor_name is inconvenient. This may be unavoidable in order to support multiple receivers? But maybe the split by spectral windows allows this to be eliminated? A single spw allows for a single feed with 2 receivers per antenna, but we can’t guarantee that all antenna receivers will be the same. I changed polarization_type coordinate (to be consistent). Another option is to make the "name" dimension optional of polarization_type for the case where the receptor polarizations are the same for all antennas.
The gain_curve_time and phase_cal_time coordinates have type 'time' in the schema, but float64 in the implementation. That makes it impossible to index by time? All times (even in the main dataset) are stored as float64. When we load into memory we change it to numpy.Datetime64. The reason we don't want to store it as numpy.Datetime64 is to make it easier for non Python data readers.
The tone_label coordinate isn't present. Just using indices is probably fine, but it doesn't match the schema. Is there a labeling convention you would prefer? The same applies to receiver_name.
The ANTENNA_POSITION has the attribute "ellipsoid: GRS80", but the antenna coordinates are ITRF Geocentric coordinates. I'm not an expert on geodetic reference frames, but I'm not sure this conveys the correct information about the antenna positions in the VLBI context. The GRS80 is what is commonly used along with ITRF (https://www.icsm.gov.au/education/fundamentals-mapping/datums/datums-explained-more-detail). I agree we should add this information to the measure.
Should ANTENNA_DISH_DIAMETER be optional? What should be done with "dishes" like the Mark II at Jodrell or the GBT that aren't standard parabolic dishes? This is mostly used to calculate the Airy disk models. I have made this optional
What is TELESCOPE_CENTER? This is the position used to calculate the paralactic angle for non-VLBI arrays (assume it is the same for all telescopes)
EFFECTIVE_DISH_DIAMETER is optional? Changed this to be optional
Probably best to drop BASELINE_REF altogether. Dropped
Having the receptor_name as the last coordinate of the GAIN_CURVE makes it slightly inconvenient to extract the polynomial terms as an array. Swap coordinates? That would make it consistent with PHASE_CAL I swapped it
Gain curve parametrization type is missing. Added it as a coordinate
PHASE_CAL* still have antenna_id instead of name as their primary coordinate. Fixed
PHASE_CAL_TONE_FREQUENCY has the receptor_name and tone_label in the wrong order. As in: ant_xds.PHASE_CAL_TONE_FREQUENCY[0][0].loc['0'] produces an array with two identical frequencies. Maybe this means that PHASE_CAL is wrong as well? This may be an importfitsidi bug... Fixed, please check.
The log output from the jupyter lab server produces blinking text: [2024-08-13 14:00:59,511] WARNING worker_1: Source_id is -1. No source information will be included in the field_and_source_xds. (The time stamp there is blinking in my terminal; party like it's 1999?) Blame @jrhosk :)

kettenis commented 3 months ago

Just quickly checked the PHASE_CAL and GAIN_CURVE fixes. PHASE_CAL looks good to me now. Having the gain_curve_type depend on the gain_curve_time coordinate seems strange. Technically speaking it is of course possible to use a different parametrization of the gain curve over time but I think that is extremely unlikely to happen within a single MSv4.

tnakazato commented 3 months ago

I'm sorry I couldn't attend the review meeting as I took a day off. Here is my comments/questions.

Regarding 1.1), how do you envision the way to fit multi-feed receiver into the schema? Here, multi-feed receiver means single-dish antenna with focal-plane array receiver, such as 7BEE on NRO 45m telescope.
- FYI, we have imported multi-feed receiver data of NRO 45m telescope into MSv2 as if each receiver is independent antenna. To do that, we created ANTENNA table rows for each receiver with name including receiver id (NRO-BEAM0, NRO-BEAM1, ...) so each receiver has its own antenna id. Since pointing direction for each receiver is slightly shifted from the nominal pointing direction of the antenna, shifted values (on sky) are recorded to the POINTING table independently with antenna id, which can be translated to receiver id. Same treatment would be possible. But it would also be possible to use ANTENNA_FEED_OFFSET to define deviation of each receiver from nominal pointing direction. In the latter case, with respect to 1.9), ANTENNA_FEED_OFFSET is necessary.
1.2) I think information is sufficient, at least for ALMA single-dish usecase. But, I think either ANTENNA_DISH_DIAMETER or ANTENNA_EFFECTIVE_DISH_DIAMETER should be mandatory (not optional) because antenna diameter is required to estimate spatial resolution of the single-dish observation.
Regarding 1.5), does BEAM_OFFSET vary in time if it is defined as sky_dir_label?
"Antenna_ids are no longer used in favor of the antenna name": does this mean we no longer associate antenna_xds with, e.g., main_xds/pointing_xds using antenna_id, and we instead use name (or name_station) for this purpose?

Jan-Willem commented 3 months ago

In response to @tnakazato :

"How do you envision the way to fit multi-feed receiver into the schema?" Since we split by SPW a single MSv4 should have no more than one feed associated with an antenna. MSv4 is also limited to a single beam per antenna. We have ASKAP stakeholder datasets to demonstrate this where each beam is its own MSv4.
1.2) "But, I think either ANTENNA_DISH_DIAMETER or ANTENNA_EFFECTIVE_DISH_DIAMETER should be mandatory (not optional) because antenna diameter is required to estimate spatial resolution of the single-dish observation." The issue is that some dishes are not circular. We can change the definition to "The diameter of the main reflector (or the largest dimension for non-circular apertures)." and make it nonoptional (@kettenis would this work?).
"Regarding 1.5), does BEAM_OFFSET vary in time if it is defined as sky_dir_label?" I think it does not vary with time because it is an offset. It is defined as "Beam position offset, as defined on the sky but in the antenna reference frame."
"Antenna_ids are no longer used in favor of the antenna name": does this mean we no longer associate antenna_xds with, e.g., main_xds/pointing_xds using antenna_id, and we instead use name (or name_station) for this purpose? Yes

Jan-Willem commented 3 months ago

The following feedback from the review meeting have been incorporated:

name changed to antenna_name
receptor_name has been changed to receptor_label with values ['pol_0', 'pol_1']
tone_label now has values ['freq_0','freq_0']
The time dependence has been removed for gain_curve (@kettenis).
The phase_cal data variables can now be interpolated to the main_xds time by setting phase_cal_interpolate=True (time_phase_cal then is changed to time).

The pointing information will be handled in a separate ticket.

tnakazato commented 3 months ago

Hi @Jan-Willem,

Thank you for your reply.

"How do you envision the way to fit multi-feed receiver into the schema?" Since we split by SPW a single MSv4 should have no more than one feed associated with an antenna. MSv4 is also limited to a single beam per antenna. We have ASKAP stakeholder datasets to demonstrate this where each beam is its own MSv4.

I understand that individual MSv4 only contain single feed and single beam. But even in that case, we should take into account the difference of pointing direction among feeds/beams in some way. In our NRO MSv2 data, POINTING table absorbs such difference. POINTING table contains pointing direction for each feed and they can be distinguished by ANTENNA_ID since we regard each feed as separate antenna in ANTENNA table.

Assuming that MSv4 takes similar approach, pointing direction stored in POINTING xds will take all offset information into account. Then, the direction at a certain time in POINTING xds can be different across MSv4 even if antenna name or antenna_station name is the same. Is my understanding correct?

1.2) "But, I think either ANTENNA_DISH_DIAMETER or ANTENNA_EFFECTIVE_DISH_DIAMETER should be mandatory (not optional) because antenna diameter is required to estimate spatial resolution of the single-dish observation." The issue is that some dishes are not circular. We can change the definition to "The diameter of the main reflector (or the largest dimension for non-circular apertures)." and make it nonoptional (@kettenis would this work?).

Ah, I see. Then how about having xds/attribute/data for antenna beam model instead of making ANTENNA_DISH_DIAMETER mandatory?

"Regarding 1.5), does BEAM_OFFSET vary in time if it is defined as sky_dir_label?" I think it does not vary with time because it is an offset. It is defined as "Beam position offset, as defined on the sky but in the antenna reference frame."

Maybe I'm just ignorant, but I don't understand what "defined on the sky" means. Does it mean the offset is defined on the celestial sphere? Then, the value is in the antenna reference frame?

"Antenna_ids are no longer used in favor of the antenna name": does this mean we no longer associate antenna_xds with, e.g., main_xds/pointing_xds using antenna_id, and we instead use name (or name_station) for this purpose? Yes

Understood.

gmoellen commented 2 months ago

Assuming that "BEAM_OFFSET" is a strict mechanical offset of the nominal direction of peak sensitivity, owing to the structural details of feed positioning not otherwise compensated for (e.g., by tilting the subreflector), it is indeed a direction offset specified within the antenna reference frame and somehow associated with the details of the antenna mount type. And constant in time for fixed mechanicals in the antenna. Any coordinate or label that refers to "(ra,dec)" is confusing, at best, here. One also has to be a little bit careful with direction offsets when the antenna's native pointing coordinates are Az/El. A fixed mechanical beam offset (due to offset feed position within the optics) will not be a constant Azimuth offset over the full range of elevation. It would be better described as El/Cross-El, where Cross-El is an angle measured perpendicular to El on a great circle (which Az is not). All of this is subject to the direction coordinate systems we adopt to describe these things. Strings like "sky_dir_label" and "sky_coord_offset" are not intuitive to me in a general way....

tnakazato commented 2 months ago

Thanks @gmoellen for detailed explanation.

casangi / xradio

Review ms_xds.attrs.antenna_xds schema and XRADIO interface. #168

Review Instructions

Preparatory Material

antenna_xds Schema

Key Questions to Answer

Schema Questions

XRADIO

Environment instructions

`antenna_xds` Schema