Open Jan-Willem opened 5 months ago
Here are the the issues that I found during my review. I'll address the explicit review questions in a separate comment.
What is the thought behind making mount a coordinate instead of a data variable?
Mount types are incomplete. Several mount types were introduced after MSv2 was defined:
ALT-AZ+ROTATOR: alt-az mount with feed rotator; introduced for ASKAP dishes.
ALT-AZ+NASMYTH-R: Nasmyth mount with receivers at the right-hand side of the cabin. Many high-frequency antennas used for VLBI have such a mount type.
ALT-AZ+NASMYTH-L: Nasmyth mount with receivers at the left-hand side of the cabin.
There also is a confusion about the X-Y mount type. This mount type is typically used for dishes that are used to track satellites. But it actually matters how the axes are aligned with the earth. I think it is common to align E-W or N-S, but in principle other orientations are possible. The Hobart 26m antenna that is regularly used for VLBI is oriented E-W. And then LOFAR decided for some reason to use X-Y as the mount type for their antennas which are fixed to the earth. So this may need untangling.
The indirection to go from polarization type to receptor_name is inconvenient. This may be unavoidable in order to support multiple receivers? But maybe the split by spectral windows allows this to be eliminated?
The gain_curve_time and phase_cal_time coordinates have type 'time' in the schema, but float64 in the implementation. That makes it impossible to index by time?
The tone_label coordinate isn't present. Just using indices is probably fine, but it doesn't match the schema.
The ANTENNA_POSITION has the attribute "ellipsoid: GRS80", but the antenna coordinates are ITRF Geocentric coordinates. I'm not an expert on geodetic reference frames, but I'm not sure this conveys the correct information about the antenna positions in the VLBI context.
Should ANTENNA_DISH_DIAMETER be optional? What should be done with "dishes" like the Mark II at Jodrell or the GBT that aren't standard parabolic dishes?
What is TELESCOPE_CENTER?
EFFECTIVE_DISH_DIAMETER is optional?
BASELINE_REFERENCE may not be all that useful. Modern (FX) VLBI correlators typically use center-of-earth as their reference so don't need this. The (XF) Mark4 correlator that I'm familiar with was baseline-based but had the annoying property that it used ANTENNA1 as a reference for the parallel hands and one of the cross hands and ANTENNA2 as a reference for the other cross hand. So BASELINE_REFERENCE as defined in MSv2 can't be used.
And the way BASELINE_REFERENCE is defined in MSv4 makes no sense at all. Suppose you have a correlator that calculates the following baselines (omitting the auo-correlations):
0 1 0 2 1 2
Only one antenna on the baseline can be the reference, so let's assume it is the first one. But that would mean antenna 1 has to marked as both reference and not-reference.
Probably best to drop BASELINE_REF altogether.
Having the receptor_name as the last coordinate of the GAIN_CURVE makes it slightly inconvenient to extract the polynomial terms as an array. Swap coordinates? That would make it consistent with PHASE_CAL
Gain curve parametrization type is missing.
PHASE_CAL* still have antenna_id instead of name as their primary coordinate.
PHASE_CAL_TONE_FREQUENCY has the receptor_name and tone_label in the wrong order. As in:
ant_xds.PHASE_CAL_TONE_FREQUENCY[0][0].loc['0']
produces an array with two identical frequencies. Maybe this means that PHASE_CAL is wrong as well? This may be an importfitsidi bug...
The log output from the jupyter lab server produces blinking text:
[2024-08-13 14:00:59,511] WARNING worker_1: Source_id is -1. No source information will be included in the field_and_source_xds.
(The time stamp there is blinking in my terminal; party like it's 1999?)
Looking closer at the PHASE_CAL coordinate order issue, I now understand why GAIN_CURVE ended up with receptor_name as its last axis. PHASE_CAL needs that too, or if we want to swap the coordinates like I'm proposing for GAIN_CURVE the data array needs to be transposed accordingly.
Here are some initial notes. We will discuss things in more detail during the VIPER meeting today.
What is the thought behind making mount a coordinate instead of a data variable? The basic philosophy is that the information that could be plotted are data variables (these are always numbers) and the information that would be used to label these plots are coordinates (these can be numbers or strings). To more closely follow this "rule" I have changed polarization_type to a coordinate. I am sure we will have exceptions to this "rule"/"guideline".
Mount types are incomplete. Several mount types were introduced after MSv2 was defined. I added ALT-AZ variants to the schema, unfortunately the VLBA dataset does not have the variant information.
The indirection to go from polarization type to receptor_name is inconvenient. This may be unavoidable in order to support multiple receivers? But maybe the split by spectral windows allows this to be eliminated? A single spw allows for a single feed with 2 receivers per antenna, but we can’t guarantee that all antenna receivers will be the same. I changed polarization_type coordinate (to be consistent). Another option is to make the "name" dimension optional of polarization_type for the case where the receptor polarizations are the same for all antennas.
The gain_curve_time and phase_cal_time coordinates have type 'time' in the schema, but float64 in the implementation. That makes it impossible to index by time? All times (even in the main dataset) are stored as float64. When we load into memory we change it to numpy.Datetime64. The reason we don't want to store it as numpy.Datetime64 is to make it easier for non Python data readers.
The tone_label coordinate isn't present. Just using indices is probably fine, but it doesn't match the schema. Is there a labeling convention you would prefer? The same applies to receiver_name.
The ANTENNA_POSITION has the attribute "ellipsoid: GRS80", but the antenna coordinates are ITRF Geocentric coordinates. I'm not an expert on geodetic reference frames, but I'm not sure this conveys the correct information about the antenna positions in the VLBI context. The GRS80 is what is commonly used along with ITRF (https://www.icsm.gov.au/education/fundamentals-mapping/datums/datums-explained-more-detail). I agree we should add this information to the measure.
Should ANTENNA_DISH_DIAMETER be optional? What should be done with "dishes" like the Mark II at Jodrell or the GBT that aren't standard parabolic dishes? This is mostly used to calculate the Airy disk models. I have made this optional
What is TELESCOPE_CENTER? This is the position used to calculate the paralactic angle for non-VLBI arrays (assume it is the same for all telescopes)
EFFECTIVE_DISH_DIAMETER is optional? Changed this to be optional
Probably best to drop BASELINE_REF altogether. Dropped
Having the receptor_name as the last coordinate of the GAIN_CURVE makes it slightly inconvenient to extract the polynomial terms as an array. Swap coordinates? That would make it consistent with PHASE_CAL I swapped it
Gain curve parametrization type is missing. Added it as a coordinate
PHASE_CAL* still have antenna_id instead of name as their primary coordinate. Fixed
PHASE_CAL_TONE_FREQUENCY has the receptor_name and tone_label in the wrong order. As in: ant_xds.PHASE_CAL_TONE_FREQUENCY[0][0].loc['0'] produces an array with two identical frequencies. Maybe this means that PHASE_CAL is wrong as well? This may be an importfitsidi bug... Fixed, please check.
The log output from the jupyter lab server produces blinking text: [2024-08-13 14:00:59,511] WARNING worker_1: Source_id is -1. No source information will be included in the field_and_source_xds. (The time stamp there is blinking in my terminal; party like it's 1999?) Blame @jrhosk :)
Just quickly checked the PHASE_CAL and GAIN_CURVE fixes. PHASE_CAL looks good to me now. Having the gain_curve_type depend on the gain_curve_time coordinate seems strange. Technically speaking it is of course possible to use a different parametrization of the gain curve over time but I think that is extremely unlikely to happen within a single MSv4.
I'm sorry I couldn't attend the review meeting as I took a day off. Here is my comments/questions.
Regarding 1.1), how do you envision the way to fit multi-feed receiver into the schema? Here, multi-feed receiver means single-dish antenna with focal-plane array receiver, such as 7BEE on NRO 45m telescope.
1.2) I think information is sufficient, at least for ALMA single-dish usecase. But, I think either ANTENNA_DISH_DIAMETER or ANTENNA_EFFECTIVE_DISH_DIAMETER should be mandatory (not optional) because antenna diameter is required to estimate spatial resolution of the single-dish observation.
Regarding 1.5), does BEAM_OFFSET vary in time if it is defined as sky_dir_label?
"Antenna_ids are no longer used in favor of the antenna name": does this mean we no longer associate antenna_xds with, e.g., main_xds/pointing_xds using antenna_id, and we instead use name (or name_station) for this purpose?
In response to @tnakazato :
"How do you envision the way to fit multi-feed receiver into the schema?" Since we split by SPW a single MSv4 should have no more than one feed associated with an antenna. MSv4 is also limited to a single beam per antenna. We have ASKAP stakeholder datasets to demonstrate this where each beam is its own MSv4.
1.2) "But, I think either ANTENNA_DISH_DIAMETER or ANTENNA_EFFECTIVE_DISH_DIAMETER should be mandatory (not optional) because antenna diameter is required to estimate spatial resolution of the single-dish observation." The issue is that some dishes are not circular. We can change the definition to "The diameter of the main reflector (or the largest dimension for non-circular apertures)." and make it nonoptional (@kettenis would this work?).
"Regarding 1.5), does BEAM_OFFSET vary in time if it is defined as sky_dir_label?" I think it does not vary with time because it is an offset. It is defined as "Beam position offset, as defined on the sky but in the antenna reference frame."
"Antenna_ids are no longer used in favor of the antenna name": does this mean we no longer associate antenna_xds with, e.g., main_xds/pointing_xds using antenna_id, and we instead use name (or name_station) for this purpose? Yes
The following feedback from the review meeting have been incorporated:
The pointing information will be handled in a separate ticket.
Hi @Jan-Willem,
Thank you for your reply.
"How do you envision the way to fit multi-feed receiver into the schema?" Since we split by SPW a single MSv4 should have no more than one feed associated with an antenna. MSv4 is also limited to a single beam per antenna. We have ASKAP stakeholder datasets to demonstrate this where each beam is its own MSv4.
I understand that individual MSv4 only contain single feed and single beam. But even in that case, we should take into account the difference of pointing direction among feeds/beams in some way. In our NRO MSv2 data, POINTING table absorbs such difference. POINTING table contains pointing direction for each feed and they can be distinguished by ANTENNA_ID since we regard each feed as separate antenna in ANTENNA table.
Assuming that MSv4 takes similar approach, pointing direction stored in POINTING xds will take all offset information into account. Then, the direction at a certain time in POINTING xds can be different across MSv4 even if antenna name or antenna_station name is the same. Is my understanding correct?
1.2) "But, I think either ANTENNA_DISH_DIAMETER or ANTENNA_EFFECTIVE_DISH_DIAMETER should be mandatory (not optional) because antenna diameter is required to estimate spatial resolution of the single-dish observation." The issue is that some dishes are not circular. We can change the definition to "The diameter of the main reflector (or the largest dimension for non-circular apertures)." and make it nonoptional (@kettenis would this work?).
Ah, I see. Then how about having xds/attribute/data for antenna beam model instead of making ANTENNA_DISH_DIAMETER mandatory?
"Regarding 1.5), does BEAM_OFFSET vary in time if it is defined as sky_dir_label?" I think it does not vary with time because it is an offset. It is defined as "Beam position offset, as defined on the sky but in the antenna reference frame."
Maybe I'm just ignorant, but I don't understand what "defined on the sky" means. Does it mean the offset is defined on the celestial sphere? Then, the value is in the antenna reference frame?
"Antenna_ids are no longer used in favor of the antenna name": does this mean we no longer associate antenna_xds with, e.g., main_xds/pointing_xds using antenna_id, and we instead use name (or name_station) for this purpose? Yes
Understood.
Assuming that "BEAM_OFFSET" is a strict mechanical offset of the nominal direction of peak sensitivity, owing to the structural details of feed positioning not otherwise compensated for (e.g., by tilting the subreflector), it is indeed a direction offset specified within the antenna reference frame and somehow associated with the details of the antenna mount type. And constant in time for fixed mechanicals in the antenna. Any coordinate or label that refers to "(ra,dec)" is confusing, at best, here. One also has to be a little bit careful with direction offsets when the antenna's native pointing coordinates are Az/El. A fixed mechanical beam offset (due to offset feed position within the optics) will not be a constant Azimuth offset over the full range of elevation. It would be better described as El/Cross-El, where Cross-El is an angle measured perpendicular to El on a great circle (which Az is not). All of this is subject to the direction coordinate systems we adopt to describe these things. Strings like "sky_dir_label" and "sky_coord_offset" are not intuitive to me in a general way....
Thanks @gmoellen for detailed explanation.
Review Instructions
The review ipynb can be found here: https://github.com/casangi/xradio/blob/main/reviews/review_antenna_xds.ipynb
Please review the MSv4
antenna_xds
schema and the XRADIO interface (ps['MSv4_name'].antenna_xds
). Note that the PS (processing set) interface or the main_xds should not be reviewed.The
antenna_xds
schema specification: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/edit#gid=257301047The processing set is a loose collection of MSv4 which might come from multiple MSv2 (or ASDMS). Consequently, arbitrary ids are avoided in favor of descriptive strings.
Two example datasets will be used VLBA_TL016B_split_lsrk.ms (VLBI) and uid___A002_X1015532_X1926f.small.ms (Single Dish).
Preparatory Material
Go over Xarray nomenclature and selection syntax:
MSv2 and CASA documentation:
VLBI MSv2 extension: https://casacore.github.io/casacore-notes/265.pdf
antenna_xds
SchemaThe ANTENNA, FEED, GAIN_CURVE, PHASE_CAL, and INTERFEROMETER_MODEL (VLBI, not yet included) tables in the MSv2 contain closely related information that is all related to the antenna. The MSv4 contains a single SPW, consequently, an antenna will only have a single feed associated with it.
Antennaids are no longer used in favor of the antenna name (for antennas that can move the name consists of the name + "" + station). This allows for easy comparison of MSv4s from different observations.
Key Questions to Answer
Schema Questions
XRADIO
2.1) Please check all the data variables (names,dimensions,measures) and coordinates (names,dimensions,measures) in both the google spreadsheet and ipynb.
Environment instructions
It is recommended to use the conda environment manager to create a clean, self-contained runtime where xradio and all its dependencies can be installed:
Clone the repository, checkout the review branch and do a local install:
On macOS it is required to pre-install python-casacore using
bash conda install -c conda-forge python-casacore
.