casangi / xradio

Xarray Radio Astronomy Data IO

Review ms_xds.VISIBILITY.attrs.field_and_source_xds schema and XRADIO interface. #167

Closed Jan-Willem closed 6 days ago

Jan-Willem commented 1 month ago

Review Instructions

These instructions are repeated in the review_field_and_source_xds.ipynb that can be found in xradio/reviews. The notebook includes a demo of an ALMA mosaic ephemeris observation of the sun that should be used for the review.

Please review the MSv4 field_and_source_xds schema and the XRADIO interface (ps['MSv4_name'].VISIBILITY.field_and_source_xds). The PS (processing set) interface or the main_xds should not be reviewed.

The field_and_source_xds schema specification: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/edit#gid=1658760192

Preparatory Material

Go over Xarray nomenclature and selection syntax:

MSv2 and CASA documentation:

field_and_source_xds Schema

The FIELD, SOURCE, and EPHEMERIS tables in the MSv2 contain closely related information:

These can be combined into a single dataset for MSv4 because each MSv4 consists of a single field and consequently a single source[^1].

Use Cases

The use cases considered during the design of the schema were:

To satisfy these use cases, two types of field_and_source_xds were created: standard and ephemeris. The main difference is that the ephemeris type has a FIELD_PHASE_OFFSET data variable that is relative to the SOURCE_POSITION/SOURCE_DIRECTION data variable (contains the ephemerides and has a time axis), while the standard type has FIELD_PHASE/DELAY/REFERENCE_CENTERS and SOURCE_POSITION (has no time axis). The SOURCE_POSITION/DIRECTION is kept separate from the FIELD_PHASE_OFFSET/CENTER so that the intent OBSERVE_TARGET#OFF_SOURCE is supported and the ephemeris can be easily changed.
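As a rough illustration of the two layouts described above, the sketch below writes each type as a plain Python dictionary, loosely following the nesting that xarray's `Dataset.to_dict()` produces. The dimension names (`time`, `sky_dir_label`), the numeric values, and the helper `xds_type` are assumptions made for illustration only; the schema spreadsheet remains the authority on names and dimensions.

```python
# Plain-dict sketch of the two field_and_source_xds flavours.
# Dimension names and values are illustrative assumptions, not the schema.

standard = {
    "data_vars": {
        "FIELD_PHASE_CENTER": {"dims": ("sky_dir_label",), "data": [1.0, -0.5]},
        "FIELD_DELAY_CENTER": {"dims": ("sky_dir_label",), "data": [1.0, -0.5]},
        "FIELD_REFERENCE_CENTER": {"dims": ("sky_dir_label",), "data": [1.2, -0.5]},
        # No time axis: the source is fixed on the sky.
        "SOURCE_POSITION": {"dims": ("sky_dir_label",), "data": [1.0, -0.5]},
    },
}

ephemeris = {
    "data_vars": {
        # Offset of the field relative to the moving source.
        "FIELD_PHASE_OFFSET": {"dims": ("sky_dir_label",), "data": [0.0, 0.001]},
        # Time axis: one source position per ephemeris sample.
        "SOURCE_POSITION": {
            "dims": ("time", "sky_dir_label"),
            "data": [[1.0, -0.5], [1.01, -0.49]],
        },
    },
}


def xds_type(xds: dict) -> str:
    """Classify a dict as 'ephemeris' or 'standard' (hypothetical helper)."""
    data_vars = xds["data_vars"]
    has_offset = "FIELD_PHASE_OFFSET" in data_vars
    source_has_time = "time" in data_vars["SOURCE_POSITION"]["dims"]
    return "ephemeris" if has_offset and source_has_time else "standard"


print(xds_type(standard), xds_type(ephemeris))
```

Keeping the source position separate from the field offset means that swapping in a new ephemeris only touches `SOURCE_POSITION` in this sketch.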

Key Questions to Answer

Schema Questions

XRADIO

2.1) After reviewing the XARRAY documentation and the descriptions of the data variables in the field_and_source_xds schema, do you find the XARRAY interface intuitive and easy to use?

[^1]: This is inherited from MSv2, which only allows a single source per field [https://casacore.github.io/casacore-notes/229.pdf, p35], though a source can appear in more than one field.

Environment instructions

It is recommended to use the conda environment manager to create a clean, self-contained runtime where xradio and all its dependencies can be installed:

conda create --name xradio python=3.11 --no-default-packages
conda activate xradio

Clone the repository, checkout the review branch and do a local install:

git clone https://github.com/casangi/xradio.git
cd xradio
git checkout 162-create-combined-field-source-and-ephemeris-dataset
pip install -e .

taktsutsumi commented 1 month ago

Comment on questions 1.1 and 1.2.1: There is a non-imaging use case for ephemeris data, namely flux calibration. The extra columns listed in the CASA Ephemeris Data are used by setjy in current CASA for this use case, to determine the flux density of a solar system object used as a primary flux calibrator. These columns were specifically requested by Bryan Butler, but I think some of them are not used in his implementation (https://open-bitbucket.nrao.edu/projects/CASA/repos/casa6/browse/casatasks/src/private/solar_system_setjy.py). Since these additional data are only needed for the subset of solar system objects used for flux calibration, they can be optional. The ALMA Memo, https://library.nrao.edu/public/memos/alma/main/memo594.pdf, has a formal description of the flux calculation.

tnakazato commented 1 month ago

A couple of comments mainly from SD perspective.

ps['MSv4_name'].VISIBILITY.field_and_source_xds

Just in case, is it ps['MSv4_name'].SPECTRUM.field_and_source_xds for single-dish?

Regarding 1.1)

Is sky_dir_label an arbitrary string rather than fixed to ['ra', 'dec']? For example, in the Galactic coordinate, labels are ['l', 'b'].

For FIELD_REFERENCE_CENTER, there could be use cases that require multiple reference positions. For example, when we cannot find a good reference field at an elevation similar to that of the target field, we could use two reference positions with similar azimuth and upper/lower elevations to interpolate the reference spectra to the target position. I'm not sure it's feasible, but I remember it was discussed in the context of ALMA and/or the NRO 45m, although it was never implemented.

Another use case that could not be supported by FIELD_REFERENCE_CENTER is the so-called "horizontal reference". Because an elevation difference between target and reference can degrade the calibrated spectra, ALMA sometimes tries to take reference data at the same elevation as the target field. Since this cannot be done with a fixed position in celestial coordinates (neither an absolute position nor a position relative to the target field), the reference field consequently moves with time. It seems the horizontal reference is the default for extragalactic TP observations in ALMA.

Yet another comment, which is not SD specific. Regarding FIELD_PHASE_CENTER_OFFSET etc. for the ephemeris case, are they constant over time? I'm not an expert, so I'm just asking whether the assumption is OK. With an Az/El mount, the field of view rotates with time, so I'm wondering if the offset could also rotate...

Misc.

Personally, I think FIELD_REFERENCE_CENTER is a good idea from a technical point of view because it describes the association between reference and target explicitly and robustly. But I'm afraid that it may not be logically justified, because reference fields usually specify a "void" region that is totally unrelated to the target field.

Jan-Willem commented 1 month ago

Thank you @taktsutsumi and @tnakazato for your initial comments.

Changes in response to comments:

Please do the following to get all the latest changes:

taktsutsumi commented 4 weeks ago

Few more comments:

1.2.2 VS_CREATE, VS_DATE, and VS_VERSION are just for information, I believe (although Measures checks for the existence of these keywords, I don't think they are used in the code), so they are not needed. VS_TYPE is used to store the type of the different Measures tables, so it is not needed either (the field_and_source_xds_type attribute essentially provides the same information).

One keyword recently added in getephemtable is ‘ephemeris_source’, which gives the name of the ephemeris data source, such as DE200, DE441, etc. Knowing the origin of the ephemeris data may be helpful. Note that a JPL-Horizons query will use the latest (e.g. DE440/DE441), but astropy 6.1’s default ephemeris is DE430.

MJD0, dMJD, earliest, latest - I think these are there in keywords for quick access of the data without accessing time column in the Table system. Since these can be determined from the data, I don’t think these are needed.
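To illustrate why these keywords are redundant, here is a minimal sketch that recovers the time range and step directly from the time samples. It assumes the times are Modified Julian Dates on a regular grid; the sample values and variable names are made up, and MJD0 is left out because its exact convention is not stated here.

```python
# Derive the quick-access values from the time samples themselves.
# Assumption: times are Modified Julian Dates on a regular grid.
mjd = [60000.0, 60000.25, 60000.5, 60000.75]

earliest = min(mjd)
latest = max(mjd)

# Step between consecutive samples; a regular grid has a single step size.
steps = [b - a for a, b in zip(mjd, mjd[1:])]
d_mjd = steps[0]
is_regular = all(abs(s - d_mjd) < 1e-9 for s in steps)

print(earliest, latest, d_mjd, is_regular)
```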

radii - used in flux model
meanrad - can be determined from radii
rot_per - the rotation period information may be needed to create de-rotated image?
orb_per - probably not needed?

1.5 SOURCE_RADIAL_VELOCITY's unit can be km/s (JPL-Horizons gives this value in AU/day, but it is usually converted to km/s or m/s).
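For reference, the conversion from AU/day to SI units is a single scale factor. The sketch below uses the IAU 2012 definition of the astronomical unit and an 86400 s day; the function name is hypothetical and the input value is arbitrary.

```python
AU_IN_M = 149_597_870_700.0  # IAU 2012 definition of the astronomical unit
SECONDS_PER_DAY = 86_400.0


def au_per_day_to_m_per_s(v_au_per_day: float) -> float:
    """Convert a velocity from AU/day to m/s."""
    return v_au_per_day * AU_IN_M / SECONDS_PER_DAY


v_si = au_per_day_to_m_per_s(0.01)  # arbitrary example value
print(f"{v_si:.1f} m/s = {v_si / 1e3:.4f} km/s")
```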

1.11 SUB_OBSERVER_DIRECTION → SUB_OBSERVER_POSITION: I believe this is a position on the surface of the target, so planetodetic_location seems to be more appropriate.

SUB_SOLAR_POSITION: similarly, this is the lon, lat of the Sun on the target, so planetodetic_location.

I think quantity is fine for other variables and yes I think NORTH_POLE_POSITION_ANGLE and NORTH_POLE_ANGULAR_DISTANCE can be in a single data variable.

kgolap commented 4 weeks ago

I'll start by asking why we are having this table. If I understand well, the MSv4 contains one field and one spw. So this table is not necessary for single-field observations or pointed mosaics, as each point in the mosaic will be in its own dataset. So phase_centre and delay_centre etc can be meta data or coordinates of

For OTF or near-field observations (i.e. the correlator is continuously resetting the phase center at a rate that is at most the integration time), this xds will have many rows (can be as many as the number of integrations). Should it go into the main xds as columns?

About the setjy use case: for a calibrator xds, should this table not have a concept of a model, i.e. source shape, flux shape with frequency, etc.? Right now this is encoded in the setjy code. The MODEL column of SOURCE in MS v2 was supposed to be that, but it was never really used except for the virtual model column.

Some doubts or mistakes found:

    • The descriptions of delay_center and phase_center are flipped.
    • What is DOPPLER_SHIFT_VELOCITY for? Are we Doppler tracking (i.e. removing the observatory velocity w.r.t. the frame of the spw definition) when this value is assigned? What is its relationship with LINE_SYSTEMIC_VELOCITY? Isn't systemic velocity a source-based parameter (just like SOURCE_PROPER_MOTION), i.e. the global radial velocity of the source w.r.t. LSRK, for example? It is not clear what a systemic velocity for every line means.

Jan-Willem commented 3 weeks ago

In the schema, I have added a time axis to FIELD_PHASE/DELAY_CENTER to support OTF.
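As a rough picture of what a time axis on the phase center enables for OTF, the sketch below looks up the phase center in effect for a given integration time. The sample times, directions, and the helper `phase_center_at` are all hypothetical, and whether a step lookup like this or an interpolation is appropriate is not specified here.

```python
from bisect import bisect_right

# Hypothetical OTF samples: time in seconds and (ra, dec) in radians, sorted by time.
phase_center_time = [0.0, 1.0, 2.0, 3.0]
phase_center = [(1.000, -0.500), (1.001, -0.500), (1.002, -0.500), (1.003, -0.500)]


def phase_center_at(t: float) -> tuple[float, float]:
    """Return the most recent phase center set at or before time t."""
    i = bisect_right(phase_center_time, t) - 1
    if i < 0:
        raise ValueError("t precedes the first phase-center sample")
    return phase_center[i]


print(phase_center_at(1.4))
```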

@taktsutsumi questions and comments:

  1. “1.2.2 VS_CREATE, VS_DATE, and VS_VERSION are just for information, I believe (although Measures checks for the existence of these keywords, I don't think they are used in the code), so they are not needed. VS_TYPE is used to store the type of the different Measures tables, so it is not needed either (the field_and_source_xds_type attribute essentially provides the same information).” Thanks, we will not include those then.
  2. “One keyword recently added in getephemtable is ‘ephemeris_source’, which gives the name of the ephemeris data source, such as DE200, DE441, etc. Knowing the origin of the ephemeris data may be helpful. Note that a JPL-Horizons query will use the latest (e.g. DE440/DE441), but astropy 6.1’s default ephemeris is DE430.” I have added ephemeris_name as an attribute to record this.
  3. “MJD0, dMJD, earliest, latest - I think these are there in keywords for quick access of the data without accessing time column in the Table system. Since these can be determined from the data, I don’t think these are needed.” Thanks, we will not include those then.
  4. “radii - used in flux model; meanrad - can be determined from radii; rot_per - the rotation period information may be needed to create de-rotated image? orb_per - probably not needed?” What do the multiple radii values mean?
  5. “1.5 SOURCE_RADIAL_VELOCITY's unit can be km/s (JPL-Horizons gives this value in AU/day, but it is usually converted to km/s or m/s).” I will convert everything to SI units.
  6. “1.11 SUB_OBSERVER_DIRECTION → SUB_OBSERVER_POSITION: I believe this is a position on the surface of the target, so planetodetic_location seems to be more appropriate. SUB_SOLAR_POSITION: similarly, this is the lon, lat of the Sun on the target, so planetodetic_location. I think quantity is fine for other variables and yes I think NORTH_POLE_POSITION_ANGLE and NORTH_POLE_ANGULAR_DISTANCE can be in a single data variable.” The problem is that only lat and lon are recorded, which according to the schema then gets a spherical_dir_label. This allows us to differentiate it from a measure that has a distance component (spherical_pos_label).

@kgolap questions and comments:

  1. “I'll start by asking why are we having this table. If I understand well the MSV4 contains one field one spw. So this table is not necessary for single field observation or pointed mosaics as each point in the mosaic will be in its own dataset. So phase_centre and delay_centre etc can be meta data or coordinates of”

    • We chose to record the data in an xarray.Dataset so that the access pattern would be the same for all use cases (including OTF): ps[ms_v4_name].VISIBILITY.attrs['field_and_source_xds']
    • It is still metadata because it is stored in the attributes of the VISIBILITY data variable.
    • The xarray dataset is a lightweight structure that is closely related to a Python dictionary (see https://docs.xarray.dev/en/stable/generated/xarray.DataArray.to_dict.html).
  2. “For OTF or near-field observations (i.e. the correlator is continuously resetting the phase center at a rate that is at most the integration time), this xds will have many rows (can be as many as the number of integrations). Should it go into the main xds as columns?”

    • The advantage of storing the field and ephemeris information in the attributes of the VISIBILITY data is that it allows us to support things such as phase shifting or changing the ephemeris:
      ps[ms_v4_name].VISIBILITY.attrs['field_and_source_xds']
      ps[ms_v4_name].VISIBILITY_ROT.attrs['field_and_source_xds']
      ps[ms_v4_name].VISIBILITY_NEW_EPHEMERIS.attrs['field_and_source_xds']
    • The visibility data variables bookkeeping is handled using main_xds.attrs['data_groups']:
      {
        'base': {'visibility': 'VISIBILITY', 'flag': 'FLAG', 'weight': 'WEIGHT', 'uvw': 'UVW'},
        'imaging': {'visibility': 'VISIBILITY_ROT', 'flag': 'FLAG', 'weight': 'WEIGHT_IMAGING', 'uvw': 'UVW_ROT'},
        'test': {'visibility': 'VISIBILITY_NEW_EPHEMERIS', 'flag': 'FLAG', 'weight': 'WEIGHT_IMAGING', 'uvw': 'UVW_NEW_EPHEMERIS'}
      }
    • By using data groups we can have an arbitrary number of visibility, flag, weight, and uvw data variables. Data groups can share data variables among themselves (for example base, imaging, and test share weight and flag data variables).
    • There is no storage advantage of keeping the field or ephemeris data in the main_xds since the data variables (columns in MSv2 speak) are stored separately so we are not constrained to have them in the main_xds.
  3. “About setjy use case. For calibrator xds should not this table have a concept of model. i.e source shape, flux shape with frequency etc... right now this is encoded in setjy code. The MODEL column of SOURCE in MS v2 was supposed to be that but it was never really used except for virtual model column.”

    • I agree this would be a good idea. Currently, we only have support for adding a model as ps[ms_v4_name].VISIBILITY_MODEL and not as ps[ms_v4_name].VISIBILITY.attrs['field_and_source_xds'].MODEL_XXX. What data variables would we need to add?
    • We could leave this as a future optional addition but make a note of it?

    “Some doubts or mistakes found”:

  4. “Description of delay_center and phase_center are flipped:” Yes, thank you for finding that.

  5. “What is DOPPLER_SHIFT_VELOCITY for? Are we doppler tracking (i.e. removing the observatory velocity w.r.t. the frame of the spw definition) when this value is assigned? What is its relationship with LINE_SYSTEMIC_VELOCITY? Isn't systemic velocity a source based parameter (just like SOURCE_PROPER_MOTION), i.e. the global radial velocity w.r.t. LSRK of the source, for example? Not clear what systemic velocity for every line means.”

    • DOPPLER_SHIFT_VELOCITY is the DOPPLER::VELDEF from MS v2 (see pages 23-24, 32 of https://casacore.github.io/casacore-notes/229.pdf). Do you have suggestions for a better name and description?
    • LINE_SYSTEMIC_VELOCITY is the SOURCE::SYSVEL from MS v2 (see page 44), where it is indexed by source_id, has dimensions equal to the number of lines, and is described as “Systemic velocity for each transition.” Do you have suggestions for a better name and description?
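
The data-group bookkeeping described in the reply to the OTF question can be pictured with a plain dictionary. The sketch below copies the group names and variable names from that reply; the helper `shared_variables` is hypothetical and only shows how groups that reuse a data variable can be detected.

```python
# Data-group bookkeeping as a plain dict, mirroring main_xds.attrs['data_groups'].
data_groups = {
    "base": {"visibility": "VISIBILITY", "flag": "FLAG",
             "weight": "WEIGHT", "uvw": "UVW"},
    "imaging": {"visibility": "VISIBILITY_ROT", "flag": "FLAG",
                "weight": "WEIGHT_IMAGING", "uvw": "UVW_ROT"},
    "test": {"visibility": "VISIBILITY_NEW_EPHEMERIS", "flag": "FLAG",
             "weight": "WEIGHT_IMAGING", "uvw": "UVW_NEW_EPHEMERIS"},
}


def shared_variables(groups: dict, a: str, b: str) -> list[str]:
    """Names of data variables used by both groups (hypothetical helper)."""
    return sorted(set(groups[a].values()) & set(groups[b].values()))


# Resolve which visibility variable a group points at.
print(data_groups["imaging"]["visibility"])

# Find the variables two groups have in common.
print(shared_variables(data_groups, "imaging", "test"))
```

Because each group only stores names, adding another visibility variable with its own field_and_source_xds amounts to adding one more entry to the dictionary.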