Closed FedeMPouzols closed 1 month ago
After the VLASS OTF PR, several ALMASD datasets produce an error in extract_source_info(): ValueError('different number of dimensions on data and dims: 3 vs 2').
SOURCE/DIRECTION is normally given as a [1, 2] size array, which uses only one dimension for the coordinates when loaded into an xds. But in some datasets it seems to be transposed and is given as a [2, 1] size array. That produces an additional dimension in the DIRECTION variable: DIRECTION (SOURCE_ID, TIME, SPECTRAL_WINDOW_ID, dim_1, dim_2), which after the selection isel(TIME=0, SPECTRAL_WINDOW_ID=0, drop=True) in extract_source_info() remains as: DIRECTION (SOURCE_ID, dim_1, dim_2).
That is the issue. This branch now has a fix for that, which drops the unexpected 1-sized dimension if it is present.
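A minimal sketch of that kind of fix, for illustration only: the helper name drop_unit_dims and the toy array are assumptions, not the code in the branch. It squeezes any length-1 dimension other than SOURCE_ID out of an xarray DataArray, so a transposed DIRECTION ends up with the same two dimensions as the normal case.

```python
import numpy as np
import xarray as xr


def drop_unit_dims(da, keep=("SOURCE_ID",)):
    """Squeeze every length-1 dimension of `da` except those named in `keep`.

    Hypothetical helper: the real fix in extract_source_info() may differ.
    """
    extra = [d for d in da.dims if d not in keep and da.sizes[d] == 1]
    return da.squeeze(dim=extra, drop=True) if extra else da


# Toy stand-in for the transposed case: 3 sources, direction stored as [2, 1].
direction = xr.DataArray(
    np.zeros((3, 2, 1)),
    dims=("SOURCE_ID", "dim_1", "dim_2"),
)
fixed = drop_unit_dims(direction)
print(fixed.dims)
```

An array that already has only (SOURCE_ID, dim_1) passes through unchanged, so the same call can be applied to every dataset.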
Another amusing point is that these example SOURCE subtables (sdimaging_flagtest.ms, selection_intent.ms, selection_misc.m, etc.) also have the column PULSAR_ID, set to 0.
All these example MSs are produced by the CASA simulator, which is probably the source of the issue. Similar failures must happen in the other groups of test MSs (ALMA, VLA, Other, etc.) but those are currently masked behind more common errors that trigger some of the early asserts in extract_source_info().
After the last (second) commit the common AssertionError('Can only process source table with a single time entry for a source_id and spectral_window_id.') issue seems fixed. I think we should still turn these asserts into exceptions, and the check could be stricter: it should ensure that time is unique for every source_id (perhaps with a loop of selections over the individual unique source_id values).
The count of errors is down from ~66 to ~8, at least for now.
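The stricter check suggested above could look roughly like the following. This is a sketch under assumptions: the exception class, the function name and the tuple-based row format are all hypothetical stand-ins for the real SOURCE subtable handling. It groups rows by (source_id, spectral_window_id) and raises an exception, rather than asserting, when any pair has more than one time entry.

```python
from collections import defaultdict


class SourceTableError(ValueError):
    """Hypothetical exception type for malformed SOURCE subtables."""


def check_single_time_per_source(rows):
    """Raise if any (source_id, spectral_window_id) pair has several times.

    `rows` is an iterable of (source_id, spectral_window_id, time) tuples,
    a simplified stand-in for the real SOURCE subtable columns.
    """
    times = defaultdict(set)
    for source_id, spw_id, time in rows:
        times[(source_id, spw_id)].add(time)

    offenders = sorted(key for key, seen in times.items() if len(seen) > 1)
    if offenders:
        raise SourceTableError(
            "Can only process source table with a single time entry for a "
            f"source_id and spectral_window_id; offending pairs: {offenders}"
        )


good_rows = [(0, 0, 10.0), (0, 1, 10.0), (1, 0, 20.0)]
check_single_time_per_source(good_rows)  # passes silently
```

Unlike an assert, the exception is not stripped when Python runs with -O, and the message can name the offending pairs.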
With the latest commits (which also bring in, via main, some of 168-review-ms_xdsattrsantenna_xds-schema-and-xradio-interface), the issues in ALMASD and EVLA datasets all seem fixed. We are down to:
VLA: 1 failure (some dimensionality mismatch)
ALMA: 5 failures (2 problems with SOURCE_ID and 2 with EPHEMERIS_ID (+ crazySourceTable.ms, which is probably an acceptable failure))
Others: 2 failures related to dimension sizes.
After the last commits above this comment the remaining issues seem to be a handful of specific MSs:
With partition_scheme=["FIELD_ID"]:
Without "FIELD_ID" partitioning, we have the same 3 as above and:
After the last few commits I see only one failure left, with crazySourceTable.ms, which produces a legitimate: "Can only process source table with a single time entry for a source_id and spectral_window_id." (see reasons in the issue description).
So far the following issues:
- SOURCE_DIRECTION, in extract_source_info, double-check MSs
- FIELD_PHASE_CENTER (could simply be an indentation issue in the 'if is_single_dish', double-check)
- FIELD_PHASE_CENTER comes back: "No variable named 'FIELD_PHASE_CENTER'. Variables on the dataset include ['FIELD_REFERENCE_CENTER', 'time', 'field_name', 'SOURCE_POSITION', 'SOURCE_RADIAL_VELOCITY', 'OBSERVATION_POSITION', 'SUB_OBSERVER_POSITION', 'ellipsoid_pos_label', 'sky_pos_label']".
- [ ] ALMA: sun.subset.pentagon.ms ===> problem with FIELD_PHASE_CENTER
- tdem0003gencal.ms, refim_Cband.G37line.ms, evla-highres-sample-thinned.ms, CAS-5172-phase-center.ms, etc. ===> => this is now turned into an issue in tdem0003gencal.ms and CAS-5172-phase-center.ms. NUM_LINES = 1 => REST_FREQUENCY and SYSVEL are populated with an array of two values, which seems to add an unexpected dimension.
- KeyError: 'transition', looks like we have to handle missing optional columns in source table in a safer way, double-check transition
- AssertionError('Can only process source table with a single time entry for a source_id and spectral_window_id.'): 5 in ALMASD, 2 in EVLA, 18 in Others, 37 in VLA, 4 in ALMA.
- 'Dataset' object has no attribute 'SPECTRAL_WINDOW_ID' in create_ant_xds() after merge of #204.

A few remaining issues being investigated (20240725), with partition_scheme=["FIELD_ID"]:
Without "FIELD_ID" partitioning, we have the same 3 as above and: