Closed Jan-Willem closed 3 weeks ago
min_freq = min(ps.summary()['start_frequency'])
ps.sel(start_frequency=min_freq).summary()
Traceback:
...
File "/home/groot/casa/github/xradio/src/xradio/vis/_processing_set.py", line 208, in <lambda>
return df[df[col].apply(lambda x: any(i in x for i in input_strings))]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/groot/casa/github/xradio/src/xradio/vis/_processing_set.py", line 208, in <genexpr>
return df[df[col].apply(lambda x: any(i in x for i in input_strings))]
^^^^^^
TypeError: argument of type 'float' is not iterable
But it works with a slice:
ps.sel(start_frequency=slice(min_freq, min_freq)).summary()
ps.sel(name='ALMA_uid___A002_X1003af4_X75a3.split.avg_1').summary()['name']
0 ALMA_uid___A002_X1003af4_X75a3.split.avg_13
1 ALMA_uid___A002_X1003af4_X75a3.split.avg_19
2 ALMA_uid___A002_X1003af4_X75a3.split.avg_1
3 ALMA_uid___A002_X1003af4_X75a3.split.avg_14
4 ALMA_uid___A002_X1003af4_X75a3.split.avg_10
5 ALMA_uid___A002_X1003af4_X75a3.split.avg_17
6 ALMA_uid___A002_X1003af4_X75a3.split.avg_15
7 ALMA_uid___A002_X1003af4_X75a3.split.avg_11
8 ALMA_uid___A002_X1003af4_X75a3.split.avg_18
9 ALMA_uid___A002_X1003af4_X75a3.split.avg_16
10 ALMA_uid___A002_X1003af4_X75a3.split.avg_12
but could be useful in other ways:
ps.sel(obs_mode='OBSERVE_TARGET').summary()['obs_mode']
0 OBSERVE_TARGET#ON_SOURCE
1 OBSERVE_TARGET#OFF_SOURCE
etc. (75 matches)
But a similar field_name selection fails (there is no Sun_10_1, but based on the previous examples would have expected fields Sun_10_11, Sun_10_12, Sun_10_13, etc.):
ps.sel(field_name='Sun_10_1').summary()
Empty DataFrame
Not a selection issue:
ps = read_processing_set('ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/groot/casa/github/xradio/src/xradio/vis/read_processing_set.py", line 28, in read_processing_set
file_system, ms_store_list = _get_ms_stores_and_file_system(ps_store)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/groot/casa/github/xradio/src/xradio/_utils/zarr/common.py", line 39, in _get_ms_stores_and_file_system
raise (
TypeError: exceptions must derive from BaseException
Whereas convert_msv2 shows RuntimeError: Table ALMA_uid___A002_X1003af4_X75a3.split.avg.ms does not exist
partition_info['polarization_setup']
instead of polarization coordinate values (should be the same, right?) -- https://github.com/casangi/xradio/blob/213-fix-ps-selection/src/xradio/vis/_processing_set.py#L95Re: previous comment about Neptune_1 field name:
I uploaded the 2012 ALMA MS to uid_A002_X49990aX1f.ms to http://www.aoc.nrao.edu/~pford/uidA002_X49990a_X1f.ms.tar.gz (1.7G). I cannot find any duplicate field names, and the similar names (SPTxxxx) already have numbers appended (SPTxxxx-yy), probably to ensure uniqueness. There is only one "Neptune". The converter added an additional number, so the ps field names are:
ps.summary()['field_name'] 0 [J2253+161; 3c454.3_0] 1 [Neptune_1] 2 [J2357-5311_2] 3 [J0133-5200_16] 4 [J2253+161; 3c454.3_0] ...
337 [SPT0109-47_19] 338 [SPT0113-46_20] 339 [SPT0125-47_21] 340 [SPT0125-50_22] 341 [SPT0128-51_23]
In the latest commit:
sel in row_val
).The review_ps.ipynb has been updated.
Some of the stakeholder tests will fail because the schema has changed. Once we have agreed on the changes I will upload a new version of "s3://viper-test-data/Antennae_North.cal.lsrk.split.v5.vis.zarr" and "Antennae_North.cal.lsrk.split.vis.zarr".
@pford can you please check that the sel/queries you tested work and select a set of sel/query operations that we should add to the stakeholder tests?
I will look at ms_sel and ms_isel a bit more. These are not in the review notebook.
Summary fails if you ms select a single frequency. Then xds.frequency.values is a scalar not an array. (Selecting an spw with 1 frequency is an array and summary works.)
ps.sel(spw_name=ps.summary().spw_name[0]).ms_isel(frequency=0).summary() Traceback (most recent call last): File "
", line 1, in File "/home/groot/casa/github/xradio/src/xradio/vis/_processing_set.py", line 41, in summary self.meta["summary"][data_group] = self._summary(data_group).sort_values( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/groot/casa/github/xradio/src/xradio/vis/_processing_set.py", line 129, in _summary summary_data["start_frequency"].append(value["frequency"].values[0]) IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
Use case: to do another selection later on the returned ps.
ps.sel(spw_name=ps.summary().spw_name[0]).ms_isel(frequency=0).sel(field_name='Sun_10_29') (same IndexError)
@pford pushed a fix so that summary() works after ms_sel for a single frequency.
Review Instructions
Please review the MSv4 processing_set class https://github.com/casangi/xradio/blob/main/src/xradio/vis/_processing_set.py and go over the review tutorial reviews/review_ps.ipynb .
The processing set is a loose collection of MSv4 which might come from multiple MSv2 (or ASDMS). Consequently, arbitrary ids are avoided in favor of descriptive strings.
Run the notebook using:
partition_scheme=['FIELD_ID']
Default behavior. andpartition_scheme=[]
Used for VLASS rapid OTF (combines fields).Key Questions to Answer
1) Is there additional information to display in the summary table? 2) Are the docstrings sufficient? 3) Are there missing data selection use cases? 4) ...
Environment instructions
It is recommended to use the conda environment manager to create a clean, self-contained runtime where xradio and all its dependencies can be installed:
Clone the repository, checkout the review branch and do a local install:
On macOS it is required to pre-install python-casacore using
bash conda install -c conda-forge python-casacore
.