Jan-Willem closed 2 months ago
Regarding the awful performance when converting large pointing tables (create_pointing_xds()): after some more tests I think it is now clear that the lion's share of the inefficiency comes from using the by-row reading functions of the casacore table tool.
From past experience with the pointing table I had the vague idea that reading it through the generic read functions was easily 2x slower than the "readcol" functions normally used to read the main table / large columns. The generic, slower by-row reading is needed to deal with potentially complicated variable-size array columns, which occur always or often in known subtables like SPECTRAL_WINDOW or HISTORY, and also in unexpected extension columns.
With a bit more experimenting, replacing all the row-reading calls with by-column reads (getcol()), the speedup factor can reach ~20-30x or more, and it probably grows as the pointing table size increases. In particular, the slicing and the many calls to tablerow.py:48(get) clearly dominate runtimes and seem to trigger very large amounts of Python allocations (with memory use also being higher by a factor of >= ~3x).
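To make the two access patterns concrete, here is a minimal sketch of by-row versus by-column reading. `FakeTable` is a hypothetical in-memory stand-in that only mimics the `row()` and `getcol()` methods of the casacore table tool, and `read_cols_by_row` / `read_cols_by_col` are illustrative names, not the actual xradio functions.

```python
class FakeTable:
    """Hypothetical stand-in exposing only the table methods used below."""

    def __init__(self, columns):
        self._columns = columns  # dict: column name -> list of per-row values

    def nrows(self):
        return len(next(iter(self._columns.values())))

    def getcol(self, name):
        # One call hands back the whole column.
        return list(self._columns[name])

    def row(self, columnnames):
        table = self

        class _Rows:
            def __getitem__(self, key):
                # Like slicing a tablerow object: one dict is built per row.
                return [
                    {name: table._columns[name][i] for name in columnnames}
                    for i in range(*key.indices(table.nrows()))
                ]

        return _Rows()


def read_cols_by_row(tb, colnames):
    """Generic path: fetch every row as a dict, then regroup values by column."""
    rows = tb.row(colnames)[:]
    return {name: [r[name] for r in rows] for name in colnames}


def read_cols_by_col(tb, colnames):
    """Fast path: one getcol() call per column."""
    return {name: tb.getcol(name) for name in colnames}


tb = FakeTable({"TIME": [1.0, 2.0, 3.0], "ANTENNA_ID": [0, 1, 0]})
cols = ["TIME", "ANTENNA_ID"]
same = read_cols_by_row(tb, cols) == read_cols_by_col(tb, cols)
print(same)
```

Both functions return the same data; the difference is that the by-row path creates one Python dict per row (and one `get` call per row in the real tool), whereas the by-column path makes one call per column. With a real MS, `tb` would be a table opened on the POINTING subtable instead of `FakeTable`.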
Some examples of runtimes (all timing values approx.):

| MS | MS size | POINTING size | "before" (s) (using row()) | "after" (s) (using getcol()) | create_pointing_xds() down to (s) |
|---|---|---|---|---|---|
| A) twhya.short.ms (from casatestdata) | 550MB | 127MB | 461 | 22 | 0.96 |
| B) uid___A002_X8ca70c_X5_shortened.ms (from casatestdata) | 499MB | 211MB | 335 | 80 | 1.82 |
| C) uid___A002_Xfd764e_X4e4c_targets.ms (PL dataset, 7m mosaic) | 2.2GB | 190MB | 8974 | 344 | 1.53 |
(These are datasets where only the "main" / science SPWs have been kept. The ratio would be higher for fuller (pre-calibration) datasets with many small SPWs, which is very common in ALMA at least throughout calibration.)
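For reference, the ratios implied by the "before" and "after" columns of the table can be worked out directly. This is only arithmetic on the approximate timings quoted above; the labels A, B, C follow the table rows.

```python
# (before, after) runtimes in seconds, copied from the table above.
timings = {"A": (461, 22), "B": (335, 80), "C": (8974, 344)}

speedups = {label: before / after for label, (before, after) in timings.items()}

for label, factor in speedups.items():
    print(f"{label}: {factor:.1f}x")
# A: 21.0x
# B: 4.2x
# C: 26.1x
```

Datasets A and C fall in the ~20-30x range, while B shows a smaller gain of roughly 4x.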
I'm optimistic that this will sort out most if not all of the awful performance issue, so that the other related topics listed in this issue can be addressed without being too conditioned by performance considerations.
We'll need specialized, efficient by-column reading for the POINTING subtable (and possibly others if they can be large enough), plus the generic read for other tables, to be able to handle misbehaving variable-size columns as well as the non-standard extension columns that we see in MSs from various observatories and that are hard to anticipate. I'm leaning towards handling this alternative loading (by-row vs. by-col) deep in the code that reads the columns, keeping the rest of the code the same, but for that I still have to reorganize the code a bit more.
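One possible shape for handling the by-row vs. by-col choice "deep in the code that reads the columns" is a per-column fallback, sketched below. This is an illustration and not the xradio implementation: `load_column` and `StubTable` are hypothetical names, and the sketch assumes that a by-column read of a column with varying cell shapes fails with a `RuntimeError` (the stub enforces that behavior itself).

```python
class StubTable:
    """Hypothetical stand-in for a table; getcol() rejects ragged columns."""

    def __init__(self, columns):
        self._columns = columns  # dict: column name -> list of per-row values

    def getcol(self, name):
        values = self._columns[name]
        shapes = {len(v) if isinstance(v, list) else None for v in values}
        if len(shapes) > 1:
            # Assumed failure mode for variable-size array columns.
            raise RuntimeError(f"column {name} has varying cell shapes")
        return list(values)

    def row(self, columnnames):
        cols = self._columns
        nrows = len(next(iter(cols.values())))
        # A plain list of dicts supports the same [:] slicing used below.
        return [{n: cols[n][i] for n in columnnames} for i in range(nrows)]


def load_column(tb, name):
    """Try the fast by-column read first; fall back to the generic by-row read."""
    try:
        return tb.getcol(name)
    except RuntimeError:
        return [row[name] for row in tb.row([name])[:]]


tb = StubTable({
    "TIME": [1.0, 2.0, 3.0],
    "DIRECTION": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "EXTENSION_COL": [[1], [1, 2], [1, 2, 3]],  # ragged: forces the fallback
})
data = {name: load_column(tb, name) for name in ("TIME", "DIRECTION", "EXTENSION_COL")}
print(data["EXTENSION_COL"])
```

The callers only see `load_column`, so the rest of the conversion code would not need to know which path was taken for a given column.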
We do have one version of read_pointing that is fast (uses getcol()) for "read_vis()" but that one uses chunked reads with dask.delayed, which is not a good option for the convert functionality.
I forgot to attach an example profile back when this issue was analyzed. Here is one example cProfile of a conversion with pointing table for some of the examples in the table above: pointing_profiling.cprof.gz.
It shows very plainly how the runtime is dominated by the time spent in the calls to tablerow tool functions from read_generic_cols: in the listing below, 59.655 s of the 75.610 s total are spent inside tablerow.py:48(get). The awful performance seen for pointing tables comes from using the by-row reading of the casacore table tool (indexing and slicing the dict of rows, and the resulting many calls to tablerow.py:48(get)). Reading by rows also seems to increase memory use by at least a factor of 2x when compared with reading by columns.
15122065 function calls (14753319 primitive calls) in 75.610 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
19/1 0.000 0.000 75.612 75.612 {built-in method builtins.exec}
1 0.003 0.003 75.612 75.612 <string>:1(<module>)
1 0.000 0.000 75.609 75.609 pointing_quick_checks.py:4(do_pointing_quick_check)
1 0.016 0.016 74.594 74.594 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/convert_msv2_to_processing_set.py:14(convert_msv2_to_processing_set)
48 0.005 0.000 74.564 1.553 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/conversion.py:246(convert_and_write_partition)
200 3.515 0.018 71.551 0.358 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/_tables/read.py:301(read_generic_table)
28 0.010 0.000 70.425 2.515 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/msv4_sub_xdss.py:212(create_pointing_xds)
200 0.022 0.000 66.148 0.331 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/_tables/read.py:382(read_generic_cols)
200 0.000 0.000 61.902 0.310 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:151(__getitem__)
200 2.012 0.010 61.902 0.310 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:65(_getitem)
3404817 59.655 0.000 59.655 0.000 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:48(get)
2409 3.504 0.001 3.504 0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/table.py:1011(getcol)
112 0.001 0.000 1.976 0.018 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xarray/core/dataset.py:1966(to_zarr)
112 0.002 0.000 1.976 0.018 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xarray/backends/api.py:1535(to_zarr)
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
3404817 59.655 0.000 59.655 0.000 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:48(get)
200 3.515 0.018 71.551 0.358 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/_tables/read.py:301(read_generic_table)
2409 3.504 0.001 3.504 0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/table.py:1011(getcol)
200 2.012 0.010 61.902 0.310 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:65(_getitem)
728 0.481 0.001 0.573 0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xarray/core/variable.py:1770(_unstack_once)
21923 0.340 0.000 0.340 0.000 {built-in method numpy.array}
18663/16026 0.310 0.000 0.392 0.000 {built-in method numpy.asarray}
580 0.276 0.000 0.314 0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/table.py:315(__init__)
3554544 0.246 0.000 0.246 0.000 {method 'append' of 'list' objects}
1042144/1037172 0.139 0.000 0.224 0.000 {built-in method builtins.isinstance}
33483 0.110 0.000 0.110 0.000 {built-in method posix.stat}
1848 0.109 0.000 0.140 0.000 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/zarr/core.py:2431(_encode_chunk)
8710 0.103 0.000 0.106 0.000 {built-in method io.open}
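Listings like the two above ("Ordered by: cumulative time" and "Ordered by: internal time") can be produced with the standard-library `cProfile` and `pstats` modules. The sketch below profiles a toy function, `busy`, which is only a placeholder for the real conversion call; `top_entries` is likewise an illustrative helper.

```python
import cProfile
import io
import pstats


def busy(n):
    """Toy workload standing in for the conversion being profiled."""
    return sum(i * i for i in range(n))


def top_entries(profiler, sort_key, limit=5):
    """Return the first `limit` profile entries, sorted by `sort_key`, as text."""
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats(sort_key).print_stats(limit)
    return buf.getvalue()


profiler = cProfile.Profile()
profiler.runcall(busy, 100_000)

report_cumulative = top_entries(profiler, "cumulative")  # time including callees
report_internal = top_entries(profiler, "tottime")       # time inside each function

print(report_cumulative)
print(report_internal)
```

To inspect a saved profile instead, `pstats.Stats` also accepts the path of a stats file, so the attached profile would need to be decompressed first and its path passed in place of `profiler`.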
Suggestions for Improvements:
Performance Issue: