casangi / xradio

Xarray Radio Astronomy Data IO

Pointing Dataset Creation Improvements #128

Closed Jan-Willem closed 2 months ago

Jan-Willem commented 6 months ago

Suggestions for Improvements:

Performance Issue:

FedeMPouzols commented 6 months ago

About the awful performance when converting large pointing tables (create_pointing_xds()), I think it is now clear after some more tests that the lion's share of the inefficiency comes from using the by-row reading functions of the casacore table tool.

From past experience with the pointing table I had the vague idea that reading it through the generic read functions was easily 2x slower than the "readcol" functions normally used to read the main table / large columns. The generic, slower by-row reading is needed to deal with potentially complicated variable-size array columns, which occur always or often in known subtables like SPECTRAL_WINDOW or HISTORY, and also in unexpected extension columns.

With a bit more experimenting, replacing all the row-reading calls with by-column reads (getcol() calls), the speedup factor can be as high as ~20-30x, and it probably increases as the pointing table size increases. In particular, the slicing and the many calls to tablerow.py:48(get) clearly dominate runtimes and seem to trigger very large amounts of Python allocations (with memory use also being higher by a factor of >= ~3x).
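To make the two access patterns concrete, here is a minimal sketch. It does not use casacore itself: `FakeTable` is a hypothetical in-memory stand-in that only mimics the table methods involved (`colnames()`, `nrows()`, `getcol()`, `row()`), and `read_by_row` / `read_by_col` are illustrative names, not the xradio functions. Real code would also build numpy arrays rather than plain lists.

```python
class FakeTable:
    """Hypothetical stand-in exposing the few casacore table methods used below."""

    def __init__(self, columns):
        # columns: dict mapping column name -> list of per-row cell values
        self._columns = columns

    def colnames(self):
        return list(self._columns)

    def nrows(self):
        return len(next(iter(self._columns.values())))

    def getcol(self, name):
        # One call returns the whole column.
        return list(self._columns[name])

    def row(self):
        # casacore's table.row() returns an object that can be indexed and
        # sliced to obtain per-row dicts; a list of dicts mimics that here.
        return [
            {name: cells[idx] for name, cells in self._columns.items()}
            for idx in range(self.nrows())
        ]


def read_by_row(tb):
    """Generic path: fetch every row as a dict, then regroup values per column."""
    rows = tb.row()[0 : tb.nrows()]
    return {name: [row[name] for row in rows] for name in tb.colnames()}


def read_by_col(tb):
    """Fast path: a single getcol() call per column."""
    return {name: tb.getcol(name) for name in tb.colnames()}


pointing = FakeTable(
    {
        "TIME": [0.0, 1.0, 2.0],
        "DIRECTION": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    }
)
by_row = read_by_row(pointing)
by_col = read_by_col(pointing)
print(by_row == by_col)
```

Both paths return the same data. The difference is that the by-row path touches every cell individually from Python (rows x columns operations), whereas the by-column path makes one call per column.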

Some examples of runtimes (all timing values approx.):

| MS | MS size | POINTING size | "before" (s), using row() | "after" (s), using getcol() | create_pointing_xds() down to (s) |
|---|---|---|---|---|---|
| A) twhya.short.ms (from casatestdata) | 550MB | 127MB | 461 | 22 | 0.96 |
| B) uid___A002_X8ca70c_X5_shortened.ms (from casatestdata) | 499MB | 211MB | 335 | 80 | 1.82 |
| C) uid___A002_Xfd764e_X4e4c_targets.ms (PL dataset, 7m mosaic) | 2.2GB | 190MB | 8974 | 344 | 1.53 |

(These are datasets where only the "main" / science SPWs have been kept. The ratio would be higher for fuller (pre-calibration) datasets with many small SPWs, which is very common in ALMA at least throughout calibration.)
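As a quick check of the speedups implied by the "before" and "after" timings listed above, the ratios can be computed directly. The numbers are copied from the runtime examples; the variable names are only illustrative.

```python
# (before, after) runtimes in seconds for the three example datasets
timings = {
    "A": (461, 22),
    "B": (335, 80),
    "C": (8974, 344),
}

speedups = {label: before / after for label, (before, after) in timings.items()}

for label, factor in speedups.items():
    print(f"{label}: {factor:.1f}x")
# A: 21.0x
# B: 4.2x
# C: 26.1x
```

Datasets A and C fall in the ~20-30x range, while B shows a smaller overall gain of roughly 4x.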

I'd be optimistic that this will sort out most if not all of the awful performance issue, so that the other related topics listed in this issue can be addressed without being too conditioned by performance considerations.

We'll need a specialized, efficient by-column read for the POINTING subtable (and possibly others if they can be large enough), while keeping the generic read for other tables so that we can handle misbehaving variable-size columns as well as the non-standard extension columns that appear in MSs from various observatories and are hard to anticipate. I'm leaning towards handling this choice of loading strategy (by-row vs. by-col) deep in the code that reads the columns, keeping the rest of the code the same, but for that I still have to reorganize the code a bit more.
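One possible shape for that low-level choice is sketched below. Everything here is hypothetical and only illustrates the idea: `BY_COLUMN_TABLES`, `read_cols`, and the `FakeTable` stand-in are not xradio names, and the sketch assumes that a by-column read of a variable-size column fails with a `RuntimeError`, which is simulated by the stand-in.

```python
# Hypothetical: subtables that are known to be large and regular.
BY_COLUMN_TABLES = {"POINTING"}


class FakeTable:
    """Hypothetical stand-in for a casacore table (not the real API object)."""

    def __init__(self, columns):
        self._columns = columns  # dict: column name -> list of cell values

    def colnames(self):
        return list(self._columns)

    def nrows(self):
        return len(next(iter(self._columns.values())))

    def getcol(self, name):
        cells = self._columns[name]
        lengths = {len(cell) for cell in cells if isinstance(cell, list)}
        if len(lengths) > 1:
            # Assumption: a variable-size column cannot be read in one call.
            raise RuntimeError(f"column {name} has variable-size cells")
        return list(cells)

    def row(self):
        return [
            {name: cells[idx] for name, cells in self._columns.items()}
            for idx in range(self.nrows())
        ]


def read_cols_by_row(tb):
    rows = tb.row()[0 : tb.nrows()]
    return {name: [row[name] for row in rows] for name in tb.colnames()}


def read_cols_by_col(tb):
    return {name: tb.getcol(name) for name in tb.colnames()}


def read_cols(tb, table_name):
    """Return (strategy, data); callers need not know which path was taken."""
    if table_name in BY_COLUMN_TABLES:
        try:
            return "by-col", read_cols_by_col(tb)
        except RuntimeError:
            pass  # misbehaving column: fall back to the generic path
    return "by-row", read_cols_by_row(tb)


regular = FakeTable({"TIME": [0.0, 1.0], "DIRECTION": [[0.1, 0.2], [0.3, 0.4]]})
ragged = FakeTable({"TIME": [0.0, 1.0], "DIRECTION": [[0.1, 0.2], [0.3]]})

print(read_cols(regular, "POINTING")[0])  # by-col
print(read_cols(ragged, "POINTING")[0])   # by-row
print(read_cols(regular, "HISTORY")[0])   # by-row
```

Keeping the decision inside a single function like this means the code that builds the datasets stays unchanged, and the fallback preserves the generic behavior for tables with irregular columns.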

We do have one version of read_pointing that is fast (uses getcol()) for "read_vis()" but that one uses chunked reads with dask.delayed, which is not a good option for the convert functionality.

FedeMPouzols commented 2 months ago

I forgot to attach an example profile back when this issue was analyzed. Here is one example cProfile of a conversion with pointing table for some of the examples in the table above: pointing_profiling.cprof.gz.

It shows very plainly and directly how runtime is dominated by the time spent in those calls to tablerow tool functions from read_generic_cols. The awful performance seen for pointing tables comes from using the by-row reading of the casacore table tool (indexing and slicing the dict of rows, and the resulting many calls to tablerow.py:48(get)). Reading by rows also seems to increase memory use by at least a factor of 2x when compared with reading by columns.

         15122065 function calls (14753319 primitive calls) in 75.610 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     19/1    0.000    0.000   75.612   75.612 {built-in method builtins.exec}
        1    0.003    0.003   75.612   75.612 <string>:1(<module>)
        1    0.000    0.000   75.609   75.609 pointing_quick_checks.py:4(do_pointing_quick_check)
        1    0.016    0.016   74.594   74.594 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/convert_msv2_to_processing_set.py:14(convert_msv2_to_processing_set)
       48    0.005    0.000   74.564    1.553 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/conversion.py:246(convert_and_write_partition)
      200    3.515    0.018   71.551    0.358 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/_tables/read.py:301(read_generic_table)
       28    0.010    0.000   70.425    2.515 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/msv4_sub_xdss.py:212(create_pointing_xds)
      200    0.022    0.000   66.148    0.331 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/_tables/read.py:382(read_generic_cols)
      200    0.000    0.000   61.902    0.310 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:151(__getitem__)
      200    2.012    0.010   61.902    0.310 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:65(_getitem)
  3404817   59.655    0.000   59.655    0.000 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:48(get)
     2409    3.504    0.001    3.504    0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/table.py:1011(getcol)
      112    0.001    0.000    1.976    0.018 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xarray/core/dataset.py:1966(to_zarr)
      112    0.002    0.000    1.976    0.018 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xarray/backends/api.py:1535(to_zarr)

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  3404817   59.655    0.000   59.655    0.000 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:48(get)
      200    3.515    0.018   71.551    0.358 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xradio/vis/_vis_utils/_ms/_tables/read.py:301(read_generic_table)
     2409    3.504    0.001    3.504    0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/table.py:1011(getcol)
      200    2.012    0.010   61.902    0.310 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/tablerow.py:65(_getitem)
      728    0.481    0.001    0.573    0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/xarray/core/variable.py:1770(_unstack_once)
    21923    0.340    0.000    0.340    0.000 {built-in method numpy.array}
18663/16026    0.310    0.000    0.392    0.000 {built-in method numpy.asarray}
      580    0.276    0.000    0.314    0.001 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/casacore/tables/table.py:315(__init__)
  3554544    0.246    0.000    0.246    0.000 {method 'append' of 'list' objects}
1042144/1037172    0.139    0.000    0.224    0.000 {built-in method builtins.isinstance}
    33483    0.110    0.000    0.110    0.000 {built-in method posix.stat}
     1848    0.109    0.000    0.140    0.000 /home/fedemp/ws_xradio_pointing/venv_xradio_python_38/lib/python3.8/site-packages/zarr/core.py:2431(_encode_chunk)
     8710    0.103    0.000    0.106    0.000 {built-in method io.open}
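
For anyone who wants to produce listings like the two above, here is a minimal sketch using the standard-library cProfile and pstats modules. The workload (`slow_by_row`, `get_cell`) is a made-up toy that only imitates the "many small per-cell calls" pattern; it is not the xradio conversion.

```python
import cProfile
import io
import pstats


def get_cell(i):
    """Toy per-cell accessor, standing in for a cheap call made very often."""
    return i * 2


def slow_by_row(n):
    """Toy workload: one Python-level call per item."""
    return [get_cell(i) for i in range(n)]


profiler = cProfile.Profile()
profiler.runcall(slow_by_row, 10_000)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)  # "Ordered by: cumulative time"
stats.sort_stats("tottime").print_stats(10)     # "Ordered by: internal time"

report = buffer.getvalue()
print(report)
```

A profile saved to disk (for example the attached file, once decompressed) can be loaded the same way by passing its path to `pstats.Stats(...)` instead of the profiler object.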