Summary
With a reasonable amount of memory available (64-128 GB), we can improve the aoflagger running time by a factor of about 5 just by processing each band individually. I will implement that mode in the pipeline by default.
Details
I have been running tests because aoflagger is the time bottleneck for many pipeline runs. The problem is that for sources with a low number of visibilities (bright calibrators, small datasets, etc.), it runs in memory-read mode and is very fast (usually a few minutes per source). For large datasets, the target usually has too many visibilities, aoflagger reports that there is not enough memory available ("Because this is not at least twice as much, direct read mode (slower!) will be used."), and the processing time grows to several hours (about 5h in the example below).
I have run tests forcing the different modes on a 64 GB memory machine with a 24h L-band dataset (225 GB):
The visibility counts per source are:
```
ID  Code  Name       RA               Decl             Epoch  nRows
0   ACAL  1331+305   13:31:08.287300  +30.30.32.95900  J2000   957432
1   PCAL  2007+4029  20:07:44.944855  +40.29.48.60415  J2000  3516576
2         2032+4127  20:32:13.070000  +41.27.23.40000  J2000  8235360
3   CAL   0319+415   03:19:48.160110  +41.30.42.10330  J2000   906696
4   CAL   1407+284   14:07:00.394410  +28.27.14.68990  J2000   836304
```
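As a side note, these counts can be read directly from the MeasurementSet. A minimal sketch with python-casacore (assuming it is installed; `data.ms` is a placeholder path):

```python
# Count main-table rows (visibilities) per field in a MeasurementSet.
# Assumes python-casacore is available; 'data.ms' is a placeholder path.
from casacore.tables import table

ms_path = 'data.ms'

# Field names come from the FIELD subtable
with table(ms_path + '::FIELD', ack=False) as fld:
    names = fld.getcol('NAME')

# Count rows in the main table for each FIELD_ID
with table(ms_path, ack=False) as main:
    for field_id, name in enumerate(names):
        nrows = main.query('FIELD_ID == %d' % field_id).nrows()
        print(field_id, name, nrows)
```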
I have tried different approaches (a sketch of the corresponding aoflagger calls follows the list):
(0) Standard: default mode (auto-read-mode, which selects either memory or direct mode based on available memory), one execution per source using the option `fields`
(1) memory-read
(2) indirect-read
(3) direct-read
(4) default mode, with one execution per band using the option `bands`
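For reference, a minimal sketch of how these modes can be forced from Python via subprocess. The flag spellings below follow the mode and option names above, but they are an assumption; check `aoflagger -h` for the exact syntax of your version. The MS path and the field/band indices are placeholders.

```python
# Sketch of the tested aoflagger invocations (one field at a time).
# 'data.ms' and the field/band indices are placeholders.
import subprocess

ms = 'data.ms'

# (0) Standard: auto-read-mode (the default), one execution per field
subprocess.run(['aoflagger', '-fields', '0', ms], check=True)

# (1)-(3) Force a specific read mode
subprocess.run(['aoflagger', '-memory-read', '-fields', '0', ms], check=True)
subprocess.run(['aoflagger', '-indirect-read', '-fields', '0', ms], check=True)
subprocess.run(['aoflagger', '-direct-read', '-fields', '0', ms], check=True)

# (4) Default mode, but one execution per field AND per band
subprocess.run(['aoflagger', '-fields', '0', '-bands', '0', ms], check=True)
```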
Results
Standard (default, per field):
4-5 min for the small sources (1331+305, 0319+415, 1407+284)
32 min for the medium source (2007+4029, the phase calibrator)
5h 3m for the big source (2032+4127, the target); it jumps to direct mode.
memory-read (crashed, tried to load the whole dataset into memory)
On a single small source (0319+415) it just crashes: it tries to load the whole dataset into memory and starts swapping. So it does not read only the specified source, but everything.
indirect-read (1h 15m per field, tested on a small one)
It has to duplicate the data and sort everything. It took 1h 15m just to sort the whole dataset in order to process only 0319+415. Same problem as memory-read: it works with the whole dataset.
direct-read (Total: 5h52m)
Very similar to standard. Memory always stayed below 6 GB (usually about 3 GB), but it took 5h 52m to complete: 4-5 min for the small sources, 35 min for 2007+4029, and 5h for the target.
Default per field and per band (Total: 1h 15m)
Memory usage very low, below 4.5 GB the whole time.
Small fields: about 30s per band, 4-5 min per field in total (as in the standard mode)
Medium field: 2m 20s per band --> 18m 30s total (nearly a factor of 2 better than standard)
Big field: 5m 30s per band --> 44 min!! (a factor of ~7 faster than standard)
Conclusions
The most efficient approach seems to be auto-read-mode running per field and per band. It seems to detect the real size of each selected block and select memory mode automatically. What I don't understand is why it does not use much memory in the end (always below 5 GB); I suspect it just reads scan by scan.
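As a sketch of what the per-field, per-band default could look like in the pipeline (the MS path and the field/band counts are placeholders; in practice they would come from the FIELD and SPECTRAL_WINDOW subtables):

```python
# Per-field, per-band aoflagger loop in auto-read-mode (the default).
# Each selected block is small enough that aoflagger picks memory mode
# on its own, which is what made this variant ~5x faster overall.
import subprocess

ms = 'data.ms'   # placeholder MS path
n_fields = 5     # placeholder: number of rows in the FIELD subtable
n_bands = 8      # placeholder: number of spectral windows

for field in range(n_fields):
    for band in range(n_bands):
        subprocess.run(['aoflagger',
                        '-fields', str(field),
                        '-bands', str(band),
                        ms], check=True)
```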
[Memory usage plot: test 4, default per band]
[Memory usage plot: test 3, direct-read]
[Memory usage plot: test 1, memory-read]