Summary
With a reasonable amount of memory available (64-128 GB), we can improve the aoflagger running time by a factor of about 5 just by processing each band individually. I will implement that mode in the pipeline by default.
Details
I have been running tests because aoflagger is the time bottleneck for many pipeline runs. The problem is that for sources with a low number of visibilities (bright calibrators, small datasets, etc.), it runs in memory-read mode and is very fast (usually a few minutes per source). For large datasets, the target usually has too many visibilities, aoflagger reports that there is not enough memory available ("Because this is not at least twice as much, direct read mode (slower!) will be used."), and the processing time grows to several hours (about 5h in the example below).
I have run tests forcing the different modes on a 64 GB memory machine with a 24h L-band dataset (225 GB):
The visibility counts per source are:
```
ID  Code  Name       RA               Decl             Epoch  nRows
0   ACAL  1331+305   13:31:08.287300  +30.30.32.95900  J2000   957432
1   PCAL  2007+4029  20:07:44.944855  +40.29.48.60415  J2000  3516576
2         2032+4127  20:32:13.070000  +41.27.23.40000  J2000  8235360
3   CAL   0319+415   03:19:48.160110  +41.30.42.10330  J2000   906696
4   CAL   1407+284   14:07:00.394410  +28.27.14.68990  J2000   836304
```
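As a side note, these counts can be read directly from the MeasurementSet. A minimal sketch with python-casacore (assuming it is installed; `data.ms` is a placeholder path):

```python
# Count main-table rows (visibilities) per field in a MeasurementSet.
# Assumes python-casacore is available; 'data.ms' is a placeholder path.
from casacore.tables import table

ms_path = 'data.ms'

# Field names come from the FIELD subtable
with table(ms_path + '::FIELD', ack=False) as fld:
    names = fld.getcol('NAME')

# Count rows in the main table for each FIELD_ID
with table(ms_path, ack=False) as main:
    for field_id, name in enumerate(names):
        nrows = main.query('FIELD_ID == %d' % field_id).nrows()
        print(field_id, name, nrows)
```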
I have tried different approaches (a sketch of the corresponding aoflagger calls follows the list):
(0) Standard: default mode (auto-read-mode, which selects either memory or direct mode based on available memory), one execution per source using the option `fields`
(1) memory-read
(2) indirect-read
(3) direct-read
(4) default mode, with one execution per band using the option `bands`
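For reference, a minimal sketch of how these modes can be forced from Python via subprocess. The flag spellings below follow the mode and option names above, but they are an assumption; check `aoflagger -h` for the exact syntax of your version. The MS path and the field/band indices are placeholders.

```python
# Sketch of the tested aoflagger invocations (one field at a time).
# 'data.ms' and the field/band indices are placeholders.
import subprocess

ms = 'data.ms'

# (0) Standard: auto-read-mode (the default), one execution per field
subprocess.run(['aoflagger', '-fields', '0', ms], check=True)

# (1)-(3) Force a specific read mode
subprocess.run(['aoflagger', '-memory-read', '-fields', '0', ms], check=True)
subprocess.run(['aoflagger', '-indirect-read', '-fields', '0', ms], check=True)
subprocess.run(['aoflagger', '-direct-read', '-fields', '0', ms], check=True)

# (4) Default mode, but one execution per field AND per band
subprocess.run(['aoflagger', '-fields', '0', '-bands', '0', ms], check=True)
```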
Results
Standard (default, per field):
4-5 min for the small sources (1331+305, 0319+415, 1407+284)
32 min for the medium source (2007+4029, the phase calibrator)
5h 3m for the big source (2032+4127, the target); it jumps to direct mode.
memory-read (crashed, tried to load the whole dataset into memory)
On a single small source (0319+415) it just crashes: it tries to load the whole dataset into memory and starts swapping. So it does not read only the specified source, but everything.
indirect-read (1h 15m per field, tested on a small one)
It has to duplicate the data and sort everything. It took 1h 15m just to sort the whole dataset in order to process only 0319+415. Same problem as memory-read: it works with the whole dataset.
direct-read (Total: 5h52m)
Very similar to standard. Memory always stayed below 6 GB (usually about 3 GB), but it took 5h 52m to complete: 4-5 min for the small sources, 35 min for 2007+4029, and 5h for the target.
Default per field and per band (Total: 1h 15m)
Memory usage very low, below 4.5 GB the whole time.
Small fields: about 30s per band, 4-5 min per field in total (as in the standard mode)
Medium field: 2m 20s per band --> 18m 30s total (nearly a factor of 2 better than standard)
Big field: 5m 30s per band --> 44 min!! (a factor of ~7 faster than standard)
Conclusions
The most efficient approach seems to be auto-read-mode running per field and per band. It seems to detect the real size of each selected block and select memory mode automatically. What I don't understand is why it does not use much memory in the end (always below 5 GB); I suspect it just reads scan by scan.
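As a sketch of what the per-field, per-band default could look like in the pipeline (the MS path and the field/band counts are placeholders; in practice they would come from the FIELD and SPECTRAL_WINDOW subtables):

```python
# Per-field, per-band aoflagger loop in auto-read-mode (the default).
# Each selected block is small enough that aoflagger picks memory mode
# on its own, which is what made this variant ~5x faster overall.
import subprocess

ms = 'data.ms'   # placeholder MS path
n_fields = 5     # placeholder: number of rows in the FIELD subtable
n_bands = 8      # placeholder: number of spectral windows

for field in range(n_fields):
    for band in range(n_bands):
        subprocess.run(['aoflagger',
                        '-fields', str(field),
                        '-bands', str(band),
                        ms], check=True)
```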
[Memory usage plot: test 4, default per band]
[Memory usage plot: test 3, direct-read]
[Memory usage plot: test 1, memory-read]