ATNF / yandasoft

Astronomical Calibration and Imaging Software

Why does Selavy on Singularity use a large amount of memory? #13

Closed prlahur closed 3 years ago

prlahur commented 4 years ago

Selavy running in parallel in Singularity uses a large amount of memory. Investigate the cause: is it MPI, Selavy itself, or something else? Link to internal JIRA ticket: https://jira.csiro.au/browse/AXA-626

prlahur commented 4 years ago

As a first step, use the dataset from issue https://github.com/ATNF/yandasoft/issues/12, run it on Galaxy, and record the memory usage. Ping @davepallot and @jmarvil

prlahur commented 4 years ago

Ran Selavy on Singularity on Galaxy. SLURM output gave this:

INFO  analysis.StatReporter (9, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 63 MB
INFO  analysis.StatReporter (5, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 395 MB  PeakRSS: 64 MB
INFO  analysis.StatReporter (7, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 63 MB
INFO  analysis.StatReporter (1, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 63 MB
INFO  analysis.StatReporter (5, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 8.07  system: 0.18  real: 8.47
INFO  analysis.StatReporter (6, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 64 MB
INFO  analysis.StatReporter (7, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 7.96  system: 0.27  real: 8.47
INFO  analysis.StatReporter (8, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 63 MB
INFO  analysis.StatReporter (8, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 8  system: 0.2  real: 8.47
INFO  analysis.StatReporter (9, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 7.89  system: 0.34  real: 8.47
INFO  analysis.StatReporter (0, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 486 MB  PeakRSS: 72 MB
INFO  analysis.StatReporter (1, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 7.97  system: 0.26  real: 8.47
INFO  analysis.StatReporter (2, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 64 MB
INFO  analysis.StatReporter (2, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 7.94  system: 0.28  real: 8.47
INFO  analysis.StatReporter (4, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 64 MB
INFO  analysis.StatReporter (4, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 8.01  system: 0.2  real: 8.47
INFO  analysis.StatReporter (6, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 8.04  system: 0.21  real: 8.47
INFO  analysis.askapparallel (9, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.StatReporter (0, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 7.5  system: 0.44  real: 8.47
INFO  analysis.askapparallel (1, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.StatReporter (3, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 63 MB
INFO  analysis.askapparallel (5, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.askapparallel (7, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.StatReporter (3, nid00204) [2020-08-17 21:36:22,201] - Total times  - user: 7.9  system: 0.28  real: 8.47
INFO  analysis.askapparallel (6, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.askapparallel (8, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.askapparallel (4, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.askapparallel (2, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.askapparallel (3, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
INFO  analysis.askapparallel (0, nid00204) [2020-08-17 21:36:22,201] - Exiting MPI
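To compare ranks more easily, the interleaved StatReporter lines can be collected per rank. The following is a minimal sketch, assuming the log format shown above; `summarise_memory` is a hypothetical helper name, not part of yandasoft, and the sample contains only three lines copied from the output above.

```python
import re

# Matches lines such as:
# INFO  analysis.StatReporter (9, nid00204) [...] - Memory stats - PeakVM: 394 MB  PeakRSS: 63 MB
MEMORY_LINE = re.compile(
    r"StatReporter \((?P<rank>\d+), (?P<node>\w+)\).*"
    r"PeakVM:\s*(?P<vm>\d+) MB\s+PeakRSS:\s*(?P<rss>\d+) MB"
)


def summarise_memory(log_text):
    """Return {rank: (peak_vm_mb, peak_rss_mb)} from StatReporter log lines."""
    stats = {}
    for line in log_text.splitlines():
        match = MEMORY_LINE.search(line)
        if match:
            stats[int(match["rank"])] = (int(match["vm"]), int(match["rss"]))
    return stats


# Three lines taken from the SLURM output above (not the full log).
sample = """\
INFO  analysis.StatReporter (9, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 394 MB  PeakRSS: 63 MB
INFO  analysis.StatReporter (5, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 395 MB  PeakRSS: 64 MB
INFO  analysis.StatReporter (0, nid00204) [2020-08-17 21:36:22,201] - Memory stats - PeakVM: 486 MB  PeakRSS: 72 MB
"""

stats = summarise_memory(sample)
total_rss = sum(rss for _, rss in stats.values())
for rank in sorted(stats):
    vm, rss = stats[rank]
    print(f"rank {rank}: PeakVM {vm} MB, PeakRSS {rss} MB")
print(f"sum of PeakRSS over parsed ranks: {total_rss} MB")
```

Passing the full SLURM output file to `summarise_memory` instead of `sample` would give the same summary for all ten ranks.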

Ran sacct and got this report:

> sacct --format="CPUTime,MaxRSS,NTasks,MaxVMSize"
   CPUTime     MaxRSS   NTasks  MaxVMSize 
---------- ---------- -------- ---------- 
  00:10:00                                
  00:05:00      1996K        1    211484K 
  00:10:00         4K        1         4K 
  00:02:00      2036K       10    280136K 
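
The sacct sizes are reported with a `K` suffix, while StatReporter reports MB. A small sketch for converting them to the same unit follows, assuming `K` means 1024 bytes and 1 MB means 1024 K; `size_to_mb` is a hypothetical helper name.

```python
def size_to_mb(field):
    """Convert a sacct size field such as '2036K' to MB (assumes 1 MB = 1024 K)."""
    factors = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(field[:-1]) * factors[field[-1]]


# MaxRSS and MaxVMSize from the 10-task step in the report above.
max_rss_mb = size_to_mb("2036K")
max_vm_mb = size_to_mb("280136K")
print(f"MaxRSS: {max_rss_mb:.2f} MB, MaxVMSize: {max_vm_mb:.2f} MB")
```

Under these assumptions the 10-task step shows a MaxRSS of about 2 MB, which is lower than the 63 to 72 MB PeakRSS per rank that StatReporter logged.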

prlahur commented 3 years ago

Closing this old ticket, as version 1.1 has already been released. Will reopen if needed.