gem / oq-engine

OpenQuake Engine: a software for Seismic Hazard and Risk Analysis
https://github.com/gem/oq-engine/#openquake-engine
GNU Affero General Public License v3.0

Fix excessive memory allocation in classical.post_execute() #7759

Closed: chrisbc closed this issue 2 years ago

chrisbc commented 2 years ago

Large NZSHM hazard jobs are failing with out-of-memory (OOM) errors. ...

full details / test results to come (new PR)

sample error...

[2022-04-27 00:41:24 #31 INFO] classical 100% [46 submitted, 0 queued]
[2022-04-27 00:41:24 #31 INFO] Mean time per core=313s, std=472.0s, min=137s, max=2108s
[2022-04-27 00:41:24 #31 INFO] Received {'rup_data': '318.24 MB', 'pmap': '1.99 MB', 'source_data': '1.13 MB', 'cfactor': '6.38 KB', 'grp_id': '230 B', 'task_no': '230 B'} in 2053 seconds from classical
[2022-04-27 00:41:24 #31 INFO] There are 9000 realization(s)
[2022-04-27 00:41:24 #31 INFO] cfactor = 1_799_756/133_442 = 13.5
[2022-04-27 00:41:24 #31 INFO] There were 1 slow task(s)
Killed

real    68m44.663s
user    41m11.717s
sys     0m59.123s
(openquake) chrisbc@tryharder-ubuntu:~/DEV/GNS/opensha-modular/GEM/oq-engine$ /usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

micheles commented 2 years ago

The essential thing is to send us an example calculation that runs out of memory; then I can help.
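
When putting together such an example, it can help to attach a peak-memory figure for the step that fails. Below is a minimal sketch using Python's standard `tracemalloc` module. The helper `peak_memory_mb` and the stand-in `allocate` function are hypothetical names used only for illustration; they are not part of the engine. Note that `tracemalloc` only sees allocations made in the current process, so memory used by worker processes would not be counted.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB).

    Hypothetical helper, not an oq-engine API. Only allocations made
    by the current Python process are traced.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1024 ** 2


def allocate(n_bytes):
    # Hypothetical stand-in for the memory-hungry step being investigated.
    return bytearray(n_bytes)


if __name__ == "__main__":
    _, peak = peak_memory_mb(allocate, 10 * 1024 ** 2)
    print(f"peak traced memory: {peak:.1f} MiB")
```

Wrapping only the suspect step keeps the measurement focused and keeps tracing overhead off the rest of the run.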