MathMarEcol / pdyer_aus_bio

Master fails at `gfbootstrap_combined_tmp` #10

Closed · PhDyellow closed this issue 2 years ago

PhDyellow commented 2 years ago

I am attempting to figure out how much memory is needed by `gfbootstrap_combined_tmp` in #8. After setting the worker memory to 200GB and setting `clustermq.worker.timeout` to 7 days, the worker seems to be running fine. However, the master is now being killed for exceeding 20GB.

I have checked that `_targets.R` uses `storage = "worker"` and `retrieval = "worker"`, so the master shouldn't be loading any target data.
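
For context, a minimal sketch of what those settings look like in a `_targets.R` file using the clustermq backend; the 7-day timeout and 200GB value reflect the numbers above, and the `memory` template field is an assumption that depends on the actual clustermq SLURM template:

```r
# Sketch only: assumed _targets.R settings, not the actual pipeline file.
library(targets)

# clustermq SLURM backend with a long worker timeout (7 days, in seconds),
# matching the clustermq.worker.timeout setting described above.
options(
  clustermq.scheduler = "slurm",
  clustermq.worker.timeout = 7 * 24 * 60 * 60
)

tar_option_set(
  # Workers save and load their own targets, so the master should not
  # need to hold target objects in memory.
  storage = "worker",
  retrieval = "worker",
  # Hypothetical resource spec requesting ~200GB per worker; the "memory"
  # field name and its units depend on the clustermq SLURM template in use.
  resources = tar_resources(
    clustermq = tar_resources_clustermq(template = list(memory = 200 * 1024))
  )
)
```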

I will try setting the master to 200GB anyway and see if it works.

PhDyellow commented 2 years ago

Giving BOTH the master and the worker 200GB of RAM successfully built `gfbootstrap_combined_tmp`.

The Internet suggests that `sacct -s r --format=ALL` might tell me how much memory was used at peak, but `MaxRSS` seems to be empty.

After killing the jobs with `scancel`, the `MaxRSS` field was populated. It seems the master needed 55GB of memory and the worker needed 85GB.
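
For future reference, a query along these lines (the job ID is a placeholder) reports `MaxRSS` for a job that has finished or been cancelled, without dumping every field:

```sh
# Peak resident memory per job step for a completed or cancelled job.
# <jobid> is a placeholder for the master or worker SLURM job ID.
sacct -j <jobid> --format=JobID,JobName,State,ReqMem,MaxRSS,Elapsed
```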

I suspect the master's increased memory demand may have something to do with the cache files, but I am not sure.

I do know that `gfbootstrap_combined_tmp` is one of the few targets that pulls in ALL the gfbootstrap objects at once.
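
Illustratively (a hypothetical sketch, not the actual pipeline code), a combining target of that shape has to hold every upstream object in memory at the same time, which would explain the spike:

```r
# Hypothetical sketch; the target dependency and function names are illustrative only.
tar_target(
  gfbootstrap_combined_tmp,
  combine_gfbootstrap(gfbootstrap_list)  # loads every upstream gfbootstrap object at once
)
```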

I have an estimate of memory consumption now.