We'll need to test it to see what the actual usage is. Not sure I have a great intuition for how memory efficient e.g. our current illumination correction task actually is.
On the GPU side, we always get the full node at UZH, right? But at FMI, we only get what we request and other people can run things on the same node as well. We could start with somewhat lower defaults. And, if I understand correctly, we wouldn't hit a slurm error on the UZH side if we request 16 GB and use 20 GB of RAM, because the rest of the node is anyway free at that moment, right?
> But at FMI, we only get what we request
(you do get the entire node memory, but at some point the cgroup out-of-memory handler will/may kill your slurm job, as in https://github.com/fractal-analytics-platform/fractal-server/issues/343)
> And, if I understand correctly, we wouldn't hit a slurm error on the UZH side if we request 16 GB and use 20 GB of RAM, because the rest of the node is anyway free at that moment, right?
Agreed, although that's not something I would rely on long-term. Once we test things a bit further, we should not request 16 GB if we know that 20 GB are needed.
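As a rough way to get those numbers, peak memory can be checked from inside a task with the standard library. This is just a sketch (not wired into fractal-tasks-core), and on Linux `ru_maxrss` is reported in kilobytes:

```python
import resource


def log_peak_memory(label: str) -> None:
    # Peak resident set size of the current process; on Linux the value
    # is in kilobytes. Child/worker processes are NOT included here
    # (RUSAGE_CHILDREN would be needed for those).
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"[{label}] peak RSS: {peak_kb / 1024:.0f} MB")


# E.g. call this at the end of a task run, to compare actual usage with
# the SLURM memory request (the 16 GB vs 20 GB case above).
log_peak_memory("illumination_correction")
```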
> napari-workflows: 1 cpu, 4G => decent start, may vary depending on workflow and ROIs
Any reason for not increasing the number of CPUs a bit?
When running the CI (which uses some very small test datasets), the napari-workflows task reaches ~800% CPU usage through multithreading after a few seconds:
```
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
16899 tommaso   20   0 3510400 783912 190976 S 793.4  4.9 0:30.04 /home/tommaso/.cache/pypoetry/virtualenvs/fractal-tasks-core-UoMDyr20-py3.10/bin/python /home/tommaso/.cache/pypoetry/virtualenvs/fractal-tasks-core-UoMDyr20-py3.10/bin/pytest tests/test_workflows_napari_workflows.py
```
Ah, thanks for the profiling! Yeah, then let's go with 8 CPUs and 32G RAM for napari workflows by default :)
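Side note on keeping CPU usage in line with the request: a minimal sketch, assuming the numerical libraries underneath the napari workflow honour the usual thread-pool controls (not something we have verified here):

```python
import os

# Cap common thread pools to the number of CPUs granted by SLURM;
# SLURM_CPUS_PER_TASK is only set inside a SLURM job, hence the fallback.
n_cpus = os.environ.get("SLURM_CPUS_PER_TASK", "8")
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, n_cpus)

# threadpoolctl can also limit pools that were already initialized.
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=int(n_cpus)):
    ...  # run the napari workflow here
```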
Let's spell out the CPU/memory/GPU requirements for all tasks. Here is a starting point:
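Purely as an illustration of how such defaults could be collected (the key names below are an assumption, not the actual fractal-server task-meta schema; only the napari-workflows numbers come from this thread):

```python
# Hypothetical bookkeeping of per-task SLURM defaults; to be replaced by
# whatever schema fractal-server actually uses for task resources.
TASK_RESOURCE_DEFAULTS = {
    "napari-workflows": {"cpus_per_task": 8, "mem": "32G", "needs_gpu": False},
    # TODO: add the remaining tasks once their actual usage is measured.
}
```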
Ref: