Open thekswenson opened 1 year ago
Are you running on a cluster? The "used more disk than requested" message is a warning, not an error. But if your cluster (maybe LSF) is very strict, this could indeed get your job killed. Likewise, using more memory than requested could be a problem on any cluster, including SLURM.
In any case, it does look like your job is getting killed (signaled SIGKILL), and that usually happens due to lack of memory. If you're not on a cluster, you will simply need to find a system with more memory. If you are on a cluster, the disk/memory requirements should be fixed in an upcoming release (we are presently testing on SLURM).
In all cases, I don't expect any of the --defaultCores 30 --defaultDisk 5G --maxCores 32 --maxMemory 27G --realTimeLogging flags to do anything.
Hmmm... I'm running it on my desktop. I have 32G on this machine and am not experiencing any of the usual symptoms associated with lack of memory before the program crashes. That is, my swap does not fill up, and there are no paging/thrashing issues as far as I can tell. I tried to keep an eye on htop and, while I didn't stare at it the whole time, memory usage looks like it peaked at 22G.
Is there anything else it could be?
NOTE: I had to add maxCores=32 due to another bug (that is reported here, but I can't find it at the moment). I'll remove the other flags.
Is there a way to be sure that it is a memory issue? I just kept one eye on the process, and it got SIGKILL after ~30 minutes of running, and I had >15G of memory free at the time.
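One way to take the guesswork out of watching htop by hand is to sample the process's resident memory at a fixed interval and keep the maximum. The following is a minimal sketch, assuming a system where `ps -o rss=` reports resident set size in KiB (as on Linux). Both `rss_kib` and `watch_peak_rss` are hypothetical helper names, not part of cactus, and the sampler only sees the single PID it is given, so short spikes between samples or memory held by other processes will be missed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the resident set size (KiB) of one PID.
# Prints nothing if the PID does not exist.
rss_kib() {
    ps -o rss= -p "$1" 2>/dev/null | tr -d ' '
}

# Hypothetical helper: sample a PID every INTERVAL seconds (default 5)
# until it exits, then print the largest RSS value seen, in KiB.
watch_peak_rss() {
    local pid=$1 interval=${2:-5} peak=0 rss
    while rss=$(rss_kib "$pid"); [ "${rss:-0}" -gt 0 ]; do
        if [ "$rss" -gt "$peak" ]; then
            peak=$rss
        fi
        sleep "$interval"
    done
    echo "$peak"
}
```

For example, `watch_peak_rss "$(pgrep -n -f cactus_consolidated)" 5` would follow the most recently started process whose command line contains cactus_consolidated; `-f` is used because the name is longer than the 15 characters `pgrep` matches by default.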
Not sure what else it can be. You can export CACTUS_LOG_MEMORY=1 to track some of the memory usage. You can also verify your ulimit -a.
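To make those checks concrete, here is a minimal sketch, assuming a Linux desktop where the kernel log is readable through `dmesg` or `journalctl -k` (either may need sudo). When the kernel's OOM killer sends SIGKILL, it normally leaves a line such as "Out of memory: Killed process ..." in that log, so an empty result points away from a system-wide memory shortage. `show_limits` and `find_oom_events` are hypothetical helper names used only for illustration.

```shell
#!/usr/bin/env bash

# Print the resource limits that apply to processes started from this shell.
show_limits() {
    ulimit -a
}

# Hypothetical helper: read kernel-log text on stdin and keep only the
# lines that look like OOM-killer activity. Prints nothing if none match.
find_oom_events() {
    grep -i -E 'out of memory|oom-killer|killed process' || true
}
```

Typical use would be `show_limits` followed by `dmesg -T | find_oom_events` or `journalctl -k | find_oom_events` right after the crash.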
Thanks for the tip. After setting export CACTUS_LOG_MEMORY=1, where will I find the log?
I'm running into issues in the cactus_consolidated step.
I ran cactus with the flags: --defaultCores 30 --defaultDisk 5G --maxCores 32 --maxMemory 27G --realTimeLogging. The --defaultDisk 5G parameter was added because of the following "Job used more disk than requested" error, but it did not fix it: