Open alee opened 2 years ago
BehaviorSpace experiment should be split into multiple pieces instead of one enormous parameter sweep
Getting a disk usage error despite requesting the following resources:
# Job requirements - make sure we're running on a Singularity enabled node with enough resources to execute our code
Requirements = HAS_SINGULARITY == True && OSG_HOST_KERNEL_VERSION >= 31000
request_cpus = 2
request_memory = 16 GB
request_disk = 50 GB
Error log:
007 (22905708.000.000) 2022-07-07 00:39:29 Shadow exception!
Error from slot1_3@GP-ARGO-astate-backfill-a7a8bef66d28: disk usage exceeded request_disk
0 - Run Bytes Sent By Job
1510 - Run Bytes Received By Job
After requesting 500 GB, the job was still held:
...
012 (22909717.000.000) 2022-07-07 05:50:42 Job was held.
Job in status 2 put on hold by SYSTEM_PERIODIC_HOLD due to disk usage 204445924.
Code 26 Subcode 0
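
One possible way to split the sweep, sketched in Python under some assumptions: the parameter names and values below (`population`, `growth_rate`, `seed`) are hypothetical placeholders rather than the real experiment's variables, and the chunk size of 8 is arbitrary. The idea is just to enumerate the full parameter grid and cut it into fixed-size pieces so that each piece could be submitted as its own, smaller job.

```python
from itertools import islice, product


def parameter_combinations(space):
    """Yield one dict per point in the full parameter grid."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def split_into_chunks(combos, chunk_size):
    """Group an iterable of combinations into lists of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(combos)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical sweep: 3 * 2 * 5 = 30 combinations in total.
space = {
    "population": [100, 200, 400],
    "growth_rate": [0.1, 0.2],
    "seed": [1, 2, 3, 4, 5],
}

# Each chunk would correspond to one smaller experiment / one job.
chunks = list(split_into_chunks(parameter_combinations(space), 8))
print(len(chunks), [len(chunk) for chunk in chunks])
```

Each chunk would still need to be turned into its own experiment definition and submit entry; how much disk a single piece needs would have to be measured rather than assumed.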