comses-education / chime-abm

CHIME Hurricane Model with FAIR+OSG enhancements
GNU General Public License v3.0

support for OSG #1

Open alee opened 2 years ago

alee commented 2 years ago

The BehaviorSpace experiment should be split into multiple smaller pieces instead of running as one enormous parameter sweep.

https://support.opensciencegrid.org/support/solutions/articles/5000632058-computation-on-the-open-science-pool
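One way to do the split, as a minimal sketch: enumerate every parameter combination and group them into fixed-size chunks, so each chunk can be submitted as its own job. The parameter names and values below are hypothetical placeholders, not taken from the CHIME model, and `split_sweep` is an illustrative helper rather than part of NetLogo or HTCondor.

```python
from itertools import product


def split_sweep(param_values, chunk_size):
    """Yield lists of parameter combinations, each at most chunk_size long."""
    names = list(param_values)
    chunk = []
    # product() walks the full Cartesian sweep one combination at a time.
    for combo in product(*(param_values[name] for name in names)):
        chunk.append(dict(zip(names, combo)))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    # Emit the final, possibly shorter, chunk.
    if chunk:
        yield chunk


# Hypothetical parameters for illustration only.
sweep = {
    "param_a": [0, 1, 2],
    "param_b": [10, 20],
    "param_c": ["x", "y"],
}

chunks = list(split_sweep(sweep, chunk_size=5))
for index, chunk in enumerate(chunks):
    print(f"job {index}: {len(chunk)} runs")
```

Each chunk could then be written out as its own experiment definition and queued as a separate job, which keeps the per-job output (and disk footprint) small.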

alee commented 2 years ago

Getting a disk usage error despite requesting the following resources:

# Job requirements - make sure we're running on a Singularity enabled node with enough resources to execute our code
Requirements = HAS_SINGULARITY == True && OSG_HOST_KERNEL_VERSION >= 31000
request_cpus = 2
request_memory = 16 GB
request_disk = 50 GB

Error log:

007 (22905708.000.000) 2022-07-07 00:39:29 Shadow exception!
        Error from slot1_3@GP-ARGO-astate-backfill-a7a8bef66d28: disk usage exceeded request_disk
        0  -  Run Bytes Sent By Job
        1510  -  Run Bytes Received By Job

After raising request_disk to 500 GB, the job was still held:

...
012 (22909717.000.000) 2022-07-07 05:50:42 Job was held.
        Job in status 2 put on hold by SYSTEM_PERIODIC_HOLD due to disk usage 204445924.
        Code 26 Subcode 0