facebookincubator / submitit

Python 3.8+ toolbox for submitting jobs to Slurm
MIT License

timeout_min=0 results in pending jobs when a Slurm partition timelimit is set #1742

Closed ddangu525 closed 1 year ago

ddangu525 commented 1 year ago

Summary

When jobs are submitted via submitit with timeout_min=0 to a Slurm partition that has a timelimit, they remain in a pending state indefinitely. However, if timeout_min is not explicitly set, the jobs execute as expected under the partition's timelimit.

Steps to Reproduce

  1. Create a Slurm partition with a timelimit (e.g., 1 hour).
  2. Set timeout_min=0 on the executor and submit a job:

    import submitit

    executor = submitit.AutoExecutor(folder="submitit_jobs")
    executor.update_parameters(
        slurm_partition="your_partition",  # partition with timelimit set to 1 hour
        timeout_min=0,  # set to 0
    )

    def my_function():
        print("Hello, world!")

    job = executor.submit(my_function)



Expected Behavior
- The job should execute based on the Slurm partition's timelimit.

Actual Behavior
- The job remains in a pending state indefinitely.
---
When I don't set timeout_min, it works as it should. So, is this a bug, or is it supposed to work this way?
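
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the sbatch documentation says that a time limit of zero requests that no time limit be imposed, and that a job whose requested limit exceeds the partition's limit is left pending. If submitit forwards timeout_min=0 to sbatch as a zero time limit (not verified here), that would match the behavior above. A minimal workaround sketch is to leave timeout_min out whenever it is 0 or unset. The helper name `build_executor_params` is hypothetical and not part of submitit.

```python
from typing import Any, Dict, Optional


def build_executor_params(
    partition: str, timeout_min: Optional[int] = None
) -> Dict[str, Any]:
    """Build keyword arguments for executor.update_parameters().

    Hypothetical helper (not part of submitit): timeout_min is omitted when
    it is None or 0, so the partition's own timelimit applies instead of a
    "no limit" request.
    """
    params: Dict[str, Any] = {"slurm_partition": partition}
    if timeout_min:  # skips both None and 0
        params["timeout_min"] = timeout_min
    return params


# Intended usage (requires submitit and a Slurm cluster, so left as comments):
#   executor = submitit.AutoExecutor(folder="submitit_jobs")
#   executor.update_parameters(**build_executor_params("your_partition", 0))
```

With this, passing 0 behaves the same as not setting timeout_min at all, which is the case reported above as working.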