TUM-DAML / seml

SEML: Slurm Experiment Management Library

More informative subprocess error #77

Closed: michalk8 closed this issue 2 years ago

michalk8 commented 2 years ago

Expected Behavior

When some (SLURM) config parameters are invalid (e.g. a time limit above the maximum allowed), I'd expect to get a clear error message.

Actual Behavior

Non-informative error: the traceback only reports that sbatch returned a non-zero exit status, not why.

Steps to Reproduce the Problem

  1. Create an invalid config (in my case time: 3-00:00, whereas the maximum allowed time for one job is 2 days, not 3).
  2. seml ... start

Error message:

Traceback (most recent call last):
  File "/home/icb/marius.lange/.miniconda3/envs/nsode/bin/seml", line 8, in <module>
    sys.exit(main())
  File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/main.py", line 279, in main
    f(**vars(command))
  File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/start.py", line 801, in start_experiments
    add_to_slurm_queue(collection=collection, exps_list=staged_experiments, unobserved=unobserved,
  File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/start.py", line 564, in add_to_slurm_queue
    start_sbatch_job(collection, exp_array, unobserved,
  File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/site-packages/seml/start.py", line 229, in start_sbatch_job
    output = subprocess.run(f'sbatch {path}', shell=True, check=True, capture_output=True).stdout
  File "/home/icb/marius.lange/.miniconda3/envs/nsode/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sbatch /tmp/1ab7cf9b-23a4-4f28-a1ba-99271cda7b60.sh' returned non-zero exit status 1.
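
For context, the call in the traceback already passes capture_output=True, so whatever sbatch wrote to stderr is attached to the CalledProcessError but never printed. Below is a minimal sketch of how that output could be surfaced. It is not necessarily how seml handles this: the helper name `run_and_report` is hypothetical, the command is passed as an argument list rather than a shell string, and a small Python one-liner stands in for a failing sbatch call so the example runs without Slurm.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; on failure, raise an error that includes the command's stderr.

    Hypothetical helper for illustration, not part of seml.
    """
    try:
        result = subprocess.run(cmd, check=True, capture_output=True)
    except subprocess.CalledProcessError as e:
        # With capture_output=True, the failed command's stderr is available
        # on the exception as bytes.
        stderr = e.stderr.decode("utf-8", errors="replace").strip()
        raise RuntimeError(
            f"Command {e.cmd!r} failed with exit status {e.returncode}: {stderr}"
        ) from e
    return result.stdout.decode("utf-8", errors="replace")


# Stand-in for a failing scheduler call (assumption: any command that writes
# to stderr and exits non-zero behaves the same way here).
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('simulated scheduler error'); sys.exit(1)",
]

try:
    run_and_report(failing_cmd)
except RuntimeError as err:
    message = str(err)
    print(message)
```

With something like this, the final error would carry the scheduler's own explanation next to the exit status instead of the exit status alone.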

Specifications

Details
- Version: 0.3.5
- Python version: 3.8.12
- Platform: CentOS Linux release 7.9.2009
- Anaconda environment (`conda list`):

danielzuegner commented 2 years ago

Thanks for the suggestion. Should be fixed by bad38e5.

n-gao commented 2 years ago

PR #70 would also have fixed that :D

danielzuegner commented 2 years ago

Dammit, I knew there was something similar. I looked through open and closed issues but didn't think of PRs.

michalk8 commented 2 years ago

Sorry, should've looked through open PRs beforehand as well; thanks a lot for such quick responses!