benchopt / benchopt

Making your benchmark of optimization algorithms simple and open
https://benchopt.github.io
BSD 3-Clause "New" or "Revised" License

ENH failure to detect that no gpu is installed #395

Closed tomMoral closed 2 years ago

tomMoral commented 2 years ago

Trying to install the benchmark_lasso, I ran into this issue:

tom@marky:prog#benchopt/benchmarks/lasso(ENH_overhead_glmnet)
$ benchopt install -e --minimal
Installing 'lasso' requirements
Traceback (most recent call last):
  File "/home/tom/.local/miniconda/bin/benchopt", line 33, in <module>
    sys.exit(load_entry_point('benchopt', 'console_scripts', 'benchopt')())
  File "/home/tom/.local/miniconda/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/tom/.local/miniconda/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/tom/.local/miniconda/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/tom/.local/miniconda/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/tom/.local/miniconda/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/tom/Work/prog/benchopt/benchopt/cli/main.py", line 349, in install
    benchmark.validate_solver_patterns(solver_names)
  File "/home/tom/Work/prog/benchopt/benchopt/benchmark.py", line 102, in validate_solver_patterns
    all_solvers = _list_all_parametrized_names(*self.get_solvers())
  File "/home/tom/Work/prog/benchopt/benchopt/benchmark.py", line 92, in get_solvers
    return self._list_benchmark_classes(BaseSolver)
  File "/home/tom/Work/prog/benchopt/benchopt/benchmark.py", line 146, in _list_benchmark_classes
    cls = _load_class_from_module(
  File "/home/tom/Work/prog/benchopt/benchopt/utils/dynamic_modules.py", line 60, in _load_class_from_module
    module = _get_module_from_file(module_filename, benchmark_dir)
  File "/home/tom/Work/prog/benchopt/benchopt/utils/dynamic_modules.py", line 33, in _get_module_from_file
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/tom/Work/prog/benchopt/benchmarks/lasso/solvers/cuml.py", line 4, in <module>
    cuda_version = get_cuda_version()
  File "/home/tom/Work/prog/benchopt/benchopt/utils/sys_info.py", line 33, in get_cuda_version
    out = subprocess.check_output(command).strip().decode("utf-8")
  File "/home/tom/.local/miniconda/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/tom/.local/miniconda/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['nvidia-smi', '-q', '-x']' returned non-zero exit status 9.

Here, nvidia-smi is installed but I have no GPU on my computer (yeah, I know this is a weird one...). We should make sure to fail gracefully when the call to nvidia-smi fails.
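
One way to handle the failure described above is to catch the error raised by the subprocess call and report "no CUDA" instead of crashing. The following is only a minimal sketch, not benchopt's actual fix: the function name get_cuda_version_safe and its command parameter are hypothetical, and it assumes that the XML report printed by nvidia-smi -q -x contains a top-level cuda_version element.

```python
import subprocess
import xml.etree.ElementTree as ET


def get_cuda_version_safe(command=("nvidia-smi", "-q", "-x")):
    """Return the CUDA version reported by `command`, or None if unavailable.

    Hypothetical helper: the name and the `command` parameter are
    illustrative and are not part of benchopt.
    """
    try:
        out = subprocess.check_output(
            list(command), stderr=subprocess.DEVNULL
        ).strip().decode("utf-8")
    except (OSError, subprocess.CalledProcessError):
        # OSError: the binary is missing or cannot be executed.
        # CalledProcessError: the binary ran but exited with a non-zero
        # status, e.g. nvidia-smi is installed but no driver/GPU is present.
        return None

    try:
        root = ET.fromstring(out)
    except ET.ParseError:
        # The output was not the XML report we expected.
        return None

    # Assumption: the report has a top-level <cuda_version> element.
    node = root.find("cuda_version")
    if node is None or not node.text:
        return None
    return node.text.strip()
```

With such a helper, a solver module could test whether the result is None at import time and skip its GPU requirements rather than raising.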

ping @tanglef

tanglef commented 2 years ago

I've never seen this case... So, a question: does the output of nvidia-smi -q -x | grep attached_gpus show a 0 between the two tags? If so, I'll use this in get_cuda_version. Or is it that, even though which nvidia-smi works, you can't call nvidia-smi at all?
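
For the first scenario raised here (nvidia-smi runs but reports zero GPUs), the check could look roughly like the sketch below. It is an illustration only: count_attached_gpus is a hypothetical helper, and it assumes the XML report has a top-level attached_gpus element, as the grep above suggests.

```python
import xml.etree.ElementTree as ET


def count_attached_gpus(xml_report):
    """Return the GPU count found in an `nvidia-smi -q -x` style report.

    Hypothetical helper: returns 0 when the report cannot be parsed or
    does not contain a usable <attached_gpus> value.
    """
    try:
        root = ET.fromstring(xml_report)
    except ET.ParseError:
        return 0

    # Assumption: <attached_gpus> is a direct child of the root element.
    node = root.find("attached_gpus")
    try:
        return int(node.text)
    except (AttributeError, TypeError, ValueError):
        # Missing element, empty element, or non-integer content.
        return 0
```

This only helps when the command succeeds; when the call itself fails, the error has to be caught before any parsing takes place.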

tomMoral commented 2 years ago

It's the second option:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ which nvidia-smi
/usr/bin/nvidia-smi

tanglef commented 2 years ago

This should do it 🙂