Open ntoia opened 2 weeks ago
I'm sorry, but I have not used a SLURM cluster myself. At present, our software only supports Linux servers (such as CentOS) with an Nvidia GPU.
The error you posted indicates that no GPU driver is installed on your SLURM cluster. You can look up how to install a driver compatible with your GPU on the cluster nodes.
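As a quick check before reinstalling anything, a small script like the one below can show what the Python process inside a job actually sees. This is only a sketch and not part of DeepETPicker: the function name `gpu_report` is made up for illustration, and it assumes that your cluster exposes allocated GPUs through the `CUDA_VISIBLE_DEVICES` environment variable, which may not hold for every site configuration.

```python
import os
import shutil


def gpu_report():
    """Collect basic facts about GPU visibility in the current process."""
    report = {
        # Assumption: the scheduler exports this variable for jobs with GPUs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Path to nvidia-smi if the driver utilities are on PATH, else None.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        # Filled in below only if PyTorch can be imported.
        "torch_cuda_available": None,
        "torch_device_count": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; report what we have.
        return report
    report["torch_cuda_available"] = torch.cuda.is_available()
    report["torch_device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Run it inside a job that requested a GPU (for example with `srun --gres=gpu:1`, if your cluster uses that option), not on the login node. If `nvidia_smi_path` is `None` or `torch_cuda_available` is `False` there, the GPU or its driver is probably not visible to the job.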
Hello,
We have installed DeepETPicker on our SLURM cluster. While testing the program, I ran into a problem when running the training: the program seems unable to detect the GPUs on the cluster. Is DeepETPicker not yet compatible with clusters, or am I missing a step in the procedure?
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepETPicker/train.py", line 352, in train_func
    runner = Trainer(min_epochs=min(50, args.max_epoch),
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 345, in __init__
    self.accelerator_connector.on_trainer_init(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 101, in on_trainer_init
    self.trainer.data_parallel_device_ids = device_parser.parse_gpu_ids(self.trainer.gpus)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/utilities/device_parser.py", line 78, in parse_gpu_ids
    gpus = _sanitize_gpu_ids(gpus)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/utilities/device_parser.py", line 139, in _sanitize_gpu_ids
    raise MisconfigurationException(f"""
pytorch_lightning.utilities.exceptions.MisconfigurationException:
Thanks in advance!