cbmi-group / DeepETPicker

GNU General Public License v3.0

working with slurm cluster #8

Open ntoia opened 2 weeks ago

ntoia commented 2 weeks ago

Hello,

We have installed DeepETPicker on our SLURM cluster. While testing it, I ran into a problem when running training: the program seems unable to detect the GPUs on the cluster. Is DeepETPicker not yet compatible with clusters, or am I missing a step in the procedure?

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepETPicker/train.py", line 352, in train_func
    runner = Trainer(min_epochs=min(50, args.max_epoch),
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 345, in __init__
    self.accelerator_connector.on_trainer_init(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 101, in on_trainer_init
    self.trainer.data_parallel_device_ids = device_parser.parse_gpu_ids(self.trainer.gpus)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/utilities/device_parser.py", line 78, in parse_gpu_ids
    gpus = _sanitize_gpu_ids(gpus)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/utilities/device_parser.py", line 139, in _sanitize_gpu_ids
    raise MisconfigurationException(f"""
pytorch_lightning.utilities.exceptions.MisconfigurationException:
            You requested GPUs: [1]
            But your machine only has: []

Thanks in advance!
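For readers unfamiliar with this message, the sketch below is a simplified, hypothetical re-implementation of the check that produced it. It assumes, as the `_sanitize_gpu_ids` frame in the traceback suggests, that every requested GPU index must be among the devices PyTorch can see on the node, i.e. `range(torch.cuda.device_count())`. The name `sanitize_gpu_ids`, the `num_visible` parameter, and the use of `ValueError` are illustrative choices, not PyTorch Lightning's API.

```python
from typing import List, Sequence


def sanitize_gpu_ids(requested: Sequence[int], num_visible: int) -> List[int]:
    """Hypothetical, simplified version of the GPU-id check behind the error.

    `num_visible` stands in for torch.cuda.device_count() on the node where
    training runs; visible devices are numbered 0 .. num_visible - 1.
    """
    available = list(range(num_visible))
    for gpu in requested:
        # Every requested index must refer to a device this process can see.
        if gpu not in available:
            raise ValueError(
                f"You requested GPUs: {list(requested)}\n"
                f"But your machine only has: {available}"
            )
    return list(requested)


if __name__ == "__main__":
    print(sanitize_gpu_ids([0], 1))  # one visible GPU, index 0 requested
```

With `num_visible=0` the sketch reproduces the reported message. It also rejects `[1]` when exactly one GPU is visible, because CUDA typically renumbers the devices a process can see starting from 0; on a node where a job is granted a single GPU, index 0 is the only valid choice.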

lgl603 commented 2 weeks ago

I'm sorry, but I haven't used a SLURM cluster. At present, our software only supports Linux or CentOS server systems with NVIDIA GPUs.

The above error indicates that PyTorch cannot detect any GPU on the machine where training runs, which suggests that no GPU driver is installed on your SLURM cluster. Please look up how to install a driver compatible with your GPU on the SLURM cluster.
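Before reinstalling drivers, it may help to check what the training process actually sees on the compute node. The script below is a hypothetical diagnostic, not part of DeepETPicker; it assumes you run it inside the SLURM job (for example through `srun` in an allocation that requested a GPU, such as with `--gres=gpu:1`), using the same Python environment as training. It reads the standard `SLURM_JOB_ID` and `CUDA_VISIBLE_DEVICES` environment variables, looks for `nvidia-smi` on `PATH`, and, if PyTorch is importable, reports `torch.cuda.device_count()`.

```python
import os
import shutil
from typing import Dict, Optional


def gpu_visibility_report() -> Dict[str, Optional[str]]:
    """Collect what the current process can see about GPUs (hypothetical helper)."""
    report: Dict[str, Optional[str]] = {
        # Set by SLURM inside a job allocation; None outside of a job.
        "SLURM_JOB_ID": os.environ.get("SLURM_JOB_ID"),
        # Devices CUDA will expose to this process; None means the variable is unset.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # nvidia-smi is installed with the NVIDIA driver; None if not found on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not available in this environment.
        report["torch_cuda_device_count"] = None
    else:
        report["torch_cuda_device_count"] = str(torch.cuda.device_count())
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

Roughly: if `nvidia_smi` is `None`, a missing or unreachable driver is the likely cause; if `SLURM_JOB_ID` is `None`, the script is not running inside a job (for example, on a login node); and if the device count is `0` while `nvidia-smi` is present, the job may not have been allocated a GPU.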