aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

The Slurm scheduler sets --cpus-per-task to a float #4530

Closed pfebrer closed 3 years ago

pfebrer commented 3 years ago

Describe the bug

When submitting a job that defines num_cores_per_machine, the scheduler is unable to submit the job because --cpus-per-task is set to a float, which SLURM does not accept. The workchain then stays paused indefinitely because the job submission never succeeds.
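To illustrate the mechanism, here is a minimal sketch of how a float can end up in the submission script. The values and the way the directive string is built are assumptions for illustration only and are not copied from the plugin; the sbatch option itself is spelled `--cpus-per-task`.

```python
# Hypothetical resource values, chosen only for illustration.
num_cores_per_machine = 8
num_mpiprocs_per_machine = 2

# In Python 3, `/` is true division and always returns a float,
# even when the operands divide evenly.
cores_per_task = num_cores_per_machine / num_mpiprocs_per_machine

# Hypothetical formatting of the directive; the plugin's real template may differ.
directive = f"#SBATCH --cpus-per-task={cores_per_task}"
print(directive)  # #SBATCH --cpus-per-task=4.0
```

SLURM expects an integer count here, so a value such as `4.0` would be rejected at submission time.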

Steps to reproduce

Using a SLURM scheduler on a computer with mpiprocs_per_machine defined:

submit(CalculationClass, options={"resources": {"num_machines": x, "num_cores_per_machine": y}})

Additional context

I think it's related to this part of the code: https://github.com/aiidateam/aiida-core/blob/9ff07c166a559f98b5b2be71537814ec00d3f18d/aiida/schedulers/plugins/slurm.py#L133-L139

As I understand it, this division should yield an integer, but the code checks for the opposite condition. I may be wrong about this, though.
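For reference, a minimal sketch of what I would expect the check to do, assuming both values are plain integers. The helper name `cores_per_mpiproc` is hypothetical and this is not the plugin's actual code: it only shows integer division with a test that the ratio is exact.

```python
def cores_per_mpiproc(num_cores_per_machine: int, num_mpiprocs_per_machine: int) -> int:
    """Return the integer number of cores per MPI process (hypothetical helper)."""
    # divmod uses integer division, so the quotient is an int, never a float.
    quotient, remainder = divmod(num_cores_per_machine, num_mpiprocs_per_machine)
    if remainder != 0:
        # Reject ratios that are not whole numbers instead of passing a float on.
        raise ValueError(
            "num_cores_per_machine must be a multiple of num_mpiprocs_per_machine"
        )
    return quotient


print(cores_per_mpiproc(8, 2))  # 4
```

With this shape, a non-integer ratio such as 8 cores over 3 MPI processes raises a `ValueError` early, and a valid ratio is always passed on as an `int`.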

chrisjsewell commented 3 years ago

Thanks @pfebrer, I will close this by fixing the bug that you noted. Feel free to re-open if you still find an issue afterwards.