dask / dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE
https://jobqueue.dask.org
BSD 3-Clause "New" or "Revised" License

Extend OARCluster implementation to let OAR take into account the memory parameter #595

Closed: ychiat35 closed this 2 years ago

ychiat35 commented 2 years ago

Following issue #594, we propose an extension of the existing OARCluster implementation so that OAR takes the memory parameter into account.

The OAR scheduler does not manage memory internally: by default, it is not possible to ask OAR to reserve a specific amount of memory on the requested computing resources (e.g., one core with 256 GB of memory). However, OAR resource properties can be leveraged to ensure that the selected resources have at least the requested amount of memory.
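A minimal sketch of the arithmetic behind such a property filter, assuming the cluster exposes a per-core memory property expressed in MiB (the property name `memcore` and the helper names below are hypothetical, chosen only for illustration). The per-core requirement is rounded up so that the selected cores together provide at least the requested worker memory.

```python
MIB = 2**20


def memory_per_core_mib(worker_memory_bytes: int, worker_cores: int) -> int:
    """Hypothetical helper: memory each core must offer, in MiB, rounded up."""
    return (worker_memory_bytes + worker_cores * MIB - 1) // (worker_cores * MIB)


def oar_memory_filter(property_name: str, worker_memory_bytes: int, worker_cores: int) -> str:
    """Hypothetical helper: build an OAR property expression requiring at least that much memory per core."""
    return f"{property_name}>={memory_per_core_mib(worker_memory_bytes, worker_cores)}"


if __name__ == "__main__":
    # One core with 256 GiB, as in the example above (assumed MiB-based property).
    print(oar_memory_filter("memcore", 256 * 2**30, 1))
```

Rounding up rather than down matters when the memory does not divide evenly: 10 GiB over 3 cores needs 3414 MiB per core, since 3413 MiB per core would fall slightly short.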

Since OAR does not standardize property names, they may differ from one cluster to another. Consequently, we introduce a new parameter in the OARCluster class, oar_mem_core_property_name, which lets users specify the name of the memory property on their own OAR cluster. This property is used by adding a new #OAR -p line to the OAR submission script. If the parameter is not used or is set to None, the current behavior of the OARCluster class is unchanged, but users are warned that the memory parameter will not be taken into account by OAR.
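The behavior described above can be sketched as a small function that either emits the extra `#OAR -p` directive or warns and emits nothing. This is an illustration under assumptions, not the code merged in the PR: the function name is hypothetical, the property is assumed to be in MiB, and the unquoted `name>=value` form of the directive is assumed to be accepted by OAR.

```python
import warnings
from typing import List, Optional


def oar_memory_directives(
    worker_memory_bytes: int,
    worker_cores: int,
    oar_mem_core_property_name: Optional[str],
) -> List[str]:
    """Hypothetical helper: extra #OAR lines enforcing a per-core memory minimum."""
    if oar_mem_core_property_name is None:
        # No property name given: keep the previous behavior, but tell the user.
        warnings.warn(
            "oar_mem_core_property_name is not set: the memory parameter "
            "will not be taken into account by OAR",
            UserWarning,
        )
        return []
    # Ceiling division so the cores jointly provide at least the requested memory (assumed MiB unit).
    mib_per_core = -(-worker_memory_bytes // (worker_cores * 2**20))
    return [f"#OAR -p {oar_mem_core_property_name}>={mib_per_core}"]


if __name__ == "__main__":
    print(oar_memory_directives(64 * 2**30, 8, "memcore"))
```

Returning an empty list in the `None` case keeps the generated job script identical to what it was before the change, which matches the backward-compatibility goal stated in the PR.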

ychiat35 commented 2 years ago

Thanks a lot for your quick review! I have just committed updates addressing your comments. About your "worker_memory/worker_cores calculation" question, you can find my explanation above. I hope it's clear, and don't hesitate if you have other comments :)

lesteve commented 2 years ago

A bit late to the party, sorry! Here are the few comments I had; it would be very welcome if they could be tackled in a follow-up PR:

More questions (not related to this PR in particular but to OAR support more generally):

ychiat35 commented 2 years ago

Thanks @lesteve for your comments! You can find my follow-up commit in #598.

By the way, the OAR clusters that I use are Grid5000 and igrida :)

lesteve commented 2 years ago

About the case where a property is already defined, a unit test should already cover it.

Yep I missed it somehow, thanks!
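For the case discussed above, where the user already passes an `#OAR -p` directive, one way to handle it is to fold every existing property expression and the memory filter into a single `-p` line joined with `AND` (OAR property filters use an SQL-like syntax). The sketch below assumes a single combined `-p` line is the desired outcome; the function name and the `mycluster` value are hypothetical and not taken from the PR.

```python
from typing import List


def merge_memory_property(extra_directives: List[str], memory_filter: str) -> List[str]:
    """Hypothetical helper: combine existing '#OAR -p' filters with a memory filter."""
    prefix = "#OAR -p "
    kept: List[str] = []
    expressions: List[str] = []
    for line in extra_directives:
        if line.startswith(prefix):
            # Parenthesize each user-supplied expression so AND binds correctly.
            expressions.append(f"({line[len(prefix):].strip()})")
        else:
            kept.append(line)
    expressions.append(memory_filter)
    # All other directives are kept in order; one combined -p line goes last.
    return kept + [prefix + " AND ".join(expressions)]


if __name__ == "__main__":
    print(merge_memory_property(["#OAR -q default", "#OAR -p cluster='mycluster'"], "memcore>=8192"))
```

A unit test for this case would then only need to check that the user's expression survives inside the combined line and that no second `-p` directive is emitted.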