aiidateam / team-compass

A repository for storing the AiiDA team roadmap
https://team-compass.readthedocs.io
MIT License

Usability: Allow multiple calculations to be run as a single scheduler job #2

Open mbercx opened 1 year ago

mbercx commented 1 year ago

Motivation

Currently, the AiiDA engine will submit one job to the scheduler for each calculation job. The ability to run multiple AiiDA jobs inside one scheduler job has several use cases:

  1. In computing centers that do not allow node-sharing between scheduler jobs, this allows AiiDA users to run, in parallel on the same node, multiple calculation jobs that each use only part of a node.
  2. Some schedulers struggle to manage a large number of jobs simultaneously, so running many small jobs in high-throughput mode can become problematic.
  3. Packaging multiple AiiDA jobs into one scheduler job means that the AiiDA jobs will only have to queue once.
  4. The scheduler configuration might restrict the number of active jobs at one time.

Desired Outcome

Have at least one straightforward approach to pack multiple AiiDA jobs into one scheduler job, which is well documented and easy to find.

Impact

Any user that cannot efficiently use a full node on their computing center will benefit from use case [1], and we've already had several users request this feature for this reason.

Avoiding queueing times (use case [3]) is beneficial to pretty much all users, especially if they are running workflows with many short steps.

Use cases [2] and [4] are especially important to users that run many workflows in high-throughput.

Complexity

Most current approaches to implementing task farming rely on using a meta-scheduler (see Progress below). This requires implementing a new AiiDA scheduler, which, depending on the meta-scheduler, is a matter of a few days' work. Since we already have several such implementations, the main work left is to properly test and document them, and to make sure users can find them by pointing to them from the main AiiDA documentation.
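For contrast, the simplest form of task farming needs no meta-scheduler at all: a single job script starts every calculation in the background and then waits for all of them. The sketch below generates such a script; `build_packed_script`, its `(working_directory, shell_command)` task format and the example `#SBATCH` header are hypothetical illustrations, not part of AiiDA. It deliberately shows what this naive approach lacks (no per-task resource accounting, no dynamic dispatch, no per-task exit codes), which is why the discussion favours a proper meta-scheduler.

```python
import shlex
from typing import Sequence, Tuple


def build_packed_script(
    tasks: Sequence[Tuple[str, str]],
    header_lines: Sequence[str] = (),
) -> str:
    """Return a bash script that runs every task concurrently, then waits.

    Hypothetical helper: each task is a (working_directory, shell_command)
    pair. Every command runs in a subshell inside its own directory and is
    sent to the background, so all tasks share one scheduler allocation.
    """
    lines = ["#!/bin/bash", *header_lines, ""]
    for workdir, command in tasks:
        # Quote the directory so paths with spaces survive; the command is
        # assumed to already be a valid shell snippet.
        lines.append(f"(cd {shlex.quote(workdir)} && {command}) &")
    # A bare `wait` blocks until all background tasks finish, but its own
    # exit status is always zero, so individual failures are not reported.
    lines.append("wait")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_packed_script(
        [("/scratch/calc_1", "pw.x -in aiida.in > aiida.out"),
         ("/scratch/calc_2", "pw.x -in aiida.in > aiida.out")],
        header_lines=["#SBATCH --nodes=1"],  # example header only
    ))
```

Both calculations queue once and share the node, which covers use cases [1] and [3], but nothing stops them from oversubscribing the node's cores.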

Background

This issue was originally raised in the 2020 AiiDA hackathon in Bologna. Also see https://github.com/orgs/aiidateam/discussions/5112 for a more recent discussion on the topic.

The main gist of these conversations is that we want to allow task farming through the use of a suitable meta-scheduler.
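To make the meta-scheduler idea concrete: inside one scheduler allocation, a meta-scheduler decides which calculation to start whenever cores become free. The toy simulation below is only an illustration under simple assumptions (fixed task durations, cores as the only resource, a first-fit policy); `Task` and `simulate_farm` are hypothetical names and do not reflect how any particular meta-scheduler or AiiDA plugin works.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    cores: int
    duration: float


def simulate_farm(tasks: List[Task], total_cores: int) -> Dict[str, float]:
    """Return the start time of each task inside one allocation.

    Toy first-fit policy: whenever cores are free, start every pending task
    (in submission order) that still fits in the free cores.
    """
    for task in tasks:
        if task.cores > total_cores:
            raise ValueError(f"{task.name} needs more cores than the allocation has")
    pending = list(tasks)
    running: List[Tuple[float, int]] = []  # min-heap of (finish_time, cores)
    free = total_cores
    now = 0.0
    starts: Dict[str, float] = {}
    while pending:
        still_pending = []
        for task in pending:
            if task.cores <= free:
                free -= task.cores
                starts[task.name] = now
                heapq.heappush(running, (now + task.duration, task.cores))
            else:
                still_pending.append(task)
        pending = still_pending
        if pending:
            # Advance the clock to the next completion and release its cores.
            now, cores = heapq.heappop(running)
            free += cores
    return starts


if __name__ == "__main__":
    jobs = [Task("A", 2, 10), Task("B", 2, 5), Task("C", 4, 3), Task("D", 1, 2)]
    print(simulate_farm(jobs, total_cores=4))
```

With a 4-core allocation, A and B start together at time 0, D slips into the cores freed by B at time 5, and C, which needs the whole node, starts at time 10. All four calculations share one queued job, which is the behaviour use cases [1] and [3] ask for.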

Progress

There are already two scheduler implementations for dealing with task farming:

Both approaches can in principle deal with all use cases presented in the Motivation section.

chrisjsewell commented 1 year ago

@mbercx also for the title here, can you make it proactive, e.g. something like

Usability: Allow multiple calculations to be run as a single scheduler job

(FYI I just added this to the AEP: https://github.com/chrisjsewell/AEP/commit/fb793f6adb67ba681277f9542167fd9e5787ca3a)

giovannipizzi commented 1 year ago

As a comment: FirecREST is also planning to become a high-throughput scheduler. We should get in touch with them to make sure our use cases are well covered.

One more comment on the format, @chrisjsewell, that I only realise now: should we also add a further section (in general, to all issues) called "Actionable items" at the bottom of each issue, with checkboxes? Right now I have to read the whole text to discover that

the main work that is left is to properly test/documents these and make sure users can find them by pointing to them from the main AiiDA documentation.

I would add something like:

Actionable items to close this roadmap item

See also issue #8, which covers some of these points.

chrisjsewell commented 1 year ago

Should we add also a further section (in general to all issues) "Actionable items" at the bottom of each issue, with checkboxes?

Yeah, I think it can certainly be encouraged, although I would stress that the key focus of these roadmap items is the "why" and not the "how"; i.e., there doesn't have to be an exact plan for how to close a roadmap item before it's opened. We just need to know that it's something we definitely want to address.

giovannipizzi commented 1 year ago

OK, I see. Still, it's important to clarify the minimal list of things to do before we consider an item done; otherwise many items will remain open forever even when they are 97% solved. It does not need to say how, but it should at least state which minimal list of "issues" (in a general sense, not in the GitHub sense) must be fixed to consider the item done. This also keeps us from continually adding requirements to a roadmap item; it is better to close one and then open another, more advanced one.

BTW, having actually created one, I think what I suggest should go inside the "Progress" section.