Open mbercx opened 1 year ago
@mbercx also for the title here, can you make it proactive, e.g. something like
Usability: Allow multiple calculations to be run as a single scheduler job
(FYI I just added this to the AEP: https://github.com/chrisjsewell/AEP/commit/fb793f6adb67ba681277f9542167fd9e5787ca3a)
As a comment: FirecREST is also planning to become a high-throughput scheduler. We should be in touch with them to make sure our use cases are going to be well covered.
One comment on the format, @chrisjsewell, that I realise only now: should we also add a further section (in general, to all issues) called "Actionable items" at the bottom of each issue, with checkboxes? Now I have to read the whole text to discover that
the main work that is left is to properly test/document these and make sure users can find them by pointing to them from the main AiiDA documentation.
I would add something as:
~(I think some of these points will be actually related to another roadmap item that I have to open soon, I will link as soon as it's open)~ see also issue #8
Should we also add a further section (in general, to all issues) called "Actionable items" at the bottom of each issue, with checkboxes?
Yeh I think it can certainly be encouraged. Although I would stress that the key focus of these roadmap items is the "why" and not the "how", i.e. there doesn't have to be an exact plan on how to close a roadmap item, before it's opened; we just know that it's something that we definitely want to address
OK I see. Still, it's important to clarify the minimal list of things that must be done to consider it done, otherwise many items will always remain open even if they are 97% solved. It does not need to say how, but at least which minimal list of "issues" (in a general sense, not in the GitHub sense) must be fixed to consider this done. This also avoids us continually adding requirements to a roadmap item - better to close one and then open another, more advanced one.
BTW, actually creating one, I think what I suggest should be inside the "Progress".
Motivation
Currently, the AiiDA engine will submit one job to the scheduler for each calculation job. The ability to run multiple AiiDA jobs inside one scheduler job has several use cases:
Desired Outcome
Have at least one straightforward approach to pack multiple AiiDA jobs into one scheduler job, which is well documented and easy to find.
Impact
Any user that cannot efficiently use a full node on their computing center will benefit from use case [1], and we've already had several users request this feature for this reason.
Avoiding queueing times (use case [3]) is beneficial to pretty much all users, especially if they are running workflows with many short steps.
Use cases [2] and [4] are especially important to users that run many workflows in high-throughput.
Complexity
Most current approaches to implementing task farming rely on using a meta-scheduler (see Progress below). This requires implementing a new AiiDA scheduler, which, depending on the meta-scheduler, is a matter of a few days' work. Since we already have several such implementations, the main work that is left is to properly test/document these and make sure users can find them by pointing to them from the main AiiDA documentation.
Background
This issue was originally raised in the 2020 AiiDA hackathon in Bologna. Also see https://github.com/orgs/aiidateam/discussions/5112 for a more recent discussion on the topic.
The main gist of these conversations is that we want to allow task farming through the use of a suitable meta-scheduler.
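To make the idea of task farming concrete, here is a minimal, purely illustrative sketch of packing several small calculations into as few scheduler jobs as possible. It is not how AiiDA, FireWorks, or HyperQueue work internally: a real meta-scheduler assigns tasks dynamically as resources free up, whereas this sketch uses a static first-fit-decreasing heuristic. The names `Task`, `SchedulerJob`, and `pack_tasks` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A hypothetical calculation that needs a fixed number of cores."""
    name: str
    cores: int


@dataclass
class SchedulerJob:
    """A hypothetical scheduler job (one allocation) holding several tasks."""
    capacity: int
    tasks: list = field(default_factory=list)

    @property
    def used(self) -> int:
        # Total number of cores claimed by the tasks packed into this job.
        return sum(task.cores for task in self.tasks)


def pack_tasks(tasks, cores_per_job):
    """Pack tasks into scheduler jobs with a first-fit-decreasing heuristic."""
    jobs = []
    # Place the largest tasks first; this usually leaves fewer gaps.
    for task in sorted(tasks, key=lambda t: t.cores, reverse=True):
        if task.cores > cores_per_job:
            raise ValueError(f"{task.name} needs more cores than one job provides")
        for job in jobs:
            # Reuse the first existing job that still has room.
            if job.used + task.cores <= job.capacity:
                job.tasks.append(task)
                break
        else:
            # No existing job has room, so open a new one.
            jobs.append(SchedulerJob(capacity=cores_per_job, tasks=[task]))
    return jobs


if __name__ == "__main__":
    example = [Task("a", 16), Task("b", 8), Task("c", 8), Task("d", 4)]
    for index, job in enumerate(pack_tasks(example, cores_per_job=32)):
        names = ", ".join(task.name for task in job.tasks)
        print(f"job {index}: {job.used}/{job.capacity} cores -> {names}")
```

With one scheduler job per calculation, the four example tasks would need four submissions; packed this way they need two, which is the kind of saving the use cases in the Motivation section are after.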
Progress
There are already two existing scheduler implementations for dealing with task farming:
aiida-fireworks-scheduler: using the workflow manager FireWorks.
aiida-hyperqueue: using the HyperQueue meta-scheduler.
Both approaches can in principle deal with all use cases presented in the Motivation section.