ICLDisco / parsec

PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed, GPU accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores, GPU accelerators, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.
Other
50 stars 17 forks source link

Recursive tasks are scheduled on one queue #551

Open omor1 opened 1 year ago

omor1 commented 1 year ago

Description

Recursive tasks are all scheduled on the queue of the execution stream that executed the parent. This leads to suboptimal behavior in the case where

  1. The recursive task is high priority
  2. Queues on other execution streams are full

This leads to other execution streams never stealing the high-priority task, essentially serializing its execution and removing many benefits of executing it recursively. One example of this occurring is in POTRF tasks in HiCMA, where it appears that GEMM tasks are frequently scheduled before the recursive POTRF tasks, leading to global starvation once the GEMM tasks are complete. This is observable as oscillatory behavior in core utilization, even with reasonable levels of lookahead where there should always be many GEMM tasks available to execute.

The image below demonstrates the issue. This is max(idle_cores - queue_len, 0), where both idle_cores and queue_len are the global sum at each measurement point, approximately every millisecond. This demonstrates that cores are starving for work globally, which should never happen with Cholesky, and that they are being limited by the execution of the POTRF tasks on the diagonal, implying a priority inversion.

image

This naturally only affects schedulers with distributed queues; global schedulers such as AP are not affected. The above used the LLP scheduler.

Describe the solution you'd like

Several solutions come to mind:

Describe alternatives you've considered

There is prior research into load-balancing, work-stealing, and task priority; we can look into a solution from current literature. Marc pointed me towards a paper from Vivek Sarkar's group in Euro-Par 2015, "Load Balancing Prioritized Tasks via Work-Stealing", which I will be reading immanently.

omor1 commented 1 year ago

This should be assigned to @bosilca, per his request.