ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware, implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance for distributed systems where each node featuring multiple sockets of multicore processors, and if available, accelerators, using the PaRSEC runtime as a backend.
Other
11 stars 9 forks source link

Priorities can overflow for plausible values of MT #56

Open abouteiller opened 2 years ago

abouteiller commented 2 years ago

Describe the bug

Priorities can overflow for plausible values of MT (e.g., running with M=500k, MB=180 results in MT >1500, which would overflow the priorities an cause scheduling to try to run in the 'wrong' order).

Additional context

https://github.com/ICLDisco/dplasma/pull/53#discussion_r893839754

Possible fixes

  1. priorities could be 'float' type, but that requires a change we do not really like on the PaRSEC side.
  2. We need to somehow clamp the priorities back into an int32 at the end, but we can use any intermediate representation to avoid overflow in the meantime. Question is how we do that without crushing all priorities for the tail-end of the execution?
  3. We need to define priority classes that are somewhat independent of MT^3, maybe having independent task priorities for each individual GEMM is overkill, and having priority per-panel/update iteration is enough
omor1 commented 2 years ago

Is it possible to use a 64-bit priority instead of 32-bit? That allows MT up to 2–2.6M, which should be more than sufficient. That still requires annoying changes on the PaRSEC side, but maybe not as invasive as for a floating-point priority.

bosilca commented 2 years ago

It is always possible, but the task_s is actually quite packed and tightly aligned. Increasing the size of the priority field to an int64_t will increase the size of the parsec_task_s struct by 8 bytes, and push a lot of things into the next cache line.

That being said, we had a parallel discussion about improving the chores mask to be able to limit the migration opportunities for a task. We should be able to combine these two ideas and use the extra space provided by increasing the size of the priority field for the chose_mask field (which is currently an uint8_t and could be transitioned to an uint32_t).

abouteiller commented 3 days ago

icldisco/parsec#706