dougbinks / enkiTS

A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.
zlib License

Q: Manual partitioning #80

Closed ib00 closed 10 months ago

ib00 commented 1 year ago

enkiTS automatically partitions workload into ranges.

What is the recommended way to manually partition a workload and submit these tasks to the scheduler? That is, I want to create a list of tasks with manually specified (start, end) ranges.

There are two applications:

  1. creating tasks over 2D/3D domain
  2. user defined task splitting

Any suggestions are greatly appreciated.

dougbinks commented 1 year ago

creating tasks over 2D/3D domain

This is possible with the current system (I use it a lot).

For example, if you have a 2D array of size X_SIZE x Y_SIZE, you create a task with m_SetSize = X_SIZE*X_SIZE... rather, m_SetSize = X_SIZE*Y_SIZE, and then iterate inside the range loop like so:

// warning untested code
for( uint32_t i = range_.start; i < range_.end; ++i )
{
    uint32_t y = i / X_SIZE;     // row of element i
    uint32_t x = i - y * X_SIZE; // column of element i
    // process element ( x, y ) here
}

If you want your 2D array to be processed in blocks of NxM, then you would divide X_SIZE and Y_SIZE by these before creating the range, and then process each block in your own loop inside the range loop.
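
A rough sketch of that block approach (untested; X_SIZE, Y_SIZE, BLOCK_X, BLOCK_Y and ProcessElement are placeholder names, and the block sizes are assumed to divide the image exactly):

```cpp
#include <cstdint>
#include "TaskScheduler.h"

// Placeholder sizes; assumed to divide evenly for brevity.
const uint32_t X_SIZE  = 1024;
const uint32_t Y_SIZE  = 1024;
const uint32_t BLOCK_X = 64;
const uint32_t BLOCK_Y = 64;

struct BlockTask : enki::ITaskSet
{
    static const uint32_t BLOCKS_X = X_SIZE / BLOCK_X;
    static const uint32_t BLOCKS_Y = Y_SIZE / BLOCK_Y;

    // The set size is the number of blocks, not the number of elements.
    BlockTask() : enki::ITaskSet( BLOCKS_X * BLOCKS_Y ) {}

    void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override
    {
        for( uint32_t i = range_.start; i < range_.end; ++i )
        {
            uint32_t blockY = i / BLOCKS_X;
            uint32_t blockX = i - blockY * BLOCKS_X;
            // Process this block with your own inner loops.
            for( uint32_t y = blockY * BLOCK_Y; y < ( blockY + 1 ) * BLOCK_Y; ++y )
            {
                for( uint32_t x = blockX * BLOCK_X; x < ( blockX + 1 ) * BLOCK_X; ++x )
                {
                    // ProcessElement( x, y ); // placeholder per-element work
                }
            }
        }
    }
};
```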

user defined task splitting

Could you give an example of what you want to achieve? This can usually be done by some combination of setting m_SetSize and m_MinRange.
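
For reference, a minimal sketch of what those two members control (the values are illustrative and DoWork is a placeholder):

```cpp
#include <cstdint>
#include "TaskScheduler.h"

struct MyTask : enki::ITaskSet
{
    MyTask()
    {
        m_SetSize  = 1024; // total range handed to ExecuteRange across all partitions
        m_MinRange = 64;   // smallest partition the scheduler will split the set into
    }

    void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override
    {
        for( uint32_t i = range_.start; i < range_.end; ++i )
        {
            // DoWork( i ); // placeholder per-item work
        }
    }
};
```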

ib00 commented 1 year ago

Thanks!

I have two applications:

  1. Bucket rendering where I partition image into NxM blocks, but I don't want to process blocks in order. I am using a space-filling curve, so (start,end) blocks are in nonlinear order.
  2. Better load-balancing. I want to create jobs of different sizes. I partition the workload first in large chunks and then progressively reduce the chunk size. For the particular workload I have, this achieves much better load balancing.

dougbinks commented 1 year ago

Bucket rendering where I partition image into NxM blocks, but I don't want to process blocks in order. I am using a space-filling curve, so (start,end) blocks are in nonlinear order.

I also do this regularly. Using Morton order as an example, I decode the Morton code into X,Y. Hilbert curve codes can also be converted to X,Y. So use the provided range index as the code and the appropriate conversion to get X,Y. Hopefully this makes sense, but if not I can provide example code.
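
For instance, something along these lines as an untested sketch (CompactBits is a standard Morton bit de-interleave and RenderBucket is a placeholder; it assumes the bucket grid is a power-of-two square, otherwise skip codes that land outside the grid):

```cpp
#include <cstdint>
#include "TaskScheduler.h"

// Standard Morton de-interleave: keep every other bit and compact to 16 bits.
static uint32_t CompactBits( uint32_t v )
{
    v &= 0x55555555;
    v = ( v ^ ( v >> 1 ) ) & 0x33333333;
    v = ( v ^ ( v >> 2 ) ) & 0x0F0F0F0F;
    v = ( v ^ ( v >> 4 ) ) & 0x00FF00FF;
    v = ( v ^ ( v >> 8 ) ) & 0x0000FFFF;
    return v;
}

struct MortonBucketTask : enki::ITaskSet
{
    // numBuckets_ = number of buckets covered by the Morton codes
    // (a power-of-two square grid; otherwise add a bounds check and skip).
    MortonBucketTask( uint32_t numBuckets_ ) : enki::ITaskSet( numBuckets_ ) {}

    void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override
    {
        for( uint32_t code = range_.start; code < range_.end; ++code )
        {
            uint32_t x = CompactBits( code );      // bucket X from the Morton code
            uint32_t y = CompactBits( code >> 1 ); // bucket Y from the Morton code
            // RenderBucket( x, y ); // placeholder bucket rendering
        }
    }
};
```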

Better load-balancing. I want to create jobs of different sizes. I partition the workload first in large chunks and then progressively reduce the chunk size. For the particular workload I have, this achieves much better load balancing.

If your workload has unevenly distributed computational costs there are a couple of approaches you could use with the current scheduler:

  1. Create several tasksets which split up your domain into different-sized chunks. You could use priorities to ensure the larger tasks are scheduled first (using dependencies could leave the CPU idle whilst the last few large chunks complete).
  2. Alternatively, map the range to a table of chunks, with each chunk being a different size (see the sketch after this list). With this approach it could be difficult to ensure the large chunks are completed before the smaller ones.
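
An untested sketch of option 2, with the chunk table built up front (Chunk, ChunkedTask and DoWork are placeholder names, and the halving scheme is just an example):

```cpp
#include <cstdint>
#include <vector>
#include "TaskScheduler.h"

struct Chunk { uint32_t start; uint32_t end; };

struct ChunkedTask : enki::ITaskSet
{
    std::vector<Chunk> chunks;

    ChunkedTask( uint32_t workSize_ )
    {
        // Build progressively smaller chunks, e.g. by halving down to a floor.
        uint32_t pos       = 0;
        uint32_t chunkSize = workSize_ / 4 > 0 ? workSize_ / 4 : 1;
        while( pos < workSize_ )
        {
            uint32_t end = pos + chunkSize < workSize_ ? pos + chunkSize : workSize_;
            chunks.push_back( { pos, end } );
            pos = end;
            if( chunkSize > 16 ) { chunkSize /= 2; }
        }
        m_SetSize  = (uint32_t)chunks.size(); // one range index per chunk
        m_MinRange = 1;                       // chunks can be stolen individually
    }

    void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override
    {
        for( uint32_t i = range_.start; i < range_.end; ++i )
        {
            for( uint32_t j = chunks[ i ].start; j < chunks[ i ].end; ++j )
            {
                // DoWork( j ); // placeholder per-item work
            }
        }
    }
};
```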

Although I have considered user-provided partitioning schemes, this isn't simple with a lightweight scheduler like enkiTS.

ib00 commented 1 year ago

Thanks for your suggestions. I like enkiTS exactly because it's lightweight.

What I had in mind was whether there was a way for the user to create a bunch of tasks manually and then submit them as a batch.

dougbinks commented 1 year ago

What I had in mind was whether there was a way for the user to create a bunch of tasks manually and then submit them as a batch.

The best way to do this is to create the bunch of tasks in a TaskSet, or create them in an array and launch a TaskSet to launch them. You can also create them as a dependency of an empty task and launch that; when it completes the rest will then be launched, but this is equivalent to launching them from a task.
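
An untested sketch of the array-of-tasks-plus-launcher approach (ManualRangeTask, LaunchBatchTask and DoWork are placeholder names):

```cpp
#include <cstdint>
#include <vector>
#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// One manually partitioned piece of work; start/end are chosen by the user,
// not by the scheduler (the default m_SetSize of 1 is left untouched).
struct ManualRangeTask : enki::ITaskSet
{
    uint32_t start = 0;
    uint32_t end   = 0;

    void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override
    {
        for( uint32_t i = start; i < end; ++i )
        {
            // DoWork( i ); // placeholder per-item work
        }
    }
};

// A task whose only job is to add the whole batch to the scheduler.
struct LaunchBatchTask : enki::ITaskSet
{
    std::vector<ManualRangeTask>* pBatch = nullptr;

    void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override
    {
        for( ManualRangeTask& task : *pBatch )
        {
            g_TS.AddTaskSetToPipe( &task );
        }
    }
};

int main()
{
    g_TS.Initialize();

    std::vector<ManualRangeTask> batch( 8 );
    for( uint32_t i = 0; i < (uint32_t)batch.size(); ++i )
    {
        batch[ i ].start = i * 100;       // illustrative manual partitioning
        batch[ i ].end   = i * 100 + 100;
    }

    LaunchBatchTask launcher;
    launcher.pBatch = &batch;
    g_TS.AddTaskSetToPipe( &launcher );

    g_TS.WaitforTask( &launcher );        // all batch tasks have been added
    for( ManualRangeTask& task : batch )  // now wait for each of them
    {
        g_TS.WaitforTask( &task );
    }
    return 0;
}
```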

ib00 commented 1 year ago

Perfect! With this and the 2D remapping trick you mentioned before, I can do what I need.