ComputationalRadiationPhysics / redGrapes

Resource-based, Declarative task-Graphs for Parallel, Event-driven Scheduling :grapes:
https://redgrapes.rtfd.io
Mozilla Public License 2.0

Release Feedback from the SYCL Community #18

Open ax3l opened 4 years ago

ax3l commented 4 years ago

Hi everyone,

we got some feedback on our release from core SYCL developers over on Twitter. They are super interested in our use cases and would like feedback on what could be improved in SYCL task graphs to offer similar control.

I gave some examples, but would love for you to take a look to make sure I did not forget anything important.

Full thread: https://twitter.com/axccl/status/1206318925270532097

Branch 1: https://twitter.com/illuhad/status/1206336202204438530
Branch 2: https://twitter.com/illuhad/status/1206338789301456896
Branch 2b: https://twitter.com/axccl/status/1206347291738529792
Branch 3: https://twitter.com/codeandrew/status/1206349235928649728

Quick summary on SYCL tasks:

Side note: I realized we do not have an MPI example in the repo / readthedocs yet. Can we please add a good one?

There is also a question on other related works: how this compares to Legion's task concept:

michaelsippel commented 4 years ago

OK, so here are my thoughts:

Both SYCL and Legion solve similar problems as RedGrapes, but have a much broader scope and very specific execution models. RedGrapes is just about tasks and nothing more.

Regarding Legion: RedGrapes is much simpler, mainly because it's node-local. But distributed scheduling could be easily built on top of it. Legion looks very complicated.

> Side note: I realized we do not have an MPI example in the repo / readthedocs yet. Can we please add a good one?

The MPI abstractions currently live in pmacc only, but I will put together a minimal example of how MPI can be used in the next few days. Basically, inside a task we create an MPI request and then register an event that delays the removal of the task's vertex from the graph. The event gets notified from a polling loop.

Such a mechanism is required because waiting inside the task for the request to finish creates deadlocks. The problem arises because a receive operation will not finish before its corresponding send has been created. Since send and receive may operate on separate buffers at first, they are not dependent on each other, but with non-preemptive tasks this can still deadlock, as described in the attached slides. It may be a bit technical, but it shows that asynchronous communication is not trivial.

Even if we can run MPI calls inside a SYCL host kernel, we would need something like the mechanism described above to handle the asynchronous aspect. And SYCL is specialized for OpenCL kernels, whereas RedGrapes uses the same mechanism for MPI as for CUDA or whatever other asynchronous operation is needed.

[Attached slides: async1, async2]
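For readers following along, the event-plus-polling idea from this comment can be sketched roughly as below. This is only an illustration under stated assumptions: it does not use the redGrapes or pmacc API and makes no real MPI calls. `FakeRequest`, `Event`, `Pending`, `run_with_events` and `blocking_wait_completes` are hypothetical names, and the "request" simply completes once the matching send has been posted.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for an MPI request (no real MPI here): the receive
// can only complete once the matching send has been posted.
struct FakeRequest {
    const bool* send_posted;
    bool test() const { return *send_posted; }
};

// Event that keeps a task's vertex in the graph until it is notified.
struct Event {
    bool notified = false;
};

// A request watched by the polling loop, plus the event to notify.
struct Pending {
    FakeRequest request;
    Event* event;
};

// Runs "recv" then "send" on one non-preemptive worker and returns the
// order in which the two vertices are removed from the graph.
std::vector<std::string> run_with_events() {
    bool send_posted = false;
    Event recv_done;
    std::vector<Pending> pending;      // requests watched by the polling loop
    std::vector<std::string> removed;  // removal order of vertices

    std::queue<std::function<void()>> ready;
    // Receive task: create the request, register the event, return at once.
    ready.push([&] {
        pending.push_back({FakeRequest{&send_posted}, &recv_done});
    });
    // Send task: posting the send lets the receive complete.
    ready.push([&] {
        send_posted = true;
        removed.push_back("send");     // no event, removed right after its body
    });

    bool recv_removed = false;
    while (!ready.empty() || !pending.empty()) {
        if (!ready.empty()) {          // worker: run one task body to completion
            ready.front()();
            ready.pop();
        }
        // Polling loop: test outstanding requests and notify their events.
        for (auto it = pending.begin(); it != pending.end();) {
            if (it->request.test()) {
                it->event->notified = true;
                it = pending.erase(it);
            } else {
                ++it;
            }
        }
        // The recv vertex leaves the graph only after its event was notified.
        if (recv_done.notified && !recv_removed) {
            removed.push_back("recv");
            recv_removed = true;
        }
    }
    assert(recv_done.notified);
    return removed;
}

// Contrast: waiting inside the receive task on the single worker. The send
// task can never run, so the wait would spin forever; it is bounded here.
bool blocking_wait_completes(int max_polls) {
    bool send_posted = false;
    FakeRequest request{&send_posted};
    for (int i = 0; i < max_polls; ++i) {
        if (request.test()) return true;
    }
    return false;
}
```

In this sketch the receive task's body returns immediately, so the single worker is free to run the send task. The receive vertex is removed only after the polling loop has notified its event. The bounded loop in `blocking_wait_completes` stands in for the deadlock that occurs when the task waits on the request itself.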

ax3l commented 4 years ago

Thanks for your thoughts! I guess I conveyed the essence of that today. Minor correction: SYCL is a single-source programming model of its own and not identical to OpenCL. ;)

ax3l commented 4 years ago

Found another one for literature research: taskflow https://github.com/taskflow/taskflow (arxiv)

michaelsippel commented 4 years ago

Thanks for reporting! This one was already in the list, but I didn't notice they renamed it.

redGrapes Comparison Table (working branch)

ax3l commented 4 years ago

Oh right, thanks! Yes, just the latest release carried the rename.

ax3l commented 4 years ago

not sure if relevant for the comparison table, but just came across: https://github.com/sci-visus/BabelFlow

michaelsippel commented 4 years ago

> not sure if relevant for the comparison table, but just came across: https://github.com/sci-visus/BabelFlow

Hm, they do something with tasks, but I don't get what they are doing. Where is the DSL claimed in the readme? Given the current state of their documentation, I don't think a comparison with this project is very useful right now.