eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs
BSD 3-Clause "New" or "Revised" License

Support CUDA execution graphs #175

Closed by neoblizz 1 month ago

neoblizz commented 4 years ago

Added in CUDA 10.0, CUDA graphs (a really terrible name, as there are 7 "graph" things that CUDA provides) are a take on task graphs within CUDA's programming model.

What's really cool about CUDA graphs is the performance benefit you get from setting up a task graph once and then launching it repeatedly. Workloads that benefit from this include machine learning (surprise, surprise) and any iteratively converging algorithm.

I will add a plan of attack to this issue soon, for review.

Related issue #77.

eyalroz commented 4 years ago

Note that we can't (or at least, should not) offer merely imperative "add node", "add edge" etc. It should be possible to pass graphs, or iterators over the structure of a graph, from common graph representation libraries (not specific ones; something generic and template-based), and have that graph be fed in.

And even if we don't get that far right at the beginning, we must at least lay the foundation for that to be possible later on.
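To make the "generic and template-based" idea concrete, here is a minimal sketch in plain C++. Everything in it is an assumption made for illustration: `toy_graph_builder`, `feed_graph` and `build_diamond` are hypothetical names, not part of cuda-api-wrappers or of the CUDA runtime, and the integer payloads merely stand in for real node parameters. It only shows how an imperative add-node/add-edge layer could sit underneath a function that accepts arbitrary iterator ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an imperative graph-building API.
// These names are NOT taken from cuda-api-wrappers or the CUDA runtime.
struct toy_graph_builder {
    using node_handle = std::size_t;

    std::vector<int> payloads;                               // one entry per node
    std::vector<std::pair<node_handle, node_handle>> edges;  // (from, to) pairs

    node_handle add_node(int payload) {
        payloads.push_back(payload);
        return payloads.size() - 1;
    }

    void add_edge(node_handle from, node_handle to) {
        assert(from < payloads.size() && to < payloads.size());
        edges.emplace_back(from, to);
    }
};

// Generic front end: accepts any iterator range of node payloads and any
// iterator range of (source index, destination index) pairs, where the
// indices are positions within the node range. It only uses the imperative
// add_node/add_edge calls of the builder it is given.
template <typename Builder, typename NodeIt, typename EdgeIt>
std::vector<typename Builder::node_handle>
feed_graph(Builder& builder,
           NodeIt node_first, NodeIt node_last,
           EdgeIt edge_first, EdgeIt edge_last)
{
    std::vector<typename Builder::node_handle> handles;
    for (; node_first != node_last; ++node_first) {
        handles.push_back(builder.add_node(*node_first));
    }
    for (; edge_first != edge_last; ++edge_first) {
        // at() throws if an edge refers to a node position that does not exist
        builder.add_edge(handles.at(edge_first->first),
                         handles.at(edge_first->second));
    }
    return handles;
}

// Example: a diamond-shaped dependency graph 0 -> {1, 2} -> 3.
inline toy_graph_builder build_diamond() {
    const std::vector<int> nodes = {10, 20, 30, 40};
    const std::vector<std::pair<std::size_t, std::size_t>> edges =
        {{0, 1}, {0, 2}, {1, 3}, {2, 3}};
    toy_graph_builder builder;
    feed_graph(builder, nodes.begin(), nodes.end(), edges.begin(), edges.end());
    return builder;
}
```

Because `feed_graph` only requires iterators, the same function would accept a `std::list`, an adjacency structure from a graph library, or a generated range, as long as dereferencing yields a payload or an index pair.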

neoblizz commented 4 years ago

An update: I still have this on my to-do list. I plan to work on it in the near future, after some important deliverables are taken care of.

eyalroz commented 4 years ago

@neoblizz : Sure, whenever you have the time.

I should perhaps qualify what I said earlier: We can start with the simple imperative wrappers, then move up into the more abstract iterating-over-graphs version later.

eyalroz commented 1 year ago

I'm thinking of starting work on this myself.

eyalroz commented 1 year ago

So, I now have an untested, but compiling, initial version of wrappers for the graph functionality, on a new branch - see the link below.

It's not quite complete: what's missing is mostly the "executable" graphs' methods for getting and setting node parameters, but some graph template methods are missing too.

I would appreciate feedback. @codecircuit - you might also be interested in checking this out?

eyalroz commented 1 year ago

@neoblizz : So, I'm almost done. I mean, the graph support is working, but I have a couple of data structures which currently have ugly names, and the node creation API could stand some beautification. But - feedback would be very much appreciated.

neoblizz commented 1 year ago

> But - feedback would be very much appreciated.

This is awesome! Thanks for sharing, I'll take a look.

eyalroz commented 1 year ago

Maybe start with the examples:

Also - praise is appreciated :-) ... but concrete feedback about what you liked more, and liked less, would be the most useful.

neoblizz commented 1 year ago

I don't have much critical feedback here; the graph::create(), insert.edge/node() interface makes complete sense. And I actually really like the idea of using a lambda as the node, it feels very intuitive. It would also be nice for the example to show the destruction of an existing node.
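For readers who have not seen this style of interface, the following is a toy, host-only sketch of a task graph whose nodes are lambdas, including the destruction of an existing node. It is an assumption-laden illustration: `toy_task_graph`, `insert_node`, `insert_edge`, `erase_node`, `run` and `run_example` are hypothetical names, and nothing here is the actual cuda-api-wrappers API or touches the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy host-side task graph; all names are hypothetical and do not come
// from cuda-api-wrappers or the CUDA runtime.
class toy_task_graph {
public:
    using node_id = std::size_t;

    // A node is simply a callable, e.g. a lambda.
    node_id insert_node(std::function<void()> work) {
        nodes_[next_id_] = std::move(work);
        return next_id_++;
    }

    // "to" may only run after "from" has run.
    void insert_edge(node_id from, node_id to) {
        assert(nodes_.count(from) && nodes_.count(to));
        edges_.emplace(from, to);
    }

    // Destroys a node together with every edge touching it.
    void erase_node(node_id id) {
        nodes_.erase(id);
        for (auto it = edges_.begin(); it != edges_.end(); ) {
            if (it->first == id || it->second == id) { it = edges_.erase(it); }
            else { ++it; }
        }
    }

    // Runs every node once, each after all of its predecessors
    // (Kahn's algorithm). The graph is not modified, so it can be run again.
    void run() const {
        std::map<node_id, std::size_t> pending;  // unmet dependencies per node
        for (const auto& n : nodes_) { pending[n.first] = 0; }
        for (const auto& e : edges_) { ++pending[e.second]; }

        std::vector<node_id> ready;
        for (const auto& p : pending) {
            if (p.second == 0) { ready.push_back(p.first); }
        }

        std::size_t executed = 0;
        while (!ready.empty()) {
            const node_id current = ready.back();
            ready.pop_back();
            nodes_.at(current)();
            ++executed;
            for (const auto& e : edges_) {
                if (e.first == current && --pending[e.second] == 0) {
                    ready.push_back(e.second);
                }
            }
        }
        assert(executed == nodes_.size());  // fails if the graph has a cycle
    }

private:
    std::map<node_id, std::function<void()>> nodes_;
    std::set<std::pair<node_id, node_id>> edges_;
    node_id next_id_ = 0;
};

// Builds load -> debug -> compute, destroys the debug node, reconnects
// load -> compute, runs the graph and returns the execution log.
inline std::string run_example() {
    std::string log;
    toy_task_graph graph;
    const auto load    = graph.insert_node([&log] { log += "load;"; });
    const auto debug   = graph.insert_node([&log] { log += "debug;"; });
    const auto compute = graph.insert_node([&log] { log += "compute;"; });
    graph.insert_edge(load, debug);
    graph.insert_edge(debug, compute);
    graph.erase_node(debug);            // also removes both edges touching it
    graph.insert_edge(load, compute);   // restore the ordering constraint
    graph.run();
    return log;
}
```

In this sketch, erasing a node drops its edges as well, so the caller has to reconnect the remaining nodes explicitly; whether a real wrapper should do the same is a design choice this toy does not settle.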

eyalroz commented 1 year ago

@neoblizz : You're right, in the sense that my examples don't have good enough coverage of the API. Problem is, it's enough of an effort to maintain the wrappers, update them for new CUDA versions and catch up with the new features - and I just don't have the time to add more examples or have proper unit test coverage. Those are the kinds of limitations you have when developing FOSS in your spare time... :-(

neoblizz commented 1 year ago

Either way, the coverage is enough to get started with some decent use of CUDA graphs + I really like the lambda stuff. :)

eyalroz commented 1 year ago

Note that there's an issue with C++20 and spans, and there's a PR for it:

https://github.com/eyalroz/cuda-api-wrappers/pull/478

Also, if you do write something with it - even if it's just a toy project - drop me a line, so I can have a look. Maybe it'll give me some ideas for some more tweaks or convenience features.