To coordinate multiple streams while maximizing parallel use of the device, I have a scheduler that adds a callback to a stream so that other work, which was mutually exclusive with it, can be scheduled immediately.
This works as needed. However, I'm wondering whether the maintainers have any ideas for amortizing the cost of boxing the callback each time (this happens constantly, and never stops until the system halts).
We are already in unsafe territory with most GPU usage anyway, so perhaps we could pass a pointer to an equivalent callable instead. The caller would then take on the burden of ensuring the memory is not freed too early. Thoughts?
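
To make the idea concrete, here is a rough sketch of the shape I have in mind. It is not based on the crate's actual API: `mock_enqueue_host_fn`, `trampoline`, and `add_callback_unboxed` are hypothetical names, and the mock simply runs the callback right away so the example works without a GPU. I'm assuming the underlying driver call takes a C function pointer plus a `void *` user-data argument, as `cuLaunchHostFunc` does.

```rust
use std::ffi::c_void;

/// Shaped like a C host-function callback: `void (*)(void *user_data)`.
type HostFn = unsafe extern "C" fn(*mut c_void);

/// Hypothetical stand-in for the driver's enqueue call (an assumption, not a
/// real binding). It invokes the callback immediately instead of deferring it
/// to a stream.
unsafe fn mock_enqueue_host_fn(func: HostFn, user_data: *mut c_void) {
    unsafe { func(user_data) }
}

/// Trampoline monomorphized per callable type: recovers `&mut F` from the raw
/// user-data pointer and calls it. No allocation happens here.
unsafe extern "C" fn trampoline<F: FnMut()>(user_data: *mut c_void) {
    let callback = unsafe { &mut *(user_data as *mut F) };
    callback();
}

/// Enqueue a caller-owned callable without boxing it.
///
/// # Safety
/// The memory behind `callback` must stay valid, and must not be moved or
/// accessed elsewhere, until the callback has finished running.
unsafe fn add_callback_unboxed<F: FnMut()>(callback: *mut F) {
    unsafe { mock_enqueue_host_fn(trampoline::<F>, callback as *mut c_void) }
}

/// Reuses one long-lived callable for several enqueues and reports how many
/// times it ran.
fn run_demo() -> u32 {
    let mut fired = 0u32;
    {
        // The scheduler would own this once and reuse it for every enqueue.
        let mut on_done = || fired += 1;
        for _ in 0..3 {
            unsafe { add_callback_unboxed(&mut on_done) };
        }
    }
    fired
}

fn main() {
    println!("callback fired {} times", run_demo());
}
```

With a real stream the callback runs later and on a driver thread, so the callable would also need to be `Send` and would have to outlive the enqueued work; that is exactly the burden the caller would be taking on.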