Open PointKernel opened 2 months ago
Hmmm, this is surprising behavior to me.
It looks like we're defining _LIBCUDACXX_NO_EXCEPTIONS by default.
There's a secondary issue that cuda::std::terminate shouldn't exit with exit code 0.
Preferably trap is not used in device code: https://github.com/NVIDIA/cccl/issues/939#issuecomment-1802542072
That's the only real option in device code for cases where host code would otherwise throw an exception.
We need an alternative for Python because __trap corrupts the CUDA context and makes all subsequent CUDA operations fail. It is not a recoverable error, and for long-running applications (e.g., ML training, simulations) or interactive sessions (e.g., Jupyter notebooks) it's just too disruptive. For those (e.g., Google) who have sophisticated error mitigation, checkpointing, and/or a fault-tolerant stack in place, it might not be a big deal, but it's certainly not something available to general users, and I would hope that libcudacxx takes this into account.
I am happy to get to a solution that works better for the Python folks, but I would need some suggestions on how you believe exceptions could be implemented.
Thanks, Michael! I don't believe there's any silver bullet here; all we can do is compromise. This is admittedly a particularly hard problem for libcudacxx, where there's no API that allows passing in an opaque library handle or descriptor that offers a natural definition of a library "context/scope", as all C++ standard library APIs are stateless functions. A scoped object would allow systematic deviations from the default behavior as well as offer a customization entry point.
As a user who wants device-side error handling, I may want to do one or more of the following:
If we can somehow allow users to pass a host pinned buffer and define a custom operator that can be fused into any device function, and libcudacxx writes to the buffer when a device-side error occurs and asks the user to use a host thread to monitor the buffer's value change, that'd be a good start. The trouble is I can't think of a generic and robust way of doing so without using the library handle mentioned earlier. I would hate to use a global config unless its visibility can be somehow localized to only one translation unit (again, a compromise).
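A rough host-only sketch of the record-and-poll idea described above, written in plain C++ so it runs without a GPU. Here `std::atomic<int>` stands in for a flag in host-pinned memory, and `error_slot`, `report_error`, `checked_divide`, and `wait_for_error` are hypothetical names for illustration, not libcudacxx APIs. In a real CUDA build the writer would be a device function and the slot would have to live in pinned, device-visible memory; how to hand that slot to stateless APIs is exactly the open question raised here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

namespace error_slot_sketch {

// Hypothetical error codes; 0 is reserved for "no error".
enum device_error : int { no_error = 0, division_by_zero = 1 };

// Stand-in for a flag in host-pinned memory that device code could write to.
struct error_slot {
  std::atomic<int> code{no_error};
};

// Record the first error only; later errors do not overwrite it.
inline void report_error(error_slot& slot, int code) {
  assert(code != no_error);  // 0 would be indistinguishable from "no error"
  int expected = no_error;
  slot.code.compare_exchange_strong(expected, code,
                                    std::memory_order_release,
                                    std::memory_order_relaxed);
}

// Stand-in for a device function: on error it records a code and returns a
// fallback value instead of trapping.
inline int checked_divide(int num, int den, error_slot& slot) {
  if (den == 0) {
    report_error(slot, division_by_zero);
    return 0;
  }
  return num / den;
}

// Host-side monitor: poll until an error shows up or `stop` is set, then
// return whatever code is currently stored in the slot.
inline int wait_for_error(const error_slot& slot, const std::atomic<bool>& stop) {
  while (!stop.load(std::memory_order_acquire)) {
    const int code = slot.code.load(std::memory_order_acquire);
    if (code != no_error) {
      return code;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return slot.code.load(std::memory_order_acquire);
}

}  // namespace error_slot_sketch
```

A caller would typically run `wait_for_error` on a dedicated host thread and decide there whether to log, raise a Python exception, or stop submitting work. Because nothing traps, the CUDA context would stay usable in the real device-side version.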
I am working on getting assertions into libcu++.
Similar to what CUB has, we will also move toward adding logging, etc. to the mix.
Is this a duplicate?
Area
libcu++
Is your feature request related to a problem? Please describe.
Originally posted by @kroburg at https://github.com/NVIDIA/cuCollections/issues/589
cuda::stream_ref::wait can throw exceptions but throw terminates normally with a return value of 0.
CCCL disables exceptions by default, meaning that most APIs that are expected to throw exceptions never throw. One reason for this is that destructors should never throw exceptions (#683). Could we revisit this decision and consider enabling exceptions by default?
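To make the trade-off concrete, here is a minimal sketch of how a library can gate throwing on a compile-time switch. `SKETCH_NO_EXCEPTIONS`, `throw_or_abort`, and `check_status` are hypothetical names invented for this example; this is not how CCCL implements its own switch, only an illustration of the general pattern. In the no-exceptions branch the sketch calls `std::abort()`, so the process ends abnormally rather than reporting success.

```cpp
#include <cstdio>
#include <cstdlib>
#include <stdexcept>

// Hypothetical opt-out macro for this sketch only; not a real CCCL macro.
// #define SKETCH_NO_EXCEPTIONS

namespace throw_sketch {

// Throw when exceptions are enabled; otherwise print the message and abort.
[[noreturn]] inline void throw_or_abort(const char* what) {
#if defined(SKETCH_NO_EXCEPTIONS)
  std::fputs(what, stderr);
  std::fputc('\n', stderr);
  std::abort();  // abnormal termination instead of a silent, successful exit
#else
  throw std::runtime_error(what);
#endif
}

// Hypothetical API: a status of 0 means success, anything else is an error.
inline void check_status(int status) {
  if (status != 0) {
    throw_or_abort("operation failed");
  }
}

}  // namespace throw_sketch
```

With the macro left undefined, a caller can wrap `check_status` in a normal `try`/`catch`; defining it turns the same call into a hard stop.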
Describe the solution you'd like
Enable exceptions and find a proper way to "throw".
Describe alternatives you've considered
No response
Additional context
Not sure about the scope of this request; this could be a general CCCL feature request.