Status: Open. kerrmudgeon opened this issue 4 years ago.
Is there any update on this? std::cout in device code would be a sweet feature for debugging.
Is there any interest in a sweet feature like this?
Note that std::format or something similar would be a perfectly viable alternative. Even a non-standard solution that enables seemingly atomic printing would be welcome.
Thanks for any consideration.
Definitely interest from another user here for this sweet feature 😄
It's on our roadmap; we just need to prioritize it. ;) The more people who ask, the easier it will be for us to bump it into the next couple of releases.
I think std::format is going to be much more feasible than std::cout.
@wmaxey I third this. IMHO, most C++ beginners prefer std::cout and std::cerr over std::format. The latter is kinda Python-like (or is it the other way around?).
I also sincerely hope that std::cout can be used in a CUDA kernel function.
Hijacking this issue to be about format and std::print now. cout isn't going to be feasible any time soon.
It would be great if libcudacxx offered a device-side logging solution, which this request would easily enable, instead of each team rolling out its own.
We’re interested in this too, as we prefer logging over issuing __trap when something wrong but not fatal is detected in a kernel.
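One way to get "log instead of trap" behavior with today's tools is a soft-check macro built on device printf that also bumps a global counter the host can inspect after the kernel finishes. This is a hypothetical sketch: `SOFT_CHECK`, `g_soft_error_count`, and the `normalize` kernel are illustrative names, not part of CUDA or libcudacxx.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter of failed soft checks; read by the host after the kernel.
__device__ int g_soft_error_count = 0;

// Hypothetical macro: report a recoverable problem and keep running,
// instead of calling __trap() and aborting the whole kernel.
#define SOFT_CHECK(cond)                                                     \
    do {                                                                     \
        if (!(cond)) {                                                       \
            printf("soft check failed: %s at %s:%d (block %u, thread %u)\n", \
                   #cond, __FILE__, __LINE__, blockIdx.x, threadIdx.x);      \
            atomicAdd(&g_soft_error_count, 1);                               \
        }                                                                    \
    } while (0)

__global__ void normalize(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // A negative input is unexpected but not fatal: log it, then clamp.
    SOFT_CHECK(data[i] >= 0.0f);
    data[i] = fmaxf(data[i], 0.0f);
}

// Runs the demo; returns the number of failed soft checks, or -1 on a CUDA error.
int run_soft_check_demo() {
    const int n = 8;
    float host[n] = {1, 2, -3, 4, 5, -6, 7, 8};
    float* dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return -1;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    int zero = 0;
    cudaMemcpyToSymbol(g_soft_error_count, &zero, sizeof(int));
    normalize<<<1, 32>>>(dev, n);
    // Synchronizing also flushes the device-side printf buffer to stdout.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(dev); return -1; }

    int errors = 0;
    cudaMemcpyFromSymbol(&errors, g_soft_error_count, sizeof(int));
    cudaFree(dev);
    return errors;
}

int main() {
    int errors = run_soft_check_demo();
    std::printf("kernel finished with %d soft error(s)\n", errors);
    return errors < 0 ? 1 : 0;
}
```

The kernel keeps running after a failed check, so the host can decide afterwards whether the counter warrants aborting the computation.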
In Spiral, I've created my own printing functionality that is type-safe and can print arbitrary types in the language. It also uses a global semaphore to ensure that only one thread is sending data to the terminal at a time. Furthermore, the functions can be used interchangeably on both the Python and the CUDA side.
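Spiral's implementation isn't shown here. The following is only a rough CUDA C++ sketch of the same idea, assuming a hypothetical global spin lock (`g_print_lock`) that serializes multi-part records printed with device printf. In practice each individual printf call is emitted as a unit, so a lock like this mainly matters when one logical message spans several calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical global print lock: 0 = free, 1 = held.
__device__ int g_print_lock = 0;

// Prints one multi-part record while holding the lock, so other threads'
// records are not interleaved with it. Acquire and release both happen inside
// the loop body, which avoids the classic intra-warp spin-lock deadlock on
// GPUs without independent thread scheduling.
__device__ void print_record(int id, const float* values, int count) {
    bool done = false;
    while (!done) {
        if (atomicCAS(&g_print_lock, 0, 1) == 0) {
            __threadfence();  // memory fence after taking the lock
            printf("record %d:", id);
            for (int k = 0; k < count; ++k) printf(" %.1f", values[k]);
            printf("\n");
            __threadfence();  // memory fence before releasing the lock
            atomicExch(&g_print_lock, 0);
            done = true;
        }
    }
}

__global__ void emit_records(int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float values[3] = {i * 1.0f, i * 2.0f, i * 3.0f};
    print_record(i, values, 3);
}

// Launches the demo; device printf output is flushed to stdout at the synchronize.
int run_lock_demo() {
    emit_records<<<2, 4>>>(8);
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}

int main() { return run_lock_demo() == 0 ? 0 : 1; }
```

Serializing every printing thread through one lock is slow, so a pattern like this is best kept to debug builds.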
The difficulty I'm having is that there is no way to do I/O redirection on the Python side: the output only ever shows up in the terminal. I asked about this on the CUDA developer support page, and this is the reply I got from Yuki Ni:
Here is the answer and suggestion from our Python engineering team: CUDA does not offer a way to redirect stdout from the device side. printf works as is. There have been sporadic conversations with the CCCL team about adding device-side logging support; please voice your interest there to gain attention: https://github.com/NVIDIA/cccl/issues/939
If a status report (e.g. a progress bar) needs to happen periodically from the device side, I believe an alternative approach is to do an in-kernel atomic write to host pinned memory and have an independent host thread poll the value written by the device. Then the host code has full control over logging and I/O stream redirection.
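The suggestion above can be sketched as follows, assuming a platform that supports mapped pinned memory (`cudaHostAllocMapped`, available on 64-bit systems with unified virtual addressing). Each block writes its own slot with a plain aligned 32-bit store, which sidesteps the fact that read-modify-write atomics on mapped page-locked memory are not atomic from the host's point of view. A separate `std::thread` polls and prints the total. The kernel body and names like `work_with_progress` are placeholders.

```cuda
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Each block reports how many iterations it has finished in its own slot of
// host-mapped pinned memory, using a plain aligned 32-bit store.
__global__ void work_with_progress(volatile unsigned int* progress, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        // Placeholder for real work: spin for a few million clock cycles.
        long long start = clock64();
        while (clock64() - start < 5000000) { }
        __syncthreads();  // the whole block has finished this iteration
        if (threadIdx.x == 0) {
            progress[blockIdx.x] = it + 1;
            __threadfence_system();  // order this store before later writes, as seen by the host
        }
    }
}

// Sums the per-block slots written by the kernel.
unsigned int total_progress(const volatile unsigned int* slots, int blocks) {
    unsigned int sum = 0;
    for (int b = 0; b < blocks; ++b) sum += slots[b];
    return sum;
}

// Launches the kernel while a separate host thread polls and prints progress.
// Returns the final progress count, or 0 if the mapped allocation fails.
unsigned int run_progress_demo(int blocks, int iterations) {
    unsigned int* host_slots = nullptr;
    if (cudaHostAlloc(reinterpret_cast<void**>(&host_slots), blocks * sizeof(unsigned int),
                      cudaHostAllocMapped) != cudaSuccess) return 0;
    for (int b = 0; b < blocks; ++b) host_slots[b] = 0;
    unsigned int* dev_slots = nullptr;
    cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev_slots), host_slots, 0);

    const unsigned int expected = static_cast<unsigned int>(blocks * iterations);
    std::atomic<bool> kernel_done{false};
    std::thread poller([&] {
        // The host owns all I/O here, so this output can be redirected like any other.
        while (!kernel_done.load()) {
            std::printf("progress: %u / %u\n", total_progress(host_slots, blocks), expected);
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
        }
    });

    work_with_progress<<<blocks, 128>>>(dev_slots, iterations);
    cudaDeviceSynchronize();
    kernel_done.store(true);
    poller.join();

    unsigned int final_count = total_progress(host_slots, blocks);
    cudaFreeHost(host_slots);
    return final_count;
}

int main() {
    unsigned int done = run_progress_demo(4, 20);
    std::printf("finished: %u\n", done);
    return done == 80 ? 0 : 1;
}
```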
More than just sending text, I wish CUDA had support for channels so that we could send arbitrary data to the host without needing to terminate the kernel.
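CUDA has no built-in channel type, but a one-way, fixed-capacity channel can be approximated in user code with the same mapped-memory technique. In this hypothetical sketch (`Channel`, `channel_send`, and `Record` are made-up names), each device thread reserves a slot with `atomicAdd`, writes the payload, issues `__threadfence_system()`, and then sets a ready flag; a host thread consumes slots in order while the kernel is still running. It does not wrap around, so records beyond the capacity are dropped, and the volatile-based handshake is a practical pattern rather than a formally race-free one.

```cuda
#include <atomic>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical record type sent from device to host.
struct Record {
    int thread;
    int step;
    float value;
};

// Hypothetical fixed-capacity, write-once channel in host-mapped pinned memory.
struct Channel {
    Record* slots;          // payloads (device-side pointer)
    unsigned int* ready;    // ready[i] != 0 once slots[i] is fully written
    unsigned int capacity;
};

__device__ unsigned int g_next_slot = 0;  // reservation counter in device memory

// Sends one record; returns false (record dropped) if the channel is full.
__device__ bool channel_send(Channel ch, Record r) {
    unsigned int i = atomicAdd(&g_next_slot, 1u);
    if (i >= ch.capacity) return false;
    ch.slots[i] = r;                      // 1. write the payload
    __threadfence_system();               // 2. order it before the flag, as seen by the host
    volatile unsigned int* flags = ch.ready;
    flags[i] = 1u;                        // 3. publish the slot
    return true;
}

__global__ void producer(Channel ch, int steps) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    for (int s = 0; s < steps; ++s) channel_send(ch, Record{t, s, t * 0.5f + s});
}

// Runs the producer while a host thread drains the channel; returns records consumed.
int run_channel_demo() {
    const int threads = 8, steps = 3;
    const unsigned int capacity = 64;
    Record* host_slots = nullptr;
    unsigned int* host_ready = nullptr;
    if (cudaHostAlloc(reinterpret_cast<void**>(&host_slots), capacity * sizeof(Record),
                      cudaHostAllocMapped) != cudaSuccess) return -1;
    if (cudaHostAlloc(reinterpret_cast<void**>(&host_ready), capacity * sizeof(unsigned int),
                      cudaHostAllocMapped) != cudaSuccess) { cudaFreeHost(host_slots); return -1; }
    for (unsigned int i = 0; i < capacity; ++i) host_ready[i] = 0;

    Channel ch{nullptr, nullptr, capacity};
    cudaHostGetDevicePointer(reinterpret_cast<void**>(&ch.slots), host_slots, 0);
    cudaHostGetDevicePointer(reinterpret_cast<void**>(&ch.ready), host_ready, 0);
    unsigned int zero = 0;
    cudaMemcpyToSymbol(g_next_slot, &zero, sizeof(zero));

    const unsigned int total = threads * steps;
    const unsigned int expected = total < capacity ? total : capacity;
    std::atomic<bool> kernel_done{false};
    int consumed = 0;

    producer<<<1, threads>>>(ch, steps);
    std::thread consumer([&] {
        const volatile unsigned int* ready = host_ready;
        const volatile Record* slots = host_slots;
        for (unsigned int next = 0; next < expected;) {
            if (ready[next] == 0) {
                // Give up only if the kernel has ended and the slot is still empty.
                if (kernel_done.load() && ready[next] == 0) break;
                std::this_thread::yield();
                continue;
            }
            std::atomic_thread_fence(std::memory_order_acquire);
            std::printf("[device thread %d] step %d value %.1f\n",
                        slots[next].thread, slots[next].step, slots[next].value);
            ++consumed;
            ++next;
        }
    });
    cudaDeviceSynchronize();
    kernel_done.store(true);
    consumer.join();
    cudaFreeHost(host_slots);
    cudaFreeHost(host_ready);
    return consumed;
}

int main() {
    int n = run_channel_demo();
    std::printf("consumed %d records\n", n);
    return n == 24 ? 0 : 1;
}
```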
printf() is available in CUDA, but it has several deficiencies.
This request is to add a printing facility similar to std::ostream that is accessible in device code. This should enable user-defined printing functions and provide type safety. Ideally, delimiter tokens analogous to std::flush and std::endl would enable the CUDA driver to interleave the output from CUDA threads without corruption.
Example usage: