Based on recent experience, it is helpful for threads to be named. This uses `pthread_setname_np` to do that, via the already-existing but mostly-unused `Al::internal::profiling::name_thread`. It also sets the `NCCL_SET_THREAD_NAME` environment variable (unless it is already set) to get NCCL to do the same.
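For anyone unfamiliar with the call, here is a minimal sketch of naming a thread on Linux with glibc. `set_thread_name` and `get_thread_name` are hypothetical helpers written for illustration; they are not the actual `name_thread` implementation, which may differ. The sketch assumes the Linux signature of `pthread_setname_np` (other platforms, such as macOS, use a different one) and the Linux limit of 16 bytes per thread name, including the terminating NUL.

```cpp
#include <pthread.h>  // pthread_setname_np / pthread_getname_np (glibc extensions)
#include <cassert>
#include <string>

// Hypothetical helper (not Aluminum's name_thread): name the given thread.
// Linux rejects names longer than 15 characters plus NUL with ERANGE,
// so the name is truncated to 15 characters before the call.
inline bool set_thread_name(pthread_t thread, const std::string& name) {
  const std::string truncated = name.substr(0, 15);
  return pthread_setname_np(thread, truncated.c_str()) == 0;
}

// Hypothetical helper: read a thread's name back; returns "" on failure.
inline std::string get_thread_name(pthread_t thread) {
  char buf[16] = {0};  // 16 bytes is the minimum buffer size Linux accepts
  if (pthread_getname_np(thread, buf, sizeof(buf)) != 0) {
    return "";
  }
  return std::string(buf);
}
```

A thread would typically call `set_thread_name(pthread_self(), "some-name")` as the first thing it does, so the name shows up in tools such as `top -H` and in debugger thread listings.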
I also made our hang output nicer.
Some notes:
- I have started using `[[maybe_unused]]` in some places when I poke at the code.
- We do not need to define `_GNU_SOURCE`; it's a bit nasty, but g++ and clang++ will both set it automatically (libstdc++/libc++ will break without it).
- For whatever reason, while there is a `std::getenv`, there is no `std::setenv`.
- The hang watchdog doesn't use `name_thread` because it previously didn't include any Aluminum headers and I didn't want to add them.
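To illustrate the `setenv` note and the "unless it is already set" behaviour, here is a rough sketch using POSIX `setenv` with its `overwrite` argument set to 0. `set_env_default` is a hypothetical helper name, not something in this PR.

```cpp
#include <stdlib.h>  // setenv is POSIX and lives here, not in namespace std
#include <cstdlib>   // std::getenv
#include <cassert>
#include <string>

// Hypothetical helper: set an environment variable only if it is not
// already present. With overwrite == 0, setenv leaves an existing value
// untouched and still returns 0.
inline bool set_env_default(const char* name, const char* value) {
  return setenv(name, value, /*overwrite=*/0) == 0;
}
```

Under these assumptions the PR's behaviour would look like `set_env_default("NCCL_SET_THREAD_NAME", "1")`, where the value `"1"` is my assumption about what NCCL expects. Since `setenv` is not thread-safe with respect to concurrent environment access, a call like this belongs early, before other threads are started.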