LLNL / Aluminum

High-performance, GPU-aware communication library
https://aluminum.readthedocs.io/en/latest/

Name threads to help debugging #227

Closed by ndryden 6 months ago

ndryden commented 6 months ago

Based on recent experience, it is helpful for threads to be named. This uses pthread_setname_np to do that, via the already-existing but mostly-unused Al::internal::profiling::name_thread. It also sets the NCCL_SET_THREAD_NAME environment variable (unless it is already set) to get NCCL to do the same.

I also made our hang output nicer.

Some notes: