NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Release GIL when calling C extensions #868

Closed (szmigacz closed 2 weeks ago)

szmigacz commented 1 month ago

Extensions that don't call the CPython API should release the GIL so that other Python threads can run while the extension executes (e.g. to monitor execution progress and recover from hangs or crashes). PyTorch's pybind11 bindings already release the GIL where possible (py::gil_scoped_release).
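To illustrate the motivation, here is a minimal Python sketch of a watchdog thread that keeps making progress while the main thread sits inside a blocking call. It is not code from this PR: `run_with_watchdog` and `gil_releasing_call` are hypothetical names, and `time.sleep` (which releases the GIL while it waits) stands in for an extension function wrapped in py::gil_scoped_release.

```python
import threading
import time


def run_with_watchdog(blocking_call, poll_interval=0.01):
    """Run blocking_call on the calling thread and count watchdog heartbeats.

    Hypothetical helper for illustration only; not part of TransformerEngine.
    """
    heartbeats = 0
    done = threading.Event()

    def watchdog():
        nonlocal heartbeats
        # Event.wait returns False on timeout and True once `done` is set,
        # so each timeout counts as one heartbeat.
        while not done.wait(poll_interval):
            heartbeats += 1

    monitor = threading.Thread(target=watchdog, daemon=True)
    monitor.start()
    try:
        blocking_call()
    finally:
        done.set()
        monitor.join()
    return heartbeats


def gil_releasing_call():
    # Assumption: stands in for a long-running extension call that releases
    # the GIL. time.sleep releases the GIL while it waits.
    time.sleep(0.2)


beats = run_with_watchdog(gil_releasing_call)
print(f"watchdog heartbeats during the call: {beats}")
```

If the blocking call were a C or C++ function that kept the GIL for its whole duration, the watchdog thread could not run until the call returned, so it could neither report progress nor detect a hang.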