A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Extensions that do not call the CPython API should release the GIL so that other Python threads can run while the extension is executing (e.g. to monitor execution progress and recover from hangs or crashes). PyTorch's pybind11 bindings already release the GIL where possible (py::gil_scoped_release).
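To illustrate why this matters, here is a minimal Python sketch of the monitoring pattern that a released GIL makes possible. It is not code from this library: `blocking_native_call` and `run_with_watchdog` are hypothetical names, and `time.sleep` (which releases the GIL while it waits) stands in for an extension function wrapped in py::gil_scoped_release.

```python
import threading
import time


def blocking_native_call(seconds: float) -> None:
    # Hypothetical stand-in for a long-running extension call.
    # time.sleep releases the GIL while waiting, as a well-behaved
    # extension function would while doing non-Python work.
    time.sleep(seconds)


def run_with_watchdog(work, interval: float = 0.01) -> int:
    """Run work() on the calling thread and count watchdog heartbeats."""
    done = threading.Event()
    heartbeats = 0

    def watchdog() -> None:
        nonlocal heartbeats
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(interval):
            heartbeats += 1  # a real monitor could log progress or detect a hang here

    monitor = threading.Thread(target=watchdog, daemon=True)
    monitor.start()
    try:
        work()
    finally:
        done.set()
        monitor.join()
    return heartbeats


beats = run_with_watchdog(lambda: blocking_native_call(0.2))
print(f"watchdog ran {beats} times during the call")
```

If the native call held the GIL for its whole duration instead, the watchdog thread could not run until the call returned, so a hang inside the extension would also stall the monitor.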