NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Convert non-kernel cuda files to cpp #1322

Closed · ksivaman closed this 2 weeks ago

ksivaman commented 2 weeks ago

Description

There are CUDA files in the Paddle and PyTorch extensions that neither define nor call any kernels. This PR changes them to pure C++ files.
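
For context, the distinction driving the rename is whether a translation unit actually contains device code. A file that neither defines a `__global__` kernel nor uses the `<<<...>>>` launch syntax only makes host-side CUDA API calls, so it does not need `nvcc` and can be compiled by the host C++ compiler. A minimal, hypothetical sketch (not taken from the files touched by this PR):

```cpp
// Hypothetical example: this code only uses the CUDA runtime API (no kernel
// definitions, no <<<...>>> launches), so it can live in a .cpp file and be
// built by the host C++ compiler, provided <cuda_runtime.h> is on the include
// path and the binary links against the CUDA runtime (cudart).
#include <cuda_runtime.h>
#include <cstddef>
#include <stdexcept>

void* alloc_device_buffer(std::size_t bytes) {
  void* ptr = nullptr;
  if (cudaMalloc(&ptr, bytes) != cudaSuccess) {
    throw std::runtime_error("cudaMalloc failed");
  }
  return ptr;
}

void free_device_buffer(void* ptr) {
  cudaFree(ptr);  // Runtime API call; no device code is generated here.
}
```

A file like this goes straight through the host compiler, whereas a `.cu` file is always routed through `nvcc`'s split host/device compilation even when it contains no kernels.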

Type of change

Changes

Checklist:

ksivaman commented 2 weeks ago

/te-ci

ksivaman commented 2 weeks ago

> Seems reasonable once we fix the compilation error. Is there a particular bug that this is fixing?

No, this isn't a bug fix. I also didn't see a noticeable improvement in compile time; the change is just better practice.