NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Make transformer_engine::getenv arguments independent of C++ ABI version #896

Closed · ksivaman closed this 3 months ago

ksivaman commented 3 months ago

Description

This is a cleaner workaround for the undefined symbol errors for transformer_engine::getenv in the PyTorch CUDAExtension; previously, system.cpp from the common lib was included as a source file in the framework build to work around them. Going forward, we still need to be mindful when using getenv in the framework extensions with T as std::string or std::filesystem::path, since T is also the return type and such instantiations would lead to the same errors.
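
To make the failure mode concrete, here is a minimal sketch (not TE's actual code; the namespace and function are made up) of why a std::string parameter ties the exported symbol to the libstdc++ dual ABI, while a const char* parameter does not:

```cpp
// Minimal sketch of the underlying problem; the namespace and function are
// made up and are not part of TransformerEngine.
//
// libstdc++'s dual ABI (_GLIBCXX_USE_CXX11_ABI) changes what std::string means:
//   =1 -> std::__cxx11::basic_string<char, ...>
//   =0 -> the old std::basic_string<char, ...>
// Whichever spelling was used at build time is baked into the mangled symbol
// name, so a library built with one setting does not export the symbol that an
// extension built with the other setting expects ("undefined symbol" at load).
#include <string>

namespace demo {

// Mangled name depends on the dual-ABI setting of whoever compiles it.
bool flag_from_env(const std::string &name);

// Mangled name is the same under either setting, so the precompiled symbol
// always matches what the extension looks up.
bool flag_from_env(const char *name);

}  // namespace demo
```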

Type of change

Changes

Changes transformer_engine::getenv to take a const char* argument instead of a std::string argument.
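
For illustration only, the revised interface and a call from a framework extension might look roughly like this; the exact declarations, overloads, and the environment variable name are assumptions, and the real definitions live in the precompiled common library:

```cpp
// Approximate shape of the revised interface; not the exact TE declarations.
#include <string>

namespace transformer_engine {

template <typename T>
T getenv(const char *name);                          // previously took a std::string argument

template <typename T>
T getenv(const char *name, const T &default_value);  // assumed overload with a fallback value

}  // namespace transformer_engine

// A framework-extension call site (compile-only sketch: the instantiations are
// expected to come from the precompiled common library at link time).
void extension_example() {
  // Fine: the instantiation's mangled name involves only int and const char*,
  // so it is identical under either _GLIBCXX_USE_CXX11_ABI setting.
  int n = transformer_engine::getenv<int>("NVTE_EXAMPLE_VAR", 1);  // hypothetical variable name

  // Risky, as the description warns: T = std::string (or std::filesystem::path)
  // puts the ABI-dependent type back into the mangled name via the template
  // argument / return type, so the precompiled symbol can again be undefined.
  // std::string s = transformer_engine::getenv<std::string>("NVTE_EXAMPLE_VAR");
  (void)n;
}
```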

Checklist:

ksivaman commented 3 months ago

/te-ci

ptrendx commented 3 months ago

Hmmm, TBH I don't understand why we don't just put that getenv template in the header file instead of the cpp file and just use that in the extensions rather than relying on the precompiled version.

timmoon10 commented 3 months ago

> Hmmm, TBH I don't understand why we don't just put that getenv template in the header file instead of the cpp file and just use that in the extensions rather than relying on the precompiled version.

This is not a bad idea either. This function is small, so explicit template instantiation doesn't save us much in terms of compilation time or binary size: https://github.com/NVIDIA/TransformerEngine/blob/0edf30b87159e82048b5f248e4b379aebb8f364a/transformer_engine/common/util/system.cpp#L58-L70
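
As a rough sketch of the trade-off being discussed (assumed code, not the actual contents of system.cpp or any TE header): keeping the definition in system.cpp with explicit instantiations means extensions must resolve the precompiled, ABI-sensitive symbols, whereas a header-only definition lets each extension instantiate the template in its own translation unit:

```cpp
// Assumed, simplified code for illustration; not the actual system.cpp/header.
//
// Current approach (approximately): the template body lives in system.cpp and is
// explicitly instantiated there, so framework extensions must link against the
// precompiled symbols, whose mangled names embed T (e.g. std::string).
//
//   // system.cpp
//   template <typename T> T getenv(const char *name) { /* read and parse env */ }
//   template std::string getenv<std::string>(const char *name);
//
// Suggested alternative: define the template in the header, so each extension
// instantiates it in its own translation unit with its own ABI settings and no
// cross-library symbol lookup is needed.

// hypothetical header-only definition
#include <cstdlib>
#include <sstream>
#include <string>

namespace transformer_engine {

template <typename T>
T getenv(const char *name, const T &default_value = T()) {
  const char *env = std::getenv(name);   // read the raw environment value
  if (env == nullptr || env[0] == '\0') return default_value;
  T value;
  std::istringstream(env) >> value;      // simplistic parsing for the sketch
  return value;
}

}  // namespace transformer_engine
```

The trade-off mentioned above is that explicit instantiation normally saves some compile time and binary size, which matters little for a function this small.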