NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Make transformer_engine::getenv arguments independent of C++ ABI version #896

Closed · ksivaman closed this 3 months ago

ksivaman commented 3 months ago

Description

This is a cleaner workaround for the undefined symbol errors for transformer_engine::getenv in the PyTorch CUDAExtension; previously, system.cpp from the common lib was included as a source file in the framework build to work around them. Going forward, we still need to be mindful when using getenv in the framework extensions with T as std::string or std::filesystem::path, since T is also the return type and such instantiations would lead to the same errors.
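
To make the failure mode concrete, here is a minimal sketch (not TE's actual code; the namespace and function are made up) of why a std::string parameter ties the exported symbol to the libstdc++ dual ABI, while a const char* parameter does not:

```cpp
// Minimal sketch of the underlying problem; the namespace and function are
// made up and are not part of TransformerEngine.
//
// libstdc++'s dual ABI (_GLIBCXX_USE_CXX11_ABI) changes what std::string means:
//   =1 -> std::__cxx11::basic_string<char, ...>
//   =0 -> the old std::basic_string<char, ...>
// Whichever spelling was used at build time is baked into the mangled symbol
// name, so a library built with one setting does not export the symbol that an
// extension built with the other setting expects ("undefined symbol" at load).
#include <string>

namespace demo {

// Mangled name depends on the dual-ABI setting of whoever compiles it.
bool flag_from_env(const std::string &name);

// Mangled name is the same under either setting, so the precompiled symbol
// always matches what the extension looks up.
bool flag_from_env(const char *name);

}  // namespace demo
```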

Type of change

Changes

Changes transformer_engine::getenv to take a const char* argument instead of a std::string argument.
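
For illustration only, the revised interface and a call from a framework extension might look roughly like this; the exact declarations, overloads, and the environment variable name are assumptions, and the real definitions live in the precompiled common library:

```cpp
// Approximate shape of the revised interface; not the exact TE declarations.
#include <string>

namespace transformer_engine {

template <typename T>
T getenv(const char *name);                          // previously took a std::string argument

template <typename T>
T getenv(const char *name, const T &default_value);  // assumed overload with a fallback value

}  // namespace transformer_engine

// A framework-extension call site (compile-only sketch: the instantiations are
// expected to come from the precompiled common library at link time).
void extension_example() {
  // Fine: the instantiation's mangled name involves only int and const char*,
  // so it is identical under either _GLIBCXX_USE_CXX11_ABI setting.
  int n = transformer_engine::getenv<int>("NVTE_EXAMPLE_VAR", 1);  // hypothetical variable name

  // Risky, as the description warns: T = std::string (or std::filesystem::path)
  // puts the ABI-dependent type back into the mangled name via the template
  // argument / return type, so the precompiled symbol can again be undefined.
  // std::string s = transformer_engine::getenv<std::string>("NVTE_EXAMPLE_VAR");
  (void)n;
}
```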

Checklist:

ksivaman commented 3 months ago

/te-ci

ptrendx commented 3 months ago

Hmmm, TBH I don't understand why we don't just put that getenv template in the header file instead of the cpp file and just use that in the extensions rather than relying on the precompiled version.

timmoon10 commented 3 months ago

> Hmmm, TBH I don't understand why we don't just put that getenv template in the header file instead of the cpp file and just use that in the extensions rather than relying on the precompiled version.

This is not a bad idea either. This function is small, so explicit template instantiation doesn't save us much in terms of compilation time or binary size: https://github.com/NVIDIA/TransformerEngine/blob/0edf30b87159e82048b5f248e4b379aebb8f364a/transformer_engine/common/util/system.cpp#L58-L70
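
As a rough sketch of the trade-off being discussed (assumed code, not the actual contents of system.cpp or any TE header): keeping the definition in system.cpp with explicit instantiations means extensions must resolve the precompiled, ABI-sensitive symbols, whereas a header-only definition lets each extension instantiate the template in its own translation unit:

```cpp
// Assumed, simplified code for illustration; not the actual system.cpp/header.
//
// Current approach (approximately): the template body lives in system.cpp and is
// explicitly instantiated there, so framework extensions must link against the
// precompiled symbols, whose mangled names embed T (e.g. std::string).
//
//   // system.cpp
//   template <typename T> T getenv(const char *name) { /* read and parse env */ }
//   template std::string getenv<std::string>(const char *name);
//
// Suggested alternative: define the template in the header, so each extension
// instantiates it in its own translation unit with its own ABI settings and no
// cross-library symbol lookup is needed.

// hypothetical header-only definition
#include <cstdlib>
#include <sstream>
#include <string>

namespace transformer_engine {

template <typename T>
T getenv(const char *name, const T &default_value = T()) {
  const char *env = std::getenv(name);   // read the raw environment value
  if (env == nullptr || env[0] == '\0') return default_value;
  T value;
  std::istringstream(env) >> value;      // simplistic parsing for the sketch
  return value;
}

}  // namespace transformer_engine
```

The trade-off mentioned above is that explicit instantiation normally saves some compile time and binary size, which matters little for a function this small.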