NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

[PyTorch] `logging.basicConfig()` in global scope #1065

Closed · Marks101 closed this 1 month ago

Marks101 commented 2 months ago

Hello transformer-engine team,

PR #889 introduced a logging mechanism, which we really appreciate. However, there are now multiple calls to `logging.basicConfig()` at global/import scope (here). I think it makes more sense to call `logging.basicConfig()` at the application level, inside a main function, rather than in a library at global scope.
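To make the concern concrete, the pattern looks roughly like this (a simplified sketch, not the actual transformer-engine source; the exact env-var handling here is an assumption):

```python
# Sketch of the anti-pattern: configuring the *root* logger as a
# side effect of importing a library module.
import logging
import os

# Executed at import time; this mutates process-wide logging state
# that the application may want to own.
if os.getenv("NVTE_DEBUG", "0") == "1":
    logging.basicConfig(level=logging.DEBUG)

logger = logging.getLogger("DotProductAttention")
```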

For example, in our training code we call `logging.basicConfig()` in the main function, and this overrides the settings made by transformer-engine. This means that the setup based on the environment variables `NVTE_DEBUG` and `NVTE_DEBUG_LEVEL` gets overridden as well. We had to explicitly suppress the logging with `logging.getLogger("DotProductAttention").setLevel(logging.WARNING)` in our application code because of the huge number of log messages from transformer-engine.
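For illustration, our application setup currently looks roughly like this (simplified):

```python
import logging

def main():
    # Our application-wide logging setup, called once at startup.
    logging.basicConfig(level=logging.INFO)

    # Workaround: explicitly silence transformer-engine's attention
    # logger, which otherwise floods the output at INFO level.
    logging.getLogger("DotProductAttention").setLevel(logging.WARNING)

    # ... build model, run training loop, etc.

if __name__ == "__main__":
    main()
```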

A possible solution would be to remove the `logging.basicConfig()` calls from global scope and instead set the level of each individual logger based on the corresponding environment variables.
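As a sketch of what I mean (the helper name and the exact env-var-to-level mapping are illustrative, not a proposed patch):

```python
# Configure only the library's own loggers; leave the root logger,
# and therefore the application's handler setup, untouched.
import logging
import os

def _get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if os.getenv("NVTE_DEBUG", "0") == "1":
        # NVTE_DEBUG_LEVEL selects verbosity; this mapping is assumed
        # for illustration only.
        level = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}.get(
            int(os.getenv("NVTE_DEBUG_LEVEL", "0")), logging.WARNING
        )
        logger.setLevel(level)
    # A NullHandler avoids "no handlers could be found" warnings while
    # emitting nothing unless the application attaches its own handlers.
    logger.addHandler(logging.NullHandler())
    return logger

logger = _get_logger("DotProductAttention")
```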

Happy to discuss this 😄

cyanguwa commented 2 months ago

Hi @Marks101, could you please try PR #1074 and see if it achieves the expected effect in your application? Thanks.

cyanguwa commented 1 month ago

See PR #1074.