Ciclarion closed this issue 1 month ago
Most probably you are hitting an OOM (out-of-memory) error. Could you try changing the threads
value on this line: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/CMakeLists.txt#L17 from 4 to 1? The build should take longer but use less memory.
Thanks for the quick answer. I tried that change, but sadly I still have the same problem. However, I was monitoring GPU usage with nvidia-smi, and it didn't seem to grow at all during the build. (For information, I have one RTX 3090.) I'll try to see what else I can change.
Edit: After checking, the generated CMakeCache.txt contains the line "CMAKE_CUDA_FLAGS:STRING= " with no value. I don't know if that's normal.
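One possible reason the GPU numbers stayed flat: the compiler processes run on the CPU and consume host RAM, which nvidia-smi does not report. A minimal sketch for watching host memory during the build, assuming a Linux system that exposes /proc/meminfo (the helper names parse_meminfo and available_gib are made up for illustration):

```python
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate in GiB (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / (1024 ** 2)


def watch(interval_s: float = 2.0, samples: int = 30) -> None:
    """Print available host memory every interval_s seconds."""
    for _ in range(samples):
        print(f"available host RAM: {available_gib():.2f} GiB")
        time.sleep(interval_s)
```

Running watch() in a second terminal while the wheel builds should show whether available host RAM drops toward zero just before the session dies.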
It was indeed an OOM error, and I had to change the MAX_NUM_WORK environment variable for the ninja build!
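For anyone landing here later, a rough sketch of launching the install with the job count limited through the environment. The variable name MAX_NUM_WORK is copied from the comment above and has not been checked against the TransformerEngine build scripts, so treat it as an assumption and confirm the exact name in the project's installation docs. The helper names build_env and install_command are hypothetical.

```python
import os
import subprocess
import sys

REPO = "git+https://github.com/NVIDIA/TransformerEngine.git"


def build_env(max_jobs: int, var_name: str = "MAX_NUM_WORK", base=None) -> dict:
    """Copy the environment and set the job-limit variable.

    var_name defaults to the name quoted in this thread (unverified).
    """
    env = dict(os.environ if base is None else base)
    env[var_name] = str(max_jobs)
    return env


def install_command(ref: str = "stable") -> list:
    """Build the same pip command used in the original report."""
    return [sys.executable, "-m", "pip", "install", f"{REPO}@{ref}"]


def run_install(max_jobs: int = 1) -> None:
    """Run pip with the limited environment; raises if the build fails."""
    subprocess.run(install_command(), env=build_env(max_jobs), check=True)


# To start the build with a single parallel job, call:
# run_install(max_jobs=1)
```

Fewer parallel compile jobs means fewer compiler processes holding memory at once, so the build is slower but peak host RAM use is lower.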
Hello,
My system: Ubuntu 24.04, CUDA 12.1, cuDNN 8.9.2, Python 3.10
I have quite a strange problem. When I try to install TransformerEngine with pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable, my computer "crashes" during the build_wheel step: my Ubuntu session closes automatically!
I also tried building from source, and it behaves the same way while running setup.py.
Because it crashes, I get no error message, so I have no idea what the problem could be.