Open liuliuliu0605 opened 7 months ago
You may use either of the following solutions:
- The library scaled_softmax_cuda is contained in apex. You may install it from https://github.com/NVIDIA/apex.
- Add --no-masked-softmax-fusion to avoid using the fused kernel.
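To see which of the two solutions applies, one could first check whether the extension modules are importable in the current environment. The following is a minimal sketch using only the Python standard library. The helper name missing_modules is hypothetical and is not part of Megatron-LM or apex; the module names are the ones mentioned in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from this thread.
    wanted = ["scaled_softmax_cuda", "scaled_masked_softmax_cuda"]
    missing = missing_modules(wanted)
    if missing:
        print("Not importable:", ", ".join(missing))
        print("Rebuild apex with its CUDA extensions, "
              "or pass --no-masked-softmax-fusion.")
    else:
        print("All listed extension modules were found.")
```

If anything is reported as missing, the second solution (the flag) avoids the import without rebuilding apex.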
Thank you for your reply. Solution 2 fixed the problem. However, after installing apex, the ModuleNotFoundError still occurs. The install command was as follows:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
Running pip list | grep apex shows apex at version 0.1, and I can find scaled_masked_softmax_cuda.cpython-310-x86_64-linux-gnu.so in the directory torch2.0.0-cu118-cp310/lib/python3.10/site-packages. Did I fail to install scaled_softmax_cuda?
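Note that scaled_masked_softmax_cuda and scaled_softmax_cuda are different module names, so finding the former does not show that the latter was built. A quick way to list which of these compiled extensions are present in a directory is sketched below. The function name built_softmax_extensions and the glob pattern are assumptions made for illustration, not part of apex.

```python
import sysconfig
from pathlib import Path


def built_softmax_extensions(directory):
    """Return sorted module names of compiled scaled-softmax extensions found."""
    names = set()
    for path in Path(directory).glob("scaled_*softmax_cuda*.so"):
        # "scaled_masked_softmax_cuda.cpython-310-x86_64-linux-gnu.so"
        # becomes "scaled_masked_softmax_cuda".
        names.add(path.name.split(".", 1)[0])
    return sorted(names)


if __name__ == "__main__":
    # "platlib" is where compiled extension modules are normally installed.
    site_packages = sysconfig.get_paths()["platlib"]
    print(site_packages, built_softmax_extensions(site_packages))
```

If scaled_softmax_cuda is absent from the printed list, the installed apex build did not produce that extension.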
@liuliuliu0605 I installed apex in the same way. scaled_softmax_cuda should also be included in apex.
@yuantailing Thanks for providing the details. I remember that I first tried to install the apex master branch, but the installation failed. The log is install.log. Could this be caused by an incompatible CUDA version?
So I chose to install the apex 22.04-dev branch instead, which does not actually include the scaled_softmax_cuda.cu file. Therefore, the module scaled_softmax_cuda cannot be found.
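Before building from a particular branch, one could confirm that the checkout actually contains the source file. The sketch below searches a local checkout recursively, so it makes no assumption about where the file lives inside the repository. The function name find_source_file is hypothetical.

```python
from pathlib import Path


def find_source_file(checkout_dir, filename="scaled_softmax_cuda.cu"):
    """Return POSIX-style relative paths of files named `filename` in a checkout."""
    root = Path(checkout_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(filename))
```

For example, find_source_file("./apex") (where ./apex is a placeholder for a local clone) returns an empty list when the checked-out branch does not ship the file, which would explain the missing module after a successful install.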
Marking as stale. No activity in 60 days.
Describe the bug
When I try to run single-GPU T5 pretraining with the script examples/pretrain_t5.sh, it outputs the following error:
It seems that the code lacks the module scaled_softmax_cuda. Do I need to install the relevant Python module?
Stack trace/logs
Environment (please complete the following information):