Closed MrPeterJin closed 7 months ago
It's a version mismatch problem with apex. Maybe your apex version is too old or too new. You can first disable the enable_layernorm_kernel arg to run the code.
When I disabled the layernorm kernel, the code ran fine for me. However, I reinstalled OpenDiT with the versions recommended in your README file and the error still occurs. Are there any other possible reasons?
Sorry, no clues. I suppose it is about your environment and apex.
Then may I have a reference for your environment settings (e.g. torch version, CUDA, etc.)? Your requirements.txt does not pin these, and I suspect something changed in a newer version of PyTorch that causes this error.
we use cuda 11.8 and torch 2.1.2, good luck
What is the cuDNN version on your platform? Just call print(torch.backends.cudnn.version()) for the output.
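For anyone comparing environments the same way, here is a minimal sketch that gathers the versions discussed in this thread in one go. The helper name collect_env_info is made up for illustration and is not part of OpenDiT. It only checks whether apex can be found, not which apex build it is.

```python
import importlib
import importlib.util
import platform


def collect_env_info():
    """Return a dict of version info relevant to the apex layernorm kernel."""
    info = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed, so nothing else can be queried.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version this torch build was compiled against (None on CPU-only builds).
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    # Only report whether apex is discoverable; this does not import it.
    info["apex_found"] = importlib.util.find_spec("apex") is not None
    return info


if __name__ == "__main__":
    for name, value in collect_env_info().items():
        print(f"{name}: {value}")
```

Pasting this output into the issue should make it easier to compare against the maintainers' setup (cuda 11.8, torch 2.1.2, cudnn 8.9.7).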
cudnn 8.9.7
I noticed that following the installation guidelines in your README file automatically updates torch and other dependencies to the newest versions, which causes the version mismatch. So I think you probably need to pin the versions in your environment settings.
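Until the versions are pinned, a quick check like the one below can flag drift after installation. This is only a sketch: the EXPECTED table and the find_mismatches helper are hypothetical, and the single pin listed is the torch version the maintainers mentioned in this thread, not an official requirement list.

```python
from importlib import metadata

# Pin taken from the maintainers' reply in this thread; extend as needed.
EXPECTED = {"torch": "2.1.2"}


def find_mismatches(expected):
    """Return {package: (wanted, found)} for every package that differs from its pin."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # Drop a local build tag such as "+cu118" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running it right after the README install steps shows whether pip silently upgraded torch past the version the kernels were tested with.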
Thanks for providing your settings. I was able to start training successfully.
First of all, I really appreciate your work! When I tried to use the framework to reproduce your results, I noticed the layernorm kernel does not work on my side. Here is the log:
Please advise possible solutions. Thanks!