Open klmentzer opened 8 months ago
Hi @klmentzer, thanks for trying this out. There is a bug when the regularizer is used together with solver.use_cuda_graph=True. We will fix it in the upcoming release. In the meantime, could you please disable CUDA graph as a workaround?
Is there any solution to this? I am getting the same issue when trying to run the DLRM training v3.1 benchmark on a DGX H100. I have tried Nvidia-Merlin/HugeCTR v23.08.00, then v23.09.00 and the latest release too, but the same error persists. Can you please tell me how to fix it? @JacoCheung
Hi @Abatpool, have you tried turning cuda_graph off?
I did set it to false and used Nvidia-Merlin/HugeCTR v24.04.00 (verified release), but I am still facing the same error, as shown in the attached screenshot.
Describe the bug
Enabling regularization causes CUDNN_STATUS_MAPPING_ERROR for the deepfm example (it runs without problems when regularization is off). Also, using the keyword argument lambda to specify the regularization parameter causes a syntax error (though this can be avoided by passing **{"lambda": 1e-3} as an argument).

To Reproduce
Steps to reproduce the behavior:
Add use_regularization=True to the hugectr.Layer_t.BinaryCrossEntropyLoss layer and run the code to generate CUDNN_STATUS_MAPPING_ERROR.

Expected behavior
The model should train with regularization, and the keyword argument should not cause a syntax error.
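
For readers unfamiliar with the keyword-argument problem described above, here is a minimal sketch in plain Python, independent of HugeCTR. The function make_layer is a hypothetical stand-in, not a HugeCTR API; it only illustrates that lambda is a reserved word in Python, so it cannot be written as a keyword in a call, while dictionary unpacking still passes it through.

```python
import ast


def make_layer(**kwargs):
    """Hypothetical stand-in for any constructor that accepts a `lambda` keyword."""
    return kwargs


# `lambda` is a reserved word, so this call does not even parse.
try:
    ast.parse("make_layer(lambda=1e-3)")
    parsed = True
except SyntaxError:
    parsed = False

# Unpacking a dict sidesteps the parser: the key is an ordinary string.
layer_args = make_layer(**{"lambda": 1e-3})

print("direct keyword parses:", parsed)
print("arguments received via unpacking:", layer_args)
```

The error is raised by the Python parser before any library code runs, which is why the unpacking form works without any change on the library side.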
Thanks for your help!