Open Orca-bit opened 1 month ago
Hi @Orca-bit, is this bug reproducible every time? If so, I will try to reproduce it and then get back to you with an answer. I will also test the issue mentioned at https://github.com/NVIDIA-Merlin/HugeCTR/issues/463.
@kanghui0204 Yes, it is reproducible. By the way, could you share the md5sums of the SOK split datasets? I have already checked the md5sums of the HugeCTR datasets, i.e. train.bin, test.bin and val.bin.
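For anyone comparing checksums across machines, here is a minimal sketch of how the md5sums could be computed. It assumes the three files sit in the current working directory; the helper name `md5_of_file` and the 1 MiB chunk size are illustrative choices, not part of HugeCTR or SOK.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Hypothetical helper: chunked reads keep memory use flat even for
    multi-gigabyte dataset files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # File names taken from the comment above; their location is an assumption.
    for name in ("train.bin", "test.bin", "val.bin"):
        p = Path(name)
        if p.is_file():
            print(f"{md5_of_file(p)}  {name}")
        else:
            print(f"missing: {name}")
```

The output format mirrors that of the `md5sum` command-line tool, so the lines can be pasted into a comment and compared directly.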
Describe the bug
After running iteration 3790, some errors occur; it looks like something is wrong with the dataset.
To Reproduce Steps to reproduce the behavior:
docker pull & docker run
commands
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Environment (please complete the following information):
Additional context Add any other context about the problem here.