fanzhongyi opened 3 weeks ago
Ah yes, I changed the layernorm kernel a few weeks ago, so I might have to check the test again.
Thank you so much for reporting. I will fix it by today.
Hey, I was able to fix it. The problem was that the backward kernel returned only the input grads; the w and b grads were no longer being output.
You can now run the test and it should work:
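For reference, the three gradients the backward pass has to produce can be sketched in plain Python for a single row. This is only an illustrative sketch of the math, not the actual Triton kernel; the point is that `dw` and `db` must come back alongside `dx`, which is what the fix restores:

```python
import math

def layernorm_backward_row(dy, x, w, eps=1e-5):
    """Backward for one row of LayerNorm y = (x - mu) / sigma * w + b.

    Returns (dx, dw, db). The bug described above was that only dx
    was returned, so weight/bias gradients silently went missing.
    """
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    sigma = math.sqrt(var + eps)
    xhat = [(xi - mu) / sigma for xi in x]

    dw = [dyi * xh for dyi, xh in zip(dy, xhat)]  # weight grad: dy * xhat
    db = list(dy)                                 # bias grad: just dy
    # Input grad: (g - mean(g) - xhat * mean(g * xhat)) / sigma, with g = w * dy
    g = [wi * dyi for wi, dyi in zip(w, dy)]
    c1 = sum(g) / n
    c2 = sum(gi * xh for gi, xh in zip(g, xhat)) / n
    dx = [(gi - c1 - xh * c2) / sigma for gi, xh in zip(g, xhat)]
    return dx, dw, db
```

In the real kernel these per-row `dw`/`db` contributions are additionally summed across rows before being returned.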
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-8.3.3, pluggy-1.5.0
plugins: time-machine-2.14.1, typeguard-4.3.0, anyio-4.4.0
collected 16 items
test_layernorm.py ................ [100%]
============================== 16 passed in 8.63s ==============================
Thank you for your prompt response. I have tested the fix, but unfortunately I encountered a bug in the test code after the recent commit. The issue is in the following lines, where the assertion always passes even though it shouldn't. After I fixed this, the test still fails.
Additionally, I believe it would be beneficial to add a test case that explicitly compares the gradients of weight and bias between the Triton-based TritonLayerNorm and PyTorch's torch.nn.LayerNorm. This would ensure that the gradient calculations are consistent across both implementations.
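Such a test could look roughly like the sketch below. The helper name `check_layernorm_grads` is hypothetical, and it assumes `TritonLayerNorm` takes an `nn.LayerNorm`-style constructor and exposes `weight`/`bias` parameters; adjust to the repo's actual API:

```python
import torch

def check_layernorm_grads(ln_cls, batch, seq, dim, rtol=1e-4, atol=1e-4):
    """Compare weight/bias/input grads of ln_cls against torch.nn.LayerNorm.

    ln_cls would be TritonLayerNorm in the actual test; it is assumed to
    mirror nn.LayerNorm's constructor and state_dict layout.
    """
    torch.manual_seed(0)
    ref = torch.nn.LayerNorm(dim)
    tri = ln_cls(dim)
    tri.load_state_dict(ref.state_dict())  # start from identical weight/bias

    x_ref = torch.randn(batch, seq, dim, requires_grad=True)
    x_tri = x_ref.detach().clone().requires_grad_(True)
    dy = torch.randn(batch, seq, dim)

    ref(x_ref).backward(dy)
    tri(x_tri).backward(dy)

    # The key checks: weight and bias grads, not just the input grad.
    torch.testing.assert_close(tri.weight.grad, ref.weight.grad, rtol=rtol, atol=atol)
    torch.testing.assert_close(tri.bias.grad, ref.bias.grad, rtol=rtol, atol=atol)
    torch.testing.assert_close(x_tri.grad, x_ref.grad, rtol=rtol, atol=atol)
```

A pytest parametrize over `(batch, seq, dim)` shapes, like the existing `test_backward_match` cases, would then just call this helper with `TritonLayerNorm`.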
Thanks again for your support.
I see, ok. I will try to add a test case that explicitly compares the gradients of weight and bias between the Triton-based TritonLayerNorm and PyTorch's torch.nn.LayerNorm, and I'll check my implementation again.
Looking forward to your updates, and thank you very much for open-sourcing this.
I am encountering an issue where the LayerNorm unit tests fail during execution in the NGC Docker 24.10 environment. Specifically, the gradient matching between the Triton-based TritonLayerNorm and the standard torch.nn.LayerNorm is not passing. It seems that the gradients for the weight and bias parameters in the custom Triton-based LayerNorm implementation are not being calculated properly. The assertion error message is:
tests/test_layernorm.py:69: AssertionError
=========================== short test summary info ============================
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[1-128-256] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[8-512-1024] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[16-256-512] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[4-1024-768] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[8-1024-1024] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[16-1024-1024] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[32-512-1024] - AssertionError: LayerNorm weight gradients don't match!
==================== 7 failed, 36 passed in 93.55s (0:01:33) ====================