Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Apache License 2.0

HF Phi3-mini-128k returns very different gradients than reference #1441

Closed: riccardofelluga closed this issue 1 week ago

riccardofelluga commented 1 week ago

🐛 Bug

When testing HF Phi3, the gradients differ from the reference by several orders of magnitude:

>       torch.testing.assert_close(grads_ref, grads_compiled, rtol=1e-2, atol=1e-2)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 39 / 128 (30.5%)
E       Greatest absolute difference: 0.0810546875 at index (0, 12) (up to 0.01 allowed)
E       Greatest relative difference: 8.125 at index (1, 3) (up to 0.01 allowed)
E
E       The failure occurred for item [0]

thunder/tests/test_networks.py:508: AssertionError
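
For readers interpreting the numbers in the failure message, here is a minimal pure-Python sketch of the elementwise criterion that `torch.testing.assert_close` documents: a pair passes when `|actual - expected| <= atol + rtol * |expected|`. The helper names `is_close` and `mismatch_report` and the sample values are hypothetical, chosen only to illustrate the check; they are not taken from the failing test, and the real comparison runs on tensors.

```python
def is_close(actual: float, expected: float, rtol: float = 1e-2, atol: float = 1e-2) -> bool:
    """Elementwise check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def mismatch_report(actual, expected, rtol=1e-2, atol=1e-2):
    """Return (mismatched count, greatest abs diff, greatest rel diff) for flat lists."""
    pairs = list(zip(actual, expected))
    mismatched = sum(not is_close(a, e, rtol, atol) for a, e in pairs)
    max_abs = max(abs(a - e) for a, e in pairs)
    # Relative difference is taken against the expected (reference) value.
    max_rel = max((abs(a - e) / abs(e) for a, e in pairs if e != 0), default=0.0)
    return mismatched, max_abs, max_rel


# Hypothetical reference and compiled gradients (not from the actual test run).
grads_ref = [0.5, 0.01, -0.25]
grads_compiled = [0.5, 0.09125, -0.2505]

mismatched, max_abs, max_rel = mismatch_report(grads_compiled, grads_ref)
print(f"mismatched={mismatched}, max_abs={max_abs:.5f}, max_rel={max_rel:.3f}")
```

With small reference values, a modest absolute error can produce a very large relative error, which is one way a report like the one above can show a relative difference far beyond the allowed tolerance.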

To Reproduce

Steps to reproduce the behavior:

  1. Check out the test-hf-phi3 branch
  2. Run pytest thunder/tests/test_networks.py -k phi3
  3. Wait for the AssertionError shown above

Environment

Container 20241114 with Thunder at test-phi3@4c71eaa4f15028f94910e365ce6c3894769578a5

Additional context

This is part of #1278.

cc @apaz-cli

riccardofelluga commented 1 week ago

This seems to have been addressed in transformers 4.46.2, which is bumped in #1439
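
A quick way to check whether a local environment already has the newer transformers release is sketched below. It assumes, based only on the comment above, that 4.46.2 is the minimum version to look for; the helper names `parse_version`, `has_fix`, and `installed_transformers_has_fix` are hypothetical and not part of Thunder or transformers.

```python
from importlib import metadata

# Assumption: 4.46.2 is the version to require, per the comment above.
MIN_FIXED = (4, 46, 2)


def parse_version(text: str) -> tuple:
    """Parse the leading numeric 'X.Y.Z' part of a version string into ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(version_text: str, minimum: tuple = MIN_FIXED) -> bool:
    """True if the given version string is at least the assumed minimum."""
    return parse_version(version_text) >= minimum


def installed_transformers_has_fix():
    """Return True/False for the installed package, or None if it is not installed."""
    try:
        return has_fix(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return None


print(installed_transformers_has_fix())
```

The parser deliberately ignores pre-release suffixes, so this is only a rough guard; a stricter comparison would need a dedicated version-parsing library.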

riccardofelluga commented 1 week ago

Closing since #1439 has been merged and the test passes now.