Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
Apache License 2.0
HF Phi3-mini-128k returns very different gradients than reference #1441
When testing HF phi3, the gradients differ by several orders of magnitude from the reference:
> torch.testing.assert_close(grads_ref, grads_compiled, rtol=1e-2, atol=1e-2)
E AssertionError: Tensor-likes are not close!
E
E Mismatched elements: 39 / 128 (30.5%)
E Greatest absolute difference: 0.0810546875 at index (0, 12) (up to 0.01 allowed)
E Greatest relative difference: 8.125 at index (1, 3) (up to 0.01 allowed)
E
E The failure occurred for item [0]
thunder/tests/test_networks.py:508: AssertionError
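To help read the report above, here is a minimal pure-Python sketch of the closeness rule that `torch.testing.assert_close` documents: an element mismatches when `|actual - expected| > atol + rtol * |expected|`. The names `compare_grads` and `MismatchReport` are hypothetical helpers for illustration and are not part of PyTorch or Thunder. The sketch works on nested lists rather than tensors and is a simplification, not PyTorch's implementation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class MismatchReport:
    """Hypothetical summary mirroring the fields of the assertion message."""
    mismatched: int
    total: int
    max_abs_diff: float
    max_abs_index: Tuple[int, int]
    max_rel_diff: float
    max_rel_index: Tuple[int, int]


def compare_grads(
    actual: Sequence[Sequence[float]],
    expected: Sequence[Sequence[float]],
    rtol: float = 1e-2,
    atol: float = 1e-2,
) -> MismatchReport:
    """Compare two 2-D gradient arrays element by element.

    An element is a mismatch when |a - e| > atol + rtol * |e|.
    The greatest differences are tracked over mismatched elements only.
    """
    mismatched = 0
    total = 0
    max_abs, max_abs_idx = 0.0, (0, 0)
    max_rel, max_rel_idx = 0.0, (0, 0)

    for i, (row_a, row_e) in enumerate(zip(actual, expected)):
        for j, (a, e) in enumerate(zip(row_a, row_e)):
            total += 1
            abs_diff = abs(a - e)
            if abs_diff <= atol + rtol * abs(e):
                continue  # within tolerance, not counted
            mismatched += 1
            # A mismatch against an exact zero has an unbounded relative error.
            rel_diff = abs_diff / abs(e) if e != 0 else float("inf")
            if abs_diff > max_abs:
                max_abs, max_abs_idx = abs_diff, (i, j)
            if rel_diff > max_rel:
                max_rel, max_rel_idx = rel_diff, (i, j)

    return MismatchReport(mismatched, total, max_abs, max_abs_idx, max_rel, max_rel_idx)


# Made-up values for illustration; these are not the gradients from this issue.
reference = [[1.0, 2.5], [0.05, 4.0]]
compiled = [[1.0, 2.0], [0.5, 4.0]]
print(compare_grads(compiled, reference))
```

Under this definition, the greatest absolute and greatest relative differences can sit at different indices, as they do in the report ((0, 12) versus (1, 3)). A relative difference of 8.125 combined with an absolute difference of at most 0.081 implies that the reference gradient at (1, 3) has a magnitude of roughly 0.01 or less, so the relative figure is dominated by a small reference value.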
To Reproduce
Steps to reproduce the behavior:
Check out the test-hf-phi3 branch
Run pytest thunder/tests/test_networks.py -k phi3
Wait for the error
Environment
Container 20241114 with Thunder at test-phi3@4c71eaa4f15028f94910e365ce6c3894769578a5
Additional context
This is part of #1278.
cc @apaz-cli