flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

bug: partial unit tests failed #479

Open zhyncs opened 2 months ago

zhyncs commented 2 months ago

latest main, A100

for ele in $(ls); do python3 -m pytest ${ele}; done
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0
rootdir: /flashinfer/python
plugins: anyio-4.4.0
collected 270 items

test_alibi.py ..............F..............F.........................................................................................................................sss..ss...s... [ 61%]
......sss..ss...s.........sss..ss...s.........sss..ss...s.........sss..ss...s.........sss..ss...s........                                                                           [100%]

======================================================================================== FAILURES =========================================================================================
_________________________________________________________________________ test_single_decode_alibi[128-32-33001] __________________________________________________________________________

seq_len = 33001, num_heads = 32, head_dim = 128

    @pytest.mark.parametrize("seq_len", [1, 9, 81, 729, 33001])
    @pytest.mark.parametrize("num_heads", [4, 8, 32])
    @pytest.mark.parametrize("head_dim", [128, 256])
    def test_single_decode_alibi(
        seq_len,
        num_heads,
        head_dim,
    ):
        q = torch.randn(num_heads, head_dim).to(0).half()
        k = torch.randn(seq_len, num_heads, head_dim).to(0).half()
        v = torch.randn(seq_len, num_heads, head_dim).to(0).half()

        o = flashinfer.single_decode_with_kv_cache(q, k, v, pos_encoding_mode="ALIBI")
        mask = torch.ones(1, seq_len, dtype=torch.bool).to(0)
        o_ref = alibi_attention(q.unsqueeze(0), k, v, mask).squeeze(0)
>       torch.testing.assert_close(
            o.cpu().numpy(), o_ref.cpu().numpy(), rtol=1e-3, atol=1e-3
        )
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 7 / 4096 (0.2%)
E       Greatest absolute difference: 0.00244140625 at index (2, 1) (up to 0.001 allowed)
E       Greatest relative difference: 3.302734375 at index (2, 19) (up to 0.001 allowed)

test_alibi.py:39: AssertionError
_________________________________________________________________________ test_single_decode_alibi[256-32-33001] __________________________________________________________________________

seq_len = 33001, num_heads = 32, head_dim = 256

    @pytest.mark.parametrize("seq_len", [1, 9, 81, 729, 33001])
    @pytest.mark.parametrize("num_heads", [4, 8, 32])
    @pytest.mark.parametrize("head_dim", [128, 256])
    def test_single_decode_alibi(
        seq_len,
        num_heads,
        head_dim,
    ):
        q = torch.randn(num_heads, head_dim).to(0).half()
        k = torch.randn(seq_len, num_heads, head_dim).to(0).half()
        v = torch.randn(seq_len, num_heads, head_dim).to(0).half()

        o = flashinfer.single_decode_with_kv_cache(q, k, v, pos_encoding_mode="ALIBI")
        mask = torch.ones(1, seq_len, dtype=torch.bool).to(0)
        o_ref = alibi_attention(q.unsqueeze(0), k, v, mask).squeeze(0)
>       torch.testing.assert_close(
            o.cpu().numpy(), o_ref.cpu().numpy(), rtol=1e-3, atol=1e-3
        )
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 3 / 8192 (0.0%)
E       Greatest absolute difference: 0.0018310546875 at index (0, 227) (up to 0.001 allowed)
E       Greatest relative difference: 0.01119232177734375 at index (0, 227) (up to 0.001 allowed)

test_alibi.py:39: AssertionError
================================================================================= short test summary info =================================================================================
FAILED test_alibi.py::test_single_decode_alibi[128-32-33001] - AssertionError: Tensor-likes are not close!
FAILED test_alibi.py::test_single_decode_alibi[256-32-33001] - AssertionError: Tensor-likes are not close!
================================================================== 2 failed, 232 passed, 36 skipped in 69.47s (0:01:09) ===================================================================
yzh119 commented 2 months ago

It's because flashinfer currently uses -5e4 as a surrogate for -inf, and when the sequence length is large the ALiBi bias can fall below -5e4. The main reason for choosing -5e4 is that -inf breaks some operations (producing nan), and we want the value to stay within the valid range of the data type of m (it's fp32 in almost all cases, but we provide an option of using fp16 when allow_fp16_qk_reduction=True).
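A minimal sketch of the two constraints described above, using only the Python standard library (the constant name and the slope/distance values are illustrative, not flashinfer's actual internals): a true -inf running max makes the online-softmax rescaling step undefined, while a finite surrogate keeps it well defined and within fp16 range; but a long-enough sequence with a large ALiBi slope can produce a bias below the surrogate.

```python
import math

NEG_INF_SURROGATE = -5e4  # flashinfer's finite stand-in for -inf

# Online-softmax running-max update: rescale old accumulator by
# exp(m_old - m_new). With a true -inf (fully masked row so far),
# -inf - (-inf) is nan, and exp(nan) is nan.
m_old = float("-inf")
m_new = float("-inf")
assert math.isnan(math.exp(m_old - m_new))

# With the finite surrogate, the same update stays well defined:
assert math.exp(NEG_INF_SURROGATE - NEG_INF_SURROGATE) == 1.0

# -5e4 also fits inside fp16's representable range (max ~65504),
# which matters when m is kept in fp16 (allow_fp16_qk_reduction=True):
FP16_MAX = 65504.0
assert abs(NEG_INF_SURROGATE) < FP16_MAX

# But the ALiBi bias is -slope * distance, so a sufficiently long
# sequence can push a legitimate (unmasked) score below the surrogate.
# Hypothetical values for illustration only:
slope, distance = 0.84, 100_000
print(-slope * distance < NEG_INF_SURROGATE)  # True: bias below -5e4
```

Once the bias dips below -5e4, legitimate scores become indistinguishable from masked ones, which is consistent with the mismatches only appearing at seq_len=33001.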

zhyncs commented 2 months ago

OK