Open zhyncs opened 2 months ago
It's because flashinfer currently uses `-5e4` as a surrogate for `-inf`, and when the sequence length is large the ALiBi bias can be smaller than `-5e4`. The main reason for choosing `-5e4` is that `-inf` breaks some operations (they produce `nan`), and we want the value to stay within the valid range of the data type of `m` (fp32 in almost all cases, but we provide an option to use fp16 when `allow_fp16_qk_reduction=True`).
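A minimal sketch of the two effects described in the comment above, in plain Python rather than flashinfer's kernel code. It assumes `m` is the running maximum of an online softmax, and that the rescale step computes `exp(m_old - m_new)`. The helper names, the slope of 0.5, and the distance of 200,000 tokens are hypothetical and chosen only for illustration.

```python
import math

NEG_INF_SURROGATE = -5e4  # surrogate value quoted in the comment above
FP16_MAX = 65504.0        # largest finite magnitude representable in fp16


def rescale_factor(m_old: float, m_new: float) -> float:
    """Hypothetical online-softmax rescale factor: exp(m_old - m_new)."""
    return math.exp(m_old - m_new)


def alibi_bias(slope: float, distance: int) -> float:
    """ALiBi adds a penalty that grows linearly with token distance."""
    return -slope * distance


# With a true -inf, a fully masked tile gives (-inf) - (-inf) = nan.
nan_case = rescale_factor(-math.inf, -math.inf)

# With the finite surrogate the same step is well defined.
ok_case = rescale_factor(NEG_INF_SURROGATE, NEG_INF_SURROGATE)

# The surrogate fits in fp16, but a long sequence can push a real
# ALiBi bias below it, so a valid score looks "more masked" than
# a masked one.
bias = alibi_bias(0.5, 200_000)

print(math.isnan(nan_case), ok_case, bias < NEG_INF_SURROGATE)
```

The surrogate avoids the `nan`, but only works as a mask while every legitimate score stays above it, which is the condition long sequences with ALiBi can violate.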
OK
latest main, A100