-
Hi, I was testing the fused attention FP8 tutorial on an L20 GPU; the test code is as follows:
```python
@pytest.mark.parametrize("Z, H, N_CTX, HEAD_DIM", [(1, 2, 1024, 64)])
@pytest.mark.parame…
```
-
On the latest Flux with CUDA.jl v4.0 we have the following regression: gradients are wrong for GPU models containing BatchNorm layers:
```julia
using Flux, FiniteDifferences, Test
d, n = 3, 2
…
```
-
And make `approx_in` consistent (see #123).
-
The flash attention benchmark fails with the [changes](https://github.com/intel/intel-xpu-backend-for-triton/pull/1905) to use upstream PyTorch.
It is likely a torch issue.
```
Traceback (most recent c…
```
-
-
It's a little strange and confusing that `@test f() kwarg=value` means two very different things:
* if `kwarg` is `skip` or `broken`, then those are flags to `@test` itself
* otherwise, the call i…
-
**Description**
Add a floating-point matcher similar to [numpy](https://numpy.org/doc/stable/reference/generated/numpy.isclose.html)'s and [pytorch](https://pytorch.org/docs/stable/generated/torch.is…
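For reference, `numpy.isclose` uses the asymmetric tolerance check `|a - b| <= atol + rtol * |b|`, treating the second argument as the reference value. A minimal sketch of the semantics such a matcher would need to reproduce (the values below are illustrative, not from any of the issues above):

```python
import numpy as np

# numpy.isclose checks |a - b| <= atol + rtol * |b|
# with defaults rtol=1e-5 and atol=1e-8.
a, b = 1.0, 1.0 + 5e-9
assert np.isclose(a, b)           # 5e-9 is within atol + rtol * |b|
assert not np.isclose(1.0, 1.1)   # 0.1 exceeds the combined tolerance

# The check is asymmetric in its arguments: only the second operand
# scales the relative term, which matters when magnitudes differ.
big = 1e10
assert np.isclose(big, big * (1 + 1e-6))  # relative error 1e-6 < rtol
```

Note that `atol` dominates near zero while `rtol` dominates for large magnitudes, which is why a matcher usually needs both knobs.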
-
Context: updating `jax` in nixpkgs: https://github.com/NixOS/nixpkgs/pull/291705#issuecomment-2095894365
One of the `jaxopt` tests fails when run with the latest `jax` (0.4.26):
```
============…
```
-
### Describe the issue
I am creating an onnxruntime session using TensorRT. While evaluating the model's output, the tolerance levels (atol and rtol) can be set at most to 1e-3, w…
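A typical way such a comparison is written is `np.testing.assert_allclose`, which raises if `|actual - desired| > atol + rtol * |desired|`. A minimal sketch with stand-in arrays (the values are hypothetical, not real model outputs):

```python
import numpy as np

# Stand-in for a reference (e.g. CPU) output and a TensorRT-EP output;
# the small offset simulates an execution-provider discrepancy.
reference = np.array([0.1234, 0.5678, 0.9012], dtype=np.float32)
trt_output = reference + np.float32(5e-4)

# Passes at the 1e-3 tolerances described above; tightening rtol/atol
# below the actual discrepancy would make this raise AssertionError.
np.testing.assert_allclose(trt_output, reference, rtol=1e-3, atol=1e-3)
```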
-
Watcher debug status causes non-deterministic PCC failures to be much more frequent. See the bottom of the thread for the latest findings.
branch: `cglagovich/sdpa_nd`
This has the DEBUG_STATUS statements commented ou…