-
The `sdpa_ex` implementation of `torch.nn.functional.scaled_dot_product_attention` reports every output tensor proxy in the trace as being on `cuda`, but at runtime some outputs are on `cpu`.
Repro
```python
i…
-
## Description
The runtime fails to load a converted model for bge-m3.
### Expected Behavior
No errors.
### Error Message
Exception in thread "main" java.lang.RuntimeException: data did not match any…
-
### 🐛 Describe the bug
This works on pytorch/torchvision 2.5 for cu118, cu121, and cu124.
It fails only on pytorch/torchvision 2.6.0.dev for cu118 and cu124; cu121 succeeds.
Here is the error messa…
-
### 🐛 Describe the bug
Hi, I run my model on an Android mobile platform (using the C++ API, linked against libpytorch_jni_lite.so). The model loads successfully, but when I run forward, this error occurs:
torch jit f…
-
### Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://github.com/prefix-dev/pixi/releases) of pixi, us…
-
Hi @TaoZhong11 ,
Thanks for this amazing work!
I encountered an error while running the Docker image with data I obtained from https://fcon_1000.projects.nitrc.org/indi/PRIMEdownloads.html. To t…
-
### 🐛 Describe the bug
Async NCCL communications from `torch.distributed` should run in parallel with CUDA compute kernels, but traces from `torch.profiler` show that this is not the case for the first run. …
-
### 🐛 Describe the bug
Running `F.linear` on MPS produces a non-negligible error (stddev > 1) when the input size is large enough (see the following code snippet to reproduce).
However, when we use `t…
-
aarch64 PT2 dashboard perf collection consistently hits a timeout, e.g. https://github.com/pytorch/pytorch/actions/runs/10175592559 . After logging into a `linux.arm64.m7g.metal` instance, running `pyth…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTor…