-
### 🐛 Describe the bug
When I use ExecuTorch to lower my transformer-based model to the XNNPACK backend, I get the following error:
```
INFO:executorch.backends.xnnpack.partition.xnnpack_partitioner:…
-
We observed some specific problems when going from CPUSummary.jl v0.1.8 to v0.1.14 at [Trixi.jl](https://github.com/trixi-framework/Trixi.jl). Everything is fine with the old version of CPUSummary.jl.…
-
Changes to the AVX512 lyra2 code in sponge-2way.c for v3.11.2 produced improvements of
between 6% for x21s and 47% for lyra2z. However, performance dropped 9% for x22i and
5% for x25x. It's easily repro…
-
Here is a trace from my Intel Arc A770 via Docker:
```
$ ollama run deepseek-coder-v2
>>> write fizzbuzz
"""""""""""""""""""""""""""""""
```
And here is a trace from Arch Linux running on …
-
### Steps to reproduce the issue
```console
$ spack install openmpi@4.1.4 %gcc@7.3.0 +legacylaunchers +gpfs +pmi schedulers=slurm >> log.openmpi 2>&1
```
### Error message
==> In…
-
### 🐛 Describe the bug
Since PyTorch 2.5.0, there is a massive (more than 10x) performance regression when using `BatchNorm2d` with `torch.compile` set to `reduce-overhead` and `DistributedDataPara…
-
### 🐛 Describe the bug
Minimal reproducer:
```python
import torch
x = torch.ones(1).expand(2)
print(f"{x.is_contiguous()=}")
print(f"{x.to(memory_format=torch.contiguous_format).is_contiguous(…
-
For example: `__m128i _mm_dpbusd_avx_epi32 (__m128i src, __m128i a, __m128i b)`
This takes 1 x "src" and 2 x "a * b" multiplication inputs but the clang/llvm intrinsics are defined as:
```
TA…
-
**Describe the bug**
I'm trying to run a setup of Vincent. It reports it hangs at t=0.
But when I try to run it, I get an HDF5 error:
```
Wed Nov 20 09:20:45, Info: 9 (dr): 0
Wed Nov 20 …
-
### Proposal to improve performance
I tested the new Medusa speculative sampling feature with [vllm v0.5.2](vllm-openai:v0.5.2).
After using Medusa speculative sampling, the performance dropped significantl…