-
| | |
|--------------------|----|
| Bugzilla Link | [PR40815](https://bugs.llvm.org/show_bug.cgi?id=40815) |
| Status | CONFIRMED |
| Importance | P enha…
-
| | |
|--------------------|----|
| Bugzilla Link | [PR34682](https://bugs.llvm.org/show_bug.cgi?id=34682) |
| Status | NEW |
| Importance | P enhancemen…
-
Here is an example: https://godbolt.org/z/nd9Kc3cqq
-
For example: `__m128i _mm_dpbusd_avx_epi32 (__m128i src, __m128i a, __m128i b)`
This takes one `src` accumulator input and two multiplication inputs (`a` and `b`), but the Clang/LLVM intrinsics are defined as:
```
TA…
```
-
This is a spin-off of vectorisation issue #71 and a follow-up to the large PR #171.
---
(The first part of this description also serves as documentation of what is available there now!).
The curr…
-
### Describe the issue:
I'm compiling NumPy 1.24.2 with Python 3.10.8 and GCC 12.2.0 on our old Opteron-based HPC system. The compilation (through EasyBuild) seems to complete without errors. When running t…
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC ve…
```
-
I compared an older version from Nov 23 with Apr 24, and the older version is much faster.
total time = 6225.76 ms
vs
total time = 3817.54 ms
Same CPU, same compiler and settings, same test: …
-
We observed some specific problems when going from CPUSummary.jl v0.1.8 to v0.1.14 at [Trixi.jl](https://github.com/trixi-framework/Trixi.jl). Everything is fine with the old version of CPUSummary.jl.…
-
- [x] a basic implementation that is obviously correct to test against
- [x] an efficient Rust version
- [x] move the `std::arch::x86_64::_mm_crc32_u32` intrinsic into the crc32 module, make it plat…
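The first checklist item asks for a basic implementation that is obviously correct to test against. The following is a minimal bit-at-a-time sketch in Rust. It assumes the target is CRC-32C (Castagnoli), which is the polynomial the SSE4.2 `_mm_crc32_u32` intrinsic computes. The names `crc32c_update` and `crc32c` are hypothetical and not taken from the project.

```rust
/// Reflected CRC-32C (Castagnoli) polynomial, assumed to match the SSE4.2 `crc32` instruction.
const POLY: u32 = 0x82F6_3B78;

/// Bit-at-a-time CRC-32C update over `data`, starting from the register value `crc`.
/// No initial or final inversion is applied here, so this models a raw running update.
fn crc32c_update(mut crc: u32, data: &[u8]) -> u32 {
    for &byte in data {
        // Fold the next byte into the low bits of the register (reflected CRC).
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            // Shift right and conditionally XOR in the polynomial.
            crc = (crc >> 1) ^ (POLY & mask);
        }
    }
    crc
}

/// Standard CRC-32C: initial value 0xFFFF_FFFF and final XOR 0xFFFF_FFFF.
fn crc32c(data: &[u8]) -> u32 {
    !crc32c_update(!0, data)
}
```

`crc32c_update` applies no inversion, so it can in principle be compared against a running value fed through the intrinsic. This assumes the 32-bit operand is consumed as its little-endian bytes. `crc32c` should yield the standard check value `0xE3069283` for the ASCII string `123456789`.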