-
Hi! This is more of a little excursion than a 'true' issue, but it's about a technique which I've found useful and would like to share. The occasion is that I'm extending my library [zimt](https://git…
-
[P1928R15](https://wiki.edg.com/pub/Wg21wroclaw2024/StrawPolls/P1928R15.pdf) (std::simd — merge data-parallel types from the Parallelism TS 2
-
On main, with simd turned on.
```console
➜ powdr-template git:(main) ✗ rustc -vV | sed -n 's|host: ||p'
aarch64-apple-darwin
```
```console
➜ powdr-template git:(main) ✗ cargo check
C…
-
Hello,
**System information**
- I have written custom code
- OS: Windows 11 23H2
- TensorFlow.js installed from script link https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest/dist/tf.js
- Ten…
-
I tried this code:
```rust
#![feature(stdarch_loongarch)]
use std::arch::loongarch64::*;
#[cfg(target_feature = "lasx")]
pub unsafe fn simd(s: i32) -> i32 {
lasx_xvpickve2gr_w::(lasx_xvreplgr2vr…
-
The absence of whole register load/store instructions was already [discussed in the past](https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/203) with the following conclusion:
> The conclus…
-
As of today, the SIMD "baseline" that we compile for goes up to SSE3, and any higher features are opt-in and runtime dispatched. SSE3 has been the maximum assumed feature for quite a while. We haven't…
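For readers unfamiliar with the "baseline plus runtime dispatch" pattern, here is a minimal sketch in Rust. It is not this project's actual dispatch mechanism: the function names `sum`, `sum_scalar` and `sum_avx2` are hypothetical, and AVX2 merely stands in for "a feature above the baseline". The baseline path is always compiled; the higher-feature path is chosen only after a CPU check at run time.

```rust
// Hypothetical example: sum a slice, picking an AVX2-enabled path at run time.

// Baseline path: compiled for whatever the default target features are.
fn sum_scalar(data: &[i32]) -> i32 {
    data.iter().sum()
}

// Opt-in path: the compiler is allowed to use AVX2 inside this function only.
// It is `unsafe` to call because the caller must guarantee AVX2 is available.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[i32]) -> i32 {
    // Same source as the baseline; the optimizer may vectorize it with AVX2.
    data.iter().sum()
}

// Dispatcher: checks the CPU once per call and falls back to the baseline.
pub fn sum(data: &[i32]) -> i32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```

On non-x86 targets the `cfg` blocks compile away and only the baseline path remains, so raising the baseline would only change which features `sum_scalar` may assume.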
-
I'm running the partial assembly code on a platform that supports `avx512`. However, the speed is the same as on my VMware VM.
I did the following things:
1. build mfem with `cmake -DCMAKE_C_FLAGS="-march=…
-
This code: ([Godbolt link](https://zig.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYgAzKVpMGoAF55gpJfWQE8Ayo3QBhVLQCuLBhICspJwAyeAy…
-
There are operations like: https://github.com/cppalliance/decimal/blob/develop/include/boost/decimal/detail/wide-integer/uintwide_t.hpp#L888 which are ripe for packing into AVX2 / ARM NEON instruction…