-
Increasingly we are taking changes that use hardware intrinsics to accelerate parts of CoreFX. Without special care, our testing will only ever cover the AVX2 (or AVX) path, not the software path and …
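One common way to keep the software path covered is to make the path selectable and have the tests run every input through each implementation, comparing against the portable reference. The sketch below is a minimal illustration under assumptions. The kernel (`count_zero_bytes`), the `force_software` switch and the helper names are all hypothetical. `bytes.count` merely stands in for an intrinsics-accelerated implementation.

```python
from typing import Callable, Dict


def _count_zero_bytes_software(data: bytes) -> int:
    """Portable reference path: inspect one byte at a time."""
    total = 0
    for b in data:
        if b == 0:
            total += 1
    return total


def _count_zero_bytes_fast(data: bytes) -> int:
    """Stand-in for a hardware-accelerated path (bytes.count runs in C)."""
    return data.count(0)


# Every implementation the dispatcher can pick, keyed by name.
IMPLEMENTATIONS: Dict[str, Callable[[bytes], int]] = {
    "software": _count_zero_bytes_software,
    "fast": _count_zero_bytes_fast,
}


def count_zero_bytes(data: bytes, force_software: bool = False) -> int:
    """Public entry point; tests can force the fallback explicitly."""
    impl = IMPLEMENTATIONS["software" if force_software else "fast"]
    return impl(data)


def check_all_paths(data: bytes) -> int:
    """Run every path on the same input and require identical results."""
    expected = _count_zero_bytes_software(data)
    for name, impl in IMPLEMENTATIONS.items():
        got = impl(data)
        if got != expected:
            raise AssertionError(f"{name} path returned {got}, expected {expected}")
    return expected
```

Every input is checked against the reference, so the fallback still gets exercised on CI machines where the accelerated path would normally be chosen.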
-
Now that SIMD intrinsics for x86 have been stabilized, it might be worthwhile to add explicit SIMD to accelerate unmasking. For example, autobahn-python [uses](https://github.com/crossbario/autobahn-p…
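For context, RFC 6455 unmasking XORs payload byte *i* with `mask[i % 4]`, so every byte can be processed independently. The sketch below contrasts a byte-at-a-time loop with a wide variant that XORs the whole payload as one large integer against the repeated key. The function names are hypothetical, and this is not autobahn-python's code. Explicit SIMD applies the same idea 16 or 32 bytes per instruction.

```python
def unmask_bytewise(payload: bytes, mask: bytes) -> bytes:
    """Reference: XOR each payload byte with mask[i % 4] (RFC 6455, section 5.3)."""
    if len(mask) != 4:
        raise ValueError("WebSocket masking keys are 4 bytes")
    return bytes(b ^ mask[i % 4] for i, b in enumerate(payload))


def unmask_wide(payload: bytes, mask: bytes) -> bytes:
    """Same result, but XOR the whole payload in one wide operation."""
    if len(mask) != 4:
        raise ValueError("WebSocket masking keys are 4 bytes")
    n = len(payload)
    if n == 0:
        return b""
    # Repeat the 4-byte key until it covers the payload, then trim to length.
    full_mask = (mask * (n // 4 + 1))[:n]
    # Both operands use the same byte order, so byte k lines up with byte k.
    word = int.from_bytes(payload, "little") ^ int.from_bytes(full_mask, "little")
    # A fixed output length keeps zero bytes at the end intact.
    return word.to_bytes(n, "little")
```

RFC 6455 section 5.7 gives an example masked "Hello" frame with key `37 fa 21 3d` and payload `7f 9f 4d 51 58`. For that input, both functions return `b"Hello"`.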
-
This module currently lacks any form of acceleration, and adding some would benefit everyone.
Options:
1. core.simd -- Supported everywhere, I think.
2. intel-intrinsics DUB package -- Somewhat sup…
dd86k updated 10 months ago
-
(I don't plan on doing this myself, but I wanted to start the conversation and see who is interested in doing it.)
# What
Use RISC-V vector intrinsics to provide optimized implementations of the…
mr-c updated 7 months ago
-
# Context
Pairwise distance computation is an essential part of many estimators in scikit-learn, and can take up a significant portion of run time in certain workflows. I believe that we may achieve …
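Concretely, for row sets X and Y the job is the matrix of Euclidean distances between every row of X and every row of Y. The pure-Python sketch below shows two forms; the names are hypothetical and this is not scikit-learn's API. The first is the direct form. The second is the common expansion `||x - y||² = ||x||² + ||y||² - 2·(x·y)`, which moves most of the work into dot products, the part SIMD or BLAS kernels speed up.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pairwise_direct(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]) -> Matrix:
    """D[i][j] = ||X[i] - Y[j]||, computed feature by feature."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) for y in Y]
        for x in X
    ]


def pairwise_expanded(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]) -> Matrix:
    """Same distances via ||x||^2 + ||y||^2 - 2 x.y (dot-product form)."""
    x_sq = [sum(a * a for a in x) for x in X]
    y_sq = [sum(b * b for b in y) for y in Y]
    out: Matrix = []
    for i, x in enumerate(X):
        row = []
        for j, y in enumerate(Y):
            dot = sum(a * b for a, b in zip(x, y))
            # Cancellation can push the result slightly below zero; clamp it.
            row.append(math.sqrt(max(x_sq[i] + y_sq[j] - 2.0 * dot, 0.0)))
        out.append(row)
    return out
```

The expanded form is faster to vectorize but loses precision for nearly identical points because of the subtraction. That is why the sketch clamps negative values to zero.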
-
# **Issue №3586 opened by *[Starbuck5](https://github.com/Starbuck5)* at 2022-11-24 06:19:47**
This is the only handwritten assembly in Pygame, and it is stuck in the past. Modern assembly we …
-
Add one here every time you wish for one:
- [ ] `_mm_cvtpd_epi64`, which converts 2x double to int64 using the MXCSR rounding mode, would speed things up for ARM and non-AVX x86 => actually an AVX512DQ + AVX512VL existing i…
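Before writing or testing a fallback for this intrinsic, it can help to pin down its semantics. The sketch below is a hypothetical Python reference model, not intel-intrinsics code. It assumes the default MXCSR rounding mode (round to nearest, ties to even), since Python cannot change the hardware rounding mode. NaN, infinite and out-of-range inputs produce the "integer indefinite" value `0x8000000000000000`. That is the value the instruction returns when the invalid-operation exception is masked.

```python
import math
from typing import Tuple

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
INT64_INDEFINITE = INT64_MIN  # bit pattern 0x8000000000000000


def cvtpd_epi64_ref(a: Tuple[float, float]) -> Tuple[int, int]:
    """Model of converting 2 doubles to int64, assuming round-to-nearest-even."""

    def convert(x: float) -> int:
        if math.isnan(x) or math.isinf(x):
            return INT64_INDEFINITE
        r = round(x)  # round() on a float rounds half to even
        if r < INT64_MIN or r > INT64_MAX:
            return INT64_INDEFINITE
        return r

    return convert(a[0]), convert(a[1])
```

A fallback for ARM or pre-AVX-512 x86 can be checked against a model like this on edge cases: ties such as 2.5 and -1.5, NaN, and values near ±2^63.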
p0nce updated 10 months ago
-
This is an issue to start talking about SSE support.
I'd like to see a new raw type that:
1. Is 128 bits wide (a float[4]) and can be passed to the SSE intrinsics.
2. Has the proper alignment (16 bytes).
Perhaps …
-
| | |
| --- | --- |
| Bugzilla Link | [14268](https://llvm.org/bz14268) |
| Version | trunk |
| OS | All |
| Attachments | [Simple test case](https://user-images.githubusercontent.com/60944935/14374…
-
Writing SIMD code can be complex, and the "optimal" pattern can vary per platform/architecture. This gets more complicated when newer ISAs are also available.
As such, we should provide a…