-
It would be interesting to explore SIMD vectorization for elliptic curves and multi-scalar multiplications (MSMs). This might significantly speed up:
- Verkle Trees
- KZG
- MSM
without needing a GPU. Ideally the same op…
-
A simple loop multiplying two arrays with different multiplicity fails to vectorize efficiently on clang 14+, although it vectorized with clang 13.0.1.
The loop is the following, where 4 consecutive values …
-
Key computational kernels do not take advantage of Kokkos vectorization primitives, which could limit speedups on a wide range of systems. Refining operators and the loops that use…
-
Now that we have `SecretnessAnalysis`, we could update `--heir-simd-vectorizer` to apply its transformations only to tensors that are actually `secret`.
It might be a good idea to make the implem…
-
### Describe the project you are working on
Make the engine faster.
### Describe the problem or limitation you are having in your project
In modern CPUs, most of the compute resources are found in …
-
[RapidJSON does quite a lot of this.](https://github.com/miloyip/rapidjson/blob/369de87e5d7da05786731a712f25ab9b46c4b0ce/include/rapidjson/reader.h#L936-L942)
-
https://github.com/antonok-edm/ampli-Fe/blob/a4af5ad7fbfdee045d94008e90f73dcdf92372db/src/dsp/mod.rs#L45
While reading through the DSP code, I was wondering if the chunking for auto-vectorization is…
-
As a library author, I might want to get faster code without doing (much) extra work. Autovectorization can sometimes help: compilers can automatically convert code to use SIMD instructions. (Note thi…
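
As a minimal sketch of the kind of loop a compiler can typically autovectorize, consider the C function below. The name `scale_add` and its signature are hypothetical, chosen only for illustration. The `restrict` qualifiers tell the compiler that the arrays do not overlap, which usually removes the need for runtime aliasing checks.

```c
#include <stddef.h>

/* Hypothetical helper (not from any library): out[i] = a[i] * b[i] + c.
 * The loop body has no cross-iteration dependency, and `restrict`
 * promises that the three arrays do not alias, so an optimizing
 * compiler is free to process several elements per SIMD instruction. */
void scale_add(size_t n,
               const float *restrict a,
               const float *restrict b,
               float c,
               float *restrict out)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i] + c;
    }
}
```

Whether the loop is actually vectorized depends on the compiler, its version, and the optimization level. GCC's `-fopt-info-vec` and clang's `-Rpass=loop-vectorize` report which loops were vectorized.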
-
test: https://gcc.godbolt.org/z/f86hxd8cT
```
#define N 480
unsigned int
f (unsigned int res, signed char *restrict a,
unsigned char *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i…
```
-
### What is the problem this feature will solve?
As I am looking into ways to improve astropy performance, I can see that using the newish compiler feature of function multi-versioning should improv…
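
For readers unfamiliar with function multi-versioning, here is a minimal sketch in C using the `target_clones` attribute available in GCC and recent clang. The function `scale_in_place` and the `MULTIVERSION` macro are hypothetical names used for illustration and are not taken from astropy. The sketch assumes an x86-64 Linux system with glibc; elsewhere the macro expands to nothing and a single baseline version is built.

```c
#include <stddef.h>
#include <stdlib.h>  /* any libc header; makes __GLIBC__ visible on glibc */

/* Enable multi-versioning only where it is known to be supported:
 * x86-64 with glibc (the dispatch relies on ifunc support). */
#if defined(__has_attribute)
#  if __has_attribute(target_clones) && defined(__x86_64__) && defined(__GLIBC__)
#    define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION  /* fallback: one baseline version only */
#endif

/* Hypothetical kernel: multiply every element of x by k.
 * With target_clones the compiler emits an AVX2 build and a baseline
 * build of this function, and a resolver selects one at load time
 * based on the CPU the program runs on. */
MULTIVERSION
void scale_in_place(double *x, size_t n, double k)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] *= k;
    }
}
```

Callers invoke `scale_in_place` as an ordinary function; the choice between clones is made once by the dynamic loader rather than on every call.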