-
# Gas Optimization Report
## Note On Methodology
The gas snapshot file didn't seem to be up to date with the original tests, so `forge snapshot` was run before any optimization was tested to ensure the…
-
The ODataQueryOptions Apply methods accept an IQueryable and return an IQueryable; however, when PageSize is specified, LimitResult is called, which uses TruncatedCollection, a List that forces materializati…
-
Examples: https://godbolt.org/z/xG3dbYsfd
"slide_bit" should be able to simply do `lsr ... rol ...`
"shift_bit_left" should be able to simply do `asl ... ror ...`
It appears that the optimiza…
-
The current version significantly underperforms the standard library version. We need to understand the differences and close the gap.
```
$ go test -run NONE -benchtime 10s -bench ScalarMult -cpupr…
-
Now that we have been moving to a lower-level IR, it's starting to be feasible to experiment with non-C back ends.
I'm planning to experiment with generating x86-64 assembly directly. The goal is t…
-
- [ ] Revisit the concept of `set_time` in favor of a FieldTimeView that allows access to a field at different time steps during a single assembly, which is necessary for more complex time-stepping scheme…
-
With opus 1.5.1, building with MSVC on Windows fails with `nnet_avx2.c is being compiled without AVX2 enabled`. This is an x86_64 machine. The CMake build works fine.
```
(tar…
-
Interpolation routines (e.g. `hpface`, `dhpface-`, etc.) constitute the bulk of the computation in `update_gdof` and `update_Ddof`, which can be expensive (although they aren't as much of a bottleneck…
-
### Is your feature request related to a problem?
ESP32(-S3) fp32 division is notoriously slow. It can be made several times faster by using a reciprocal asm sequence, which is accurate to 1 ULP - …
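
For illustration only, the general idea behind reciprocal-based division can be sketched in portable C. This is not the Xtensa asm sequence the request refers to, and it makes no claim about matching its accuracy. The names `recip_newton` and `fast_div` are hypothetical, the initial estimate uses a commonly cited bit-level approximation, and the sketch assumes a finite, normal, nonzero divisor of moderate magnitude (no handling of zero, infinity, NaN, denormals, or overflow).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate 1/d with Newton-Raphson refinement.
 * Assumes d is finite, normal, nonzero, and not near the float range limits. */
static float recip_newton(float d)
{
    uint32_t bits;
    float x;

    /* Crude initial estimate: subtracting the bit pattern from a tuned
     * constant roughly negates the exponent (about 5% relative error). */
    memcpy(&bits, &d, sizeof bits);
    bits = 0x7EF311C7u - bits;
    memcpy(&x, &bits, sizeof x);

    /* Each step x <- x * (2 - d * x) squares the relative error,
     * so three steps take ~5% below single-precision resolution. */
    x = x * (2.0f - d * x);
    x = x * (2.0f - d * x);
    x = x * (2.0f - d * x);
    return x;
}

/* Hypothetical helper: n / d computed as n * (1/d), followed by one
 * residual correction step to reduce the rounding error of the product. */
static float fast_div(float n, float d)
{
    float r = recip_newton(d);
    float q = n * r;
    float rem = n - q * d;   /* residual of the first quotient estimate */
    return q + rem * r;
}
```

A call such as `fast_div(1.0f, 3.0f)` should return a value close to `1.0f / 3.0f`. Whether a sequence like this is actually faster than the compiler's default division on a given target would need to be measured.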
-
There are a couple of problems that prevent it from compiling for AArch64, but they pretty much all revolve around the ARM assembly found in `thvector.h` and in `OpenBLAS-stripped/arm`. ARM isn't comp…