-
A0: 00 00 40 40 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00
auto mul_A0 = _mm_mul_ss(A0,A0);
auto mul_B0 = _mm_mul_ss(B0,B0);
…
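To make the register dumps above easier to read, here is a minimal portable sketch that decodes the 16 bytes into four single-precision lanes and models what `_mm_mul_ss` does to them. It does not call the intrinsic itself. The names `Lanes`, `float_from_le_bytes`, `lanes_from_bytes` and `mul_ss_model` are hypothetical helpers introduced only for illustration, and the code assumes `float` is a 32-bit IEEE-754 type.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Four packed single-precision lanes, standing in for an __m128 register.
using Lanes = std::array<float, 4>;

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// Assemble one float from 4 bytes stored in little-endian order,
// which is how the dump lists them (lowest address first).
inline float float_from_le_bytes(const std::uint8_t* p) {
    const std::uint32_t bits = static_cast<std::uint32_t>(p[0])
                             | (static_cast<std::uint32_t>(p[1]) << 8)
                             | (static_cast<std::uint32_t>(p[2]) << 16)
                             | (static_cast<std::uint32_t>(p[3]) << 24);
    float f;
    std::memcpy(&f, &bits, sizeof f);  // bit-exact reinterpretation
    return f;
}

// Decode a 16-byte register dump into its four float lanes.
inline Lanes lanes_from_bytes(const std::array<std::uint8_t, 16>& bytes) {
    Lanes out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = float_from_le_bytes(bytes.data() + 4 * i);
    }
    return out;
}

// Scalar model of _mm_mul_ss(a, b): only lane 0 is multiplied,
// lanes 1..3 are copied unchanged from the first operand.
inline Lanes mul_ss_model(const Lanes& a, const Lanes& b) {
    Lanes r = a;
    r[0] = a[0] * b[0];
    return r;
}
```

Under these assumptions the bytes `00 00 40 40` decode to `3.0f` and `00 00 80 3F` to `1.0f`, so lane 0 of the modelled `mul_A0` is `9.0f` and lane 0 of `mul_B0` is `1.0f`, while the upper lanes stay zero.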
-
As mentioned in the [Intel optimization manual](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf), in the "Mixing AVX code with SSE code" sect…
-
The AVX and AVX-512 forms of `vcvtps2ph` that take ymm and zmm sources are specified by Intel like this:
```
VEX.128.66.0F3A.W0 1D /r ib
VCVTPS2PH xmm1/m64, xmm2, imm8
```
```
VEX.256.66.0F3A.…
```
-
Hey guys, first things first - great work!
Second - my Gazebo VNC screen crashes shortly after the start. It seems like the car starts at the corner of the track (off the street), drives for 1 s…
-
Running entirely on CPU
2017-11-16 23:00:58.418427: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE…
-
This issue is a placeholder for future discussion about supporting 4-way reducing dot-product instructions that take 8-bit inputs and accumulate into 32-bit, i.e.
```
int32_accumulator += int…
```
-
An error ocurred while starting the kernel
Your CPU supports instructions that this binary was not compiled to use: AVX AVX2
For maximum performance, you can install NMSLIB from sources
pip inst…
-
There is currently a problem with the way DynamoRIO handles AVX and SSE floating-point instructions, explained in this post: (https://groups.google.com/forum/#!topic/DynamoRIO-Users/EwjJLo-fBdo)…
-
I'm looking at doing a third implementation of SHA-256 for x86 targeting the x86-64-v3 ISA level (AVX, AVX2, but no AVX-512 and no SHA-NI, i.e. Haswell), because the pure-Rust soft implementation isn't …
-
### Before submitting your bug report
- [x] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue](ht…