-
**Problem**
VNNI is not supported.
**Success Criteria**
**Additional context**
-
Currently, the following two single-layer MLPs perform worse than GC v1.
dtype | batch size | hidden list | GC V1 | 8c55a0544 remove brgemm read lock
…
-
Some Intel Xeon server CPUs (for example _Xeon Platinum 8171M_ or _Xeon Platinum 8272CL_) support VNNI instructions. Is this something that could be used for better performance, or is it not suited fo…
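Whether a given machine exposes VNNI can be checked from the CPU flag list before deciding to use it. The following is a minimal sketch, assuming a Linux-style flags string such as the `flags` line of `/proc/cpuinfo`, where the two variants are reported as `avx512_vnni` and `avx_vnni`. The helper name `has_cpu_flag` is hypothetical and not taken from any of the projects listed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` occurs in `flags` as a whole
 * whitespace-separated token (so "avx512" does not match "avx512_vnni"). */
static bool has_cpu_flag(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);

    size_t n = strlen(name);
    if (n == 0)
        return false;

    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after whitespace. */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the string end or right before whitespace. */
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return true;
        p += 1; /* keep scanning after a partial match */
    }
    return false;
}
```

A caller would read the flags line once, then test `has_cpu_flag(line, "avx512_vnni")` or `has_cpu_flag(line, "avx_vnni")` to pick a code path.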
-
Support for Intel Emerald Rapids CPUs would be useful. Here is the `lscpu` output for one such core; I will attempt to create a PR for adding this if I can determine everything needed.
```
process…
-
For example: `__m128i _mm_dpbusd_avx_epi32 (__m128i src, __m128i a, __m128i b)`
This takes 1 x "src" and 2 x "a * b" multiplication inputs but the clang/llvm intrinsics are defined as:
```
TA…
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
2.18.0-dev20240925
### Custom code
Yes
### OS platform and distributi…
-
There are two variants:
* AVX512_VNNI (Tiger Lake, Rocket Lake) - 512bit/256bit/128bit
* AVX_VNNI (upcoming Alder Lake) - 256bit/128bit
VNNI replaces 3 SIMD instructions with one instruction.
…
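To make the "three instructions in one" claim concrete, here is a scalar reference model of what the non-saturating `dpbusd` operation computes in each 32-bit lane: four unsigned-byte by signed-byte products summed into a 32-bit accumulator. Without VNNI this is commonly done with a `VPMADDUBSW`, `VPMADDWD`, `VPADDD` sequence. The function names `dpbusd_lane` and `dpbusd_ref` are hypothetical, and the sketch models the arithmetic only, not the intrinsics or their performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit lane: acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3],
 * with `a` treated as unsigned bytes and `b` as signed bytes. */
static int32_t dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    /* Accumulate in unsigned arithmetic so overflow wraps instead of being
     * undefined; the non-saturating instruction also wraps. */
    uint32_t sum = (uint32_t)acc;
    for (int i = 0; i < 4; ++i) {
        int32_t prod = (int32_t)a[i] * (int32_t)b[i];
        sum += (uint32_t)prod;
    }
    /* Conversion back to int32_t is implementation-defined for large values;
     * common compilers use two's-complement wraparound. */
    return (int32_t)sum;
}

/* Apply the lane model across `lanes` 32-bit lanes (4 for 128-bit vectors,
 * 8 for 256-bit, 16 for 512-bit). `a` and `b` hold 4 bytes per lane. */
static void dpbusd_ref(int32_t *dst, const int32_t *src,
                       const uint8_t *a, const int8_t *b, size_t lanes)
{
    assert(dst != NULL && src != NULL && a != NULL && b != NULL);
    for (size_t l = 0; l < lanes; ++l)
        dst[l] = dpbusd_lane(src[l], a + 4 * l, b + 4 * l);
}
```

A model like this can serve as a test oracle when comparing a VNNI code path against a fallback path on the same inputs.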
-
**Describe the bug**
Hi Dr @clementpoiret! Now that you have graduated :tada: here is a technical issue to keep you busy :wink:
On a workstation with AVX512 and VNNI CPU capabilities, I am gett…
-
### What happened?
I'm running ollama, which in turn uses llama.cpp. The server has quad Intel Xeon Sapphire Rapids CPUs. In the debug line for the "system info" I get:
```shell
INFO [main] system info…