-
### 🐛 Describe the bug
Tensor dim 0 is already sharded on mesh dim 1, DTensor operator implementation does not support things like hybrid sharding strategies yet (i.e. [Shard(0), Shard(0)])
this i…
-
### Describe the issue
Hi!
I've been building ORT (ONNX Runtime) using the command and noticed that binary operators like _Add_ are being executed by the Eigen library. After some debugging, I noticed that Eigen is using t…
-
### Background and motivation
There are approved, soon-to-be-added [AVX512-VBMI2 Compress & Expand intrinsics](https://github.com/dotnet/runtime/issues/87097) as part of the new vector mask proposal…
-
I am using the following commands to build par2cmdline-turbo in a Rocky Linux 8 container.
```shell
git clone https://github.com/animetosho/par2cmdline-turbo.git
cd par2cmdline-turbo
aclocal
automa…
```
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
tf 2.17
### Custom code
Yes
### OS platform and d…
x0w3n updated 8 hours ago
-
It seems that Intel also implements curve25519 based on AVX512-IFMA. Have you compared the performance of the two implementations?
https://github.com/intel/cryptography-primitives/tree/5ada2314016b…
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
tf 2.17
### Custom code
Yes
### OS platform and distribution
Linux U…
-
Hello.
I see that for Intel AVX2 code you still use the same binary built for the Haswell (2013) architecture; likewise, Intel AVX512 uses the SkylakePurley (2017) executable, and for AMD AVX2/AVX512 you ha…
-
We already expand basic truncation intrinsics to the trunc+shuffle sequence:
```cpp
static __inline__ __m128i __DEFAULT_FN_ATTRS128
_mm_cvtepi64_epi8 (__m128i __A)
{
  return (__m128i)__builtin_s…
```