-
### Background and motivation
It seems that https://github.com/dotnet/runtime/issues/87097 lacks intrinsics for `compress` instructions with merge-masking.
Merge-masking for `vpcompress*` and…
-
[Job](https://mihubot.xyz/runtime-utils/EhvOq3RA) completed in 17 minutes 15 seconds.
### Diffs
```
Found 262 files with textual diffs.
Summary of Code Size diffs:
(Lower is better)
Total bytes o…
-
[Job](https://mihubot.xyz/runtime-utils/EhvJv0tA) completed in 20 minutes 19 seconds.
### Diffs
```
Found 261 files with textual diffs.
Summary of Code Size diffs:
(Lower is better)
Total bytes o…
-
This is probably an `LLVM` behavior that is affecting `rustc`.
The following snippet explicitly deals with arrays of 64 bytes and was extracted from a WebSocket procedure that unmasks frames received …
-
OS: debian 9
GCC: 6.3
NASM: 2.12.01
CPU: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
ISA-L: 2.31
I have confirmed that my cpu supports AVX512 through https://ark.intel.com/content/www/us/en/ark/…
-
### Description
We've been doing some performance analysis and have noticed that on bare-metal, a PyTorch image conversion from RGB to YUV will take over 1s for a sample image and on bare-metal it ta…
-
Hi SpiralDB, I've had a great experience using the fsst lib and vortex -- thank you for building them.
I'm trying to make fsst even faster. Currently, fsst compresses the vortex varbin array by [i…
-
Same as for iqtree v1
https://github.com/Cibiv/IQ-TREE/issues/216
-
### Describe the feature
Currently, the CRC32 implementation is not optimized for SSE4.2 or AVX512.
### Use Case
Provide better performance for CRC32 using a hardware-optimized implementation.
### Proposed S…
-
### Describe what you are looking for
The `sz_tolower` function requires copying to another buffer. Within the `find` and `find_byte` routines, a lowercasing step can be done quickly in a few extra cycles…