avx512 Search Results - Githubissues

dotnet/runtime #108109

[API Proposal]: AVX512-F/AVX512-VBMI2 Intrinsics for Compres…

### Background and motivation It seems that https://github.com/dotnet/runtime/issues/87097 lacks intrinsics for `compress` instructions with merge-masking. Merge-masking for `vpcompress*` and…

MineCake147E updated 1 week ago

MihuBot/runtime-utils #777

[JitDiff X64] [MihaZupan] Further improve ProbabilisticMap o…

[Job](https://mihubot.xyz/runtime-utils/EhvOq3RA) completed in 17 minutes 15 seconds. ### Diffs ``` Found 262 files with textual diffs. Summary of Code Size diffs: (Lower is better) Total bytes o…

MihuBot updated 1 week ago

MihuBot/runtime-utils #776

[JitDiff X64] [MihaZupan] Further improve ProbabilisticMap o…

[Job](https://mihubot.xyz/runtime-utils/EhvJv0tA) completed in 20 minutes 19 seconds. ### Diffs ``` Found 261 files with textual diffs. Summary of Code Size diffs: (Lower is better) Total bytes o…

MihuBot updated 1 week ago

rust-lang/rust #132909

Missed AVX512 opportunity when dealing with arrays of 64 byt…

This is probably a `LLVM` behavior that is affecting `rustc`. The following snippet explicitly deals with arrays of 64 bytes and was extracted from a WebSocket procedure that unmasks frames received …

c410-f3r updated 1 week ago

intel/isa-l #291

AVX512 detection failed when cpu supports AVX512

OS: Debian 9; GCC: 6.3; NASM: 2.12.01; CPU: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz; ISA-L: 2.31. I have confirmed that my CPU supports AVX512 through https://ark.intel.com/content/www/us/en/ark/…

swhzzh updated 4 months ago

google/gvisor #10991

Report AVX512_BF16 support in CPUID features

### Description We've been doing some performance analysis and have noticed that on bare-metal, a PyTorch image conversion from RGB to YUV will take over 1s for a sample image and on bare-metal it ta…

jseba updated 1 month ago

spiraldb/fsst #48

Faster short string compression with terminator byte and AVX…

Hi SpiralDB, I've had a great experience using the fsst lib and vortex -- thank you for building them. I'm trying to make fsst even faster. Currently fsst compresses the vortex varbin array by [i…

XiangpengHao updated 3 weeks ago

iqtree/iqtree2 #45

AVX512 on skylake

Same as for iqtree v1 https://github.com/Cibiv/IQ-TREE/issues/216

Phhere updated 1 month ago

awslabs/aws-checksums #89

AVX512 branch: AVX512 optimized CRC32 implementation

### Describe the feature Currently CRC32 implementation is not optimized for SSE4.2 or AVX512. ### Use Case Provide better performance for CRC32 using HW optimized implementation ### Proposed S…

pbadari updated 3 months ago

ashvardanian/StringZilla #195

Feature: Low overhead case insensitive find

### Describe what you are looking for The `sz_tolower` function requires copying to another buffer. Within `find` and `find_byte` routines a lowercasing step can be done quickly in a few extra cycles…

e253 updated 5 days ago

1000+ results for avx512
