ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
2.22k stars 76 forks source link

Feature: Heuristics for choosing a SIMD variant over the corresponding scalar non-SIMD version #184

Open nordlow opened 3 weeks ago

nordlow commented 3 weeks ago

Describe what you are looking for

For a given function f, are there any (typically architecture-specific) heuristics for when a SIMD version opposite to a serial version of an algorithm should be chosen?

For instance, for

sz_find(sz_cptr_t haystack, sz_size_t h_length, sz_cptr_t needle, sz_size_t n_length) {

one such heuristics could be

bool useSIMD(size_t h_length, size_t n_length) {
    if (h_length > 1024) {
        return true;
    } else if (h_length / n_length >= 4) {
        return true;
    }
}

.

Can you contribute to the implementation?

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

Code of Conduct

ashvardanian commented 3 weeks ago

There are definitely spots where platform-specific performance tuning can be a great idea. This can be a topic for a proper research project, potentially exposing threshold variables with extern and tuning our sz_find, sz_copy, and other kernels for the target platform. It get's a bit trickier, if the same kernel depends on multiple variables and we are forced to grid-search...

Are you familiar with similar projects, @nordlow?

nordlow commented 3 weeks ago

Unfortunately not.

Thanks for the response anyway.