BurntSushi / memchr

Optimized string search routines for Rust.
The Unlicense

no-std + cpu feature detection? #122

Closed. VorpalBlade closed this issue 1 year ago.

VorpalBlade commented 1 year ago

It doesn't look like the runtime dispatch code actually needs anything from std apart from is_x86_feature_detected itself (every other reference to things in std also exists in core), unless I'm missing something (I'm relatively new to Rust, so that could absolutely be the case). Could this not be replaced with something like the no-std cpufeatures crate, at least for no-std environments?
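To make the point concrete, here is a minimal sketch of cached runtime dispatch. It is not the crate's actual code: the names BACKEND, detect and backend, and the numeric tags, are hypothetical. The only item that needs std is the is_x86_feature_detected! macro; the cache itself uses nothing beyond core.

```rust
use core::sync::atomic::{AtomicU8, Ordering};

// Hypothetical tags for the available implementations.
const UNKNOWN: u8 = 0;
const FALLBACK: u8 = 1;
const SSE2: u8 = 2;
const AVX2: u8 = 3;

// Cache for the detection result; atomics live in core, not std.
static BACKEND: AtomicU8 = AtomicU8::new(UNKNOWN);

/// Queries the CPU once. This is the only part that depends on std.
fn detect() -> u8 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            return AVX2;
        }
        if std::is_x86_feature_detected!("sse2") {
            return SSE2;
        }
    }
    FALLBACK
}

/// Returns the cached backend, running detection on first use.
/// A racing first call from two threads just detects twice and
/// stores the same value, so relaxed ordering is sufficient here.
fn backend() -> u8 {
    match BACKEND.load(Ordering::Relaxed) {
        UNKNOWN => {
            let detected = detect();
            BACKEND.store(detected, Ordering::Relaxed);
            detected
        }
        cached => cached,
    }
}
```

In a no-std build, detect would have to be replaced by something else, such as a compile-time cfg check or a third-party detection crate, which is exactly the question raised here.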

This has the advantage of not being x86-specific, admitting easier extensions in the future. While it would not yet allow solving #76 (detecting neon is not yet supported), it would be a step in that direction.

BurntSushi commented 1 year ago

There is definitely no way I'm depending on another crate for what is critical and core functionality in this crate. So the only way this happens is if CPU feature detection is added to core and/or alloc, or if the feature detection is added to this crate itself. Neither seems particularly likely to me.

Could you please elaborate on your use case?

BurntSushi commented 1 year ago

See also #106. And #120.

VorpalBlade commented 1 year ago

I'm working on a no-std library that can make use of alloc still. The library will be used in an upcoming personal hobby project in both a no-std environment (embedded ARM32 with alloc) as well as std environments (Linux user space on x86-64 as well as ARM32 and AArch64). I have a need for a fast memchr2 across all of those. (For embedded I would obviously not use runtime detection though, but select it at compile time, as I know exactly what I'm targeting.)

The reason I wanted no-std CPU feature detection was primarily to simplify dependency management, avoiding unneeded feature matrix combinations. It would also leave the possibility open for using my library in more situations down the line (though on x86 dealing with FPU/SIMD state in no-std environments can be a bit interesting).

Unfortunately it seems the situation with this library is rather limited on non-x86 anyway, and even the crate I suggested does not support ARM32 feature detection (as I discovered after posting this issue). I guess this is yet another crate I will have to roll my own of.

BurntSushi commented 1 year ago

> Unfortunately it seems the situation with this library is rather limited on non-x86 anyway

That's why I asked. I figured you might be asking for something that probably doesn't solve your actual problem all on its own. But yes, this is mentioned in the README. The README only discusses x86.

> I guess this is yet another crate I will have to roll my own of.

There are SWAR-oriented (SIMD within a register) implementations of memchr2 in this crate that are platform independent. Those aren't going to be as fast as vector algorithms, but they're going to be faster than your naive haystack.iter().position(|&b| b == b'a' || b == b'z').
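For readers who have not seen the SWAR technique, the sketch below shows the general idea next to the naive search. It is an illustration only, not the crate's implementation: the names memchr2_naive, memchr2_swar and has_zero_byte are hypothetical, and the real routines are more heavily tuned. The sketch reads one usize at a time, uses a well-known bit trick to test whether any byte in the word equals either needle, and falls back to a byte-by-byte scan once a candidate word (or the tail) is reached.

```rust
const LO: usize = usize::MAX / 255; // 0x0101...01
const HI: usize = LO * 128; // 0x8080...80

/// True if at least one byte of `x` is zero.
fn has_zero_byte(x: usize) -> bool {
    (x.wrapping_sub(LO) & !x & HI) != 0
}

/// The naive search quoted above, generalized to two needle bytes.
fn memchr2_naive(n1: u8, n2: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == n1 || b == n2)
}

/// Word-at-a-time search for the first occurrence of `n1` or `n2`.
fn memchr2_swar(n1: u8, n2: u8, haystack: &[u8]) -> Option<usize> {
    const WORD: usize = core::mem::size_of::<usize>();
    // Broadcast each needle byte into every byte of a word.
    let v1 = LO * n1 as usize;
    let v2 = LO * n2 as usize;

    let mut offset = 0;
    for chunk in haystack.chunks_exact(WORD) {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(chunk);
        let word = usize::from_ne_bytes(buf);
        // XOR turns bytes equal to a needle into zero bytes.
        if has_zero_byte(word ^ v1) || has_zero_byte(word ^ v2) {
            break; // a match lies somewhere in this word
        }
        offset += WORD;
    }
    // Pin down the exact position (or scan the tail) byte by byte.
    haystack[offset..]
        .iter()
        .position(|&b| b == n1 || b == n2)
        .map(|i| offset + i)
}
```

Both functions return the same result for any input; the SWAR version simply skips over non-matching regions a word at a time.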

But I mean, yeah... memchr exists for exactly the same reason. I wrote it because it was yet another crate I had to roll on my own. It happens.

There are also open questions about how to adapt some of the algorithms in this crate to ARM, notably the movemask operations, which IIRC have no obvious equivalent there. But I am way out of my depth on ARM. I don't have any ARM hardware, and Rust's standard library has principally focused on the x86 platform for these sorts of things.
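As context for the movemask remark, here is a minimal sketch of how that step is typically used on x86-64 with SSE2 intrinsics from core::arch. The function name find_in_chunk is hypothetical and this is not the crate's code. It compares 16 bytes at once, collapses the comparison result into a 16-bit mask with one bit per byte, and reads the match position from the lowest set bit. On other architectures the sketch just falls back to a scalar scan, which is the portability gap being described.

```rust
/// Finds the first occurrence of `needle` in a 16-byte chunk using SSE2.
#[cfg(target_arch = "x86_64")]
fn find_in_chunk(chunk: &[u8; 16], needle: u8) -> Option<usize> {
    use core::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
    };
    // SAFETY: SSE2 is part of the x86-64 baseline, and the unaligned
    // load reads exactly 16 bytes from a valid reference.
    let mask = unsafe {
        let haystack = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        let needles = _mm_set1_epi8(needle as i8);
        // Each matching byte becomes 0xFF; movemask keeps one bit per byte.
        _mm_movemask_epi8(_mm_cmpeq_epi8(haystack, needles))
    };
    if mask == 0 {
        None
    } else {
        // The lowest set bit corresponds to the first matching byte.
        Some(mask.trailing_zeros() as usize)
    }
}

/// Scalar stand-in for targets without the intrinsics used above.
#[cfg(not(target_arch = "x86_64"))]
fn find_in_chunk(chunk: &[u8; 16], needle: u8) -> Option<usize> {
    chunk.iter().position(|&b| b == needle)
}
```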

Speaking personally, once I get the time, my own priority is probably going to be getting some new Mac hardware and getting this crate working there to the extent possible. IIRC, the main blocking concern was that the stuff needed was either not stable yet or too recently stable.

I do expect the situation to improve in the longer term, but probably not in the short term. I had been holding out hope for the portable higher level SIMD work, aka std::simd, but it hasn't landed yet.

I'll leave this open for now.

BurntSushi commented 1 year ago

I don't have any plans to add this in the near future. It's a big maintenance ask IMO. I'm not 100% opposed, but I'd need a very compelling reason to do it.