BurntSushi / memchr

Optimized string search routines for Rust.
The Unlicense
799 stars 97 forks source link

simd: support AVX2 detection without std #106

Closed andylizi closed 1 year ago

andylizi commented 2 years ago

Summary

This PR implements AVX2 runtime feature detection that works in no_std using the cpuid instruction.

Motivation

memchr provides low-level primitives, and it's unfortunate that we cannot make use of the most performant implementation without bringing in libstd. And there's no good reason for it, either — unlike other architectures, detecting CPU features on x86 isn't OS dependent (see this comment).

The proper solution would be to wait until is_x86_feature_detected is available in libcore, but that probably won't happen any time soon. In the meantime, I'd like to propose doing it ourselves as a stopgap measure.

Drawbacks

Rationale and alternatives

Prior art

BurntSushi commented 2 years ago

Thanks for the PR and the very thorough analysis! I'm definitely a bit hesitant to commit to something like this, in part because it's hard to know whether the code written for the CPUID check is correct or not without spending the time to go and understand everything myself. (Which I don't have time for right now.)

The proper solution would be to wait until is_x86_feature_detected is available in libcore, but that probably won't happen any time soon. In the meantime, I'd like to propose doing it ourselves as a stopgap measure.

Could you say what this is a stop gap for? In particular, you later say:

  • The real life use cases for the no_std + x86_64 + performance critical combo aren't the most plentiful. I'm not sure how many people would actually have a need for this, besides "it's nice to have". Boils down to cost-benefit analysis.

So who would actually benefit from this? Is there a concrete use case that motivated this PR in the first place? In my view, if this is just "nice to have," then I'm not sure it's worth doing. Users will get the SSE2 implementation at least, which is also quite fast.

andylizi commented 2 years ago

Yeah, this is about the kind of reply I had expected… The question of "Is it worth it?" and "Can you provide an actual use case?" are definitely the two things I was most unsure about when I posted this PR.

The thing about performance is, well, for most cases it doesn't really matter that much. Even without considering no_std, I'd be hard pressed to think of any real world situation where AVX is absolutely necessary (except for benchmarking, of course).

In my case, I was writing a no_std project with "fast" being one of its goals. It spends most of its time in memchr. But how fast is considered fast enough? Compared to the SSE implementation, I remembered I had measured about something like 10%~20% increase in throughput. But do I really need that extra throughput? Probably not. This was more about "the code is already there, why not try to make use of that" than any practical need that motivates it.

In the end, whether something is considered "worth doing" is quite subjective, and it's up to the maintainer to decide what sort of trade-offs they want for their project. I too have reservations about this PR, but can't be sure without at least trying right:joy:. Feel free to close if that's what you decided, I don't think I can make a convincing case of it. It's definitely understandable, I'd be hesitant to mess with something like CPUID just for this, too.


Edit: Oh by the way,

  • Fall back to #[cfg(target_feature)] compile-time detection for no_std.

Can we do this to keep the AVX option available at least? Most people probably still won't benefit from this, but it should be a minimal three-lines change.

BurntSushi commented 2 years ago

Out of curiosity, what is the use case for a no_std crate on x86?

Edit: Oh by the way,

  • Fall back to #[cfg(target_feature)] compile-time detection for no_std.

Can we do this to keep the AVX option available at least? Most people probably still won't benefit from this, but it should be a minimal three-lines change.

Yeah I think I can get on board with this. Otherwise, I'll probably hold off on this PR.

8573 commented 2 years ago

Out of curiosity, what is the use case for a no_std crate on x86?

I may misunderstand (edit: yes), but I can think of a likely case and an unlikely (or rather uncommon) case:

BurntSushi commented 2 years ago

@8573 The former case isn't really applicable here. This library already gives you that. The context that is important here is actually running code without std on x86, as this PR is about making a faster version of memchr available in such scenarios.

BurntSushi commented 1 year ago

I am bringing in #120 so hopefully that's good enough. Otherwise I don't think the use case is compelling enough here to justify maintaining my own CPUID checks. I'm not 100% opposed to it, but I would need a really compelling use case for it because it looks like a maintenance hazard to me.