Doy-lee / RaylibSIMD

SIMD Implementation of Raylib's API
zlib License

QUESTION: Did you compare SIMD versions with original implementation? #1

Open raysan5 opened 3 years ago

raysan5 commented 3 years ago

Just out of curiosity, did you profile the speed improvement over the current implementations?

I'm not an expert, but AFAIK most CPUs today support SIMD instructions. Does that mean this implementation works on multiple CPUs, or is there some constraint?

Doy-lee commented 3 years ago

Hey Raysan5

> Just out of curiosity, did you profile the speed improvement over the current implementations?

I haven't with the current master, but at the time of posting there was a roughly 4x speedup, as expected when switching to SIMD. I didn't formally record the benchmark results anywhere, as I was just comparing cycle counts between implementations.
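For anyone who wants to try a comparison along these lines, here is a minimal sketch of what "comparing cycle counts" could look like. It is not the benchmark used for the 4x figure above. The names `scale_scalar`, `scale_sse` and `time_cycles` are hypothetical and are not RaylibSIMD or Raylib functions, and the sketch assumes GCC or Clang on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE/SSE2 intrinsics */
#include <x86intrin.h>  /* __rdtsc on GCC/Clang (MSVC uses <intrin.h>) */

/* Hypothetical scalar baseline: multiply every float by k. */
static void scale_scalar(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) dst[i] = src[i] * k;
}

/* Hypothetical SIMD version: four floats per iteration. */
static void scale_sse(float *dst, const float *src, size_t n, float k)
{
    assert(n % 4 == 0);  /* sketch only: no tail handling */
    __m128 vk = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);           /* unaligned load  */
        _mm_storeu_ps(dst + i, _mm_mul_ps(v, vk));  /* unaligned store */
    }
}

typedef void (*scale_fn)(float *, const float *, size_t, float);

/* Returns the timestamp-counter ticks elapsed over `reps` calls of fn. */
static uint64_t time_cycles(scale_fn fn, float *dst, const float *src,
                            size_t n, float k, int reps)
{
    uint64_t start = __rdtsc();
    for (int r = 0; r < reps; r++) fn(dst, src, n, k);
    return __rdtsc() - start;
}
```

Calling `time_cycles(scale_scalar, ...)` and `time_cycles(scale_sse, ...)` on the same buffers gives two tick counts to compare. Two caveats: on modern CPUs the timestamp counter ticks at a fixed rate rather than counting core cycles exactly, and an optimizing compiler may auto-vectorize the scalar loop, which would shrink the measured gap.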

> I'm not an expert, but AFAIK most CPUs today support SIMD instructions. Does that mean this implementation works on multiple CPUs, or is there some constraint?

I am also not a SIMD expert, but that is my understanding. Availability basically depends on the instructions used in the implementation. In this implementation I used x86_64 SSE2/3 SIMD instructions; newer instructions would narrow the range of CPUs that can run it. A good litmus test for relative availability is the Steam Hardware Survey at https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam where, under Other Settings at the bottom, 100% of surveyed users report CPUs with support for SSE2/3.

That's a good indication that the gamers and developers likely to use Raylib will, by and large, have support for SSE2/3. This only covers x86_64, though. ARM also supports SIMD, but through the NEON instruction set, so a separate implementation using NEON intrinsics would be needed for SIMD support on ARM. I would like to give that a go for learning, but I don't have ARM hardware.
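As a rough illustration of what "a separate implementation" per architecture means, here is a minimal sketch that picks SSE or NEON intrinsics at compile time. The names `add4`, `add_arrays` and `ADD4_BACKEND` are hypothetical and do not come from RaylibSIMD. The NEON branch was written without ARM hardware to hand, so treat it as an assumption about how such a port might start.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>
  #define ADD4_BACKEND "sse2"
  /* x86/x86_64: add four floats with SSE intrinsics. */
  static void add4(float *dst, const float *a, const float *b)
  {
      _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
  }
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  #define ADD4_BACKEND "neon"
  /* ARM: the same operation expressed with NEON intrinsics. */
  static void add4(float *dst, const float *a, const float *b)
  {
      vst1q_f32(dst, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
  }
#else
  #define ADD4_BACKEND "scalar"
  /* Fallback when neither instruction set is available at compile time. */
  static void add4(float *dst, const float *a, const float *b)
  {
      for (int i = 0; i < 4; i++) dst[i] = a[i] + b[i];
  }
#endif

/* Architecture-independent caller: processes n floats, four at a time. */
static void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);  /* sketch only: no tail handling */
    for (size_t i = 0; i < n; i += 4) add4(dst + i, a + i, b + i);
}
```

Only the small `add4` kernel differs per architecture. The calling code stays the same, which is one way to keep the amount of duplicated code down when adding a NEON path.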

It's possible to choose at runtime which instruction set to use by catching the fault raised by an unsupported instruction to determine which instructions are available. That means you can have SSE2/3/4 and AVX code paths all in the same run-time package and choose accordingly.
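Here is a minimal sketch of run-time selection. Note that it does not use the fault-catching approach described above. Instead it asks the CPU which features it supports through the GCC/Clang builtin `__builtin_cpu_supports`, which is a common alternative on x86. The names `add_avx`, `add_sse2`, `select_add` and `add_dispatch` are hypothetical, and the sketch assumes GCC or Clang on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* AVX path: eight floats per iteration. The target attribute lets this one
   function use AVX even though the rest of the file is built without -mavx. */
__attribute__((target("avx")))
static void add_avx(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        __m256 v = _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i));
        _mm256_storeu_ps(dst + i, v);
    }
}

/* Baseline path: four floats per iteration, available on every x86_64 CPU. */
static void add_sse2(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        _mm_storeu_ps(dst + i, v);
    }
}

typedef void (*add_fn)(float *, const float *, const float *, size_t);

/* Picks the widest implementation the running CPU reports support for. */
static add_fn select_add(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return add_avx;
    return add_sse2;
}

/* Convenience wrapper: selects an implementation and runs it. */
static void add_dispatch(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 8 == 0);  /* sketch only: no tail handling */
    select_add()(dst, a, b, n);
}
```

In real code the selection would normally be done once at startup and the resulting function pointer cached, rather than re-queried on every call as `add_dispatch` does here.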