Dawoodoz / DFPSR

Fast realtime software rendering library for C++14 using SSE/AVX/NEON. 2D, 3D and isometric rendering with minimal system dependencies.
https://dawoodoz.com/dfpsr.html

Slow 2X pixel upscaling on i7 13700F #101

Closed: Dawoodoz closed 2 months ago

Dawoodoz commented 3 months ago

I tried software rendering on my new Intel Core i7 13700F processor, which almost doubled the overall frame rate compared to the previous i5 9600K. However, 2X upscaling now drops the frame rate to 10% of what it should be, despite full speed at all the other pixel scales, such as 3X (odd dimensions).

The 2X pixel upscaling previously used a special SIMD instruction for duplicating pixels, which was removed when getting rid of the simd_extra.h header, because no performance difference could be seen on the i5 9600K. Maybe I have to implement zip and unzip instructions with scalar fallback implementations in simd.h, so that one can optimize this again without going around the SIMD hardware abstraction layer or relying on the compiler to optimize it. It is important that nothing goes around the abstractions for SIMD vectorization, because otherwise someone porting it to new SIMD extensions would have to test lots of different algorithms instead of just passing the regression tests.
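A minimal sketch of what zip operations with a scalar fallback could look like, and how they duplicate pixels for 2X upscaling of one row. The names `U32x4`, `zipLow`, `zipHigh` and `upscaleRow2x` are hypothetical and are not taken from simd.h; only the SSE2 intrinsics are real. The caller only sees the abstraction, so a port to another SIMD extension would only need to reimplement the two zip functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
	#include <emmintrin.h>
#endif

// Hypothetical vector of four 32-bit pixels, not the actual DFPSR type.
struct U32x4 {
	uint32_t lane[4];
};

// Interleaves the lower halves of a and b: {a0, b0, a1, b1}
inline U32x4 zipLow(const U32x4 &a, const U32x4 &b) {
#if defined(__SSE2__)
	__m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.lane));
	__m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.lane));
	U32x4 result;
	_mm_storeu_si128(reinterpret_cast<__m128i*>(result.lane), _mm_unpacklo_epi32(va, vb));
	return result;
#else
	// Scalar fallback with the same result.
	return U32x4{{a.lane[0], b.lane[0], a.lane[1], b.lane[1]}};
#endif
}

// Interleaves the upper halves of a and b: {a2, b2, a3, b3}
inline U32x4 zipHigh(const U32x4 &a, const U32x4 &b) {
#if defined(__SSE2__)
	__m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.lane));
	__m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.lane));
	U32x4 result;
	_mm_storeu_si128(reinterpret_cast<__m128i*>(result.lane), _mm_unpackhi_epi32(va, vb));
	return result;
#else
	// Scalar fallback with the same result.
	return U32x4{{a.lane[2], b.lane[2], a.lane[3], b.lane[3]}};
#endif
}

// Writes every source pixel twice, so that the target row is twice as wide.
// The target must have room for sourceWidth * 2 pixels.
void upscaleRow2x(const uint32_t *source, uint32_t *target, int sourceWidth) {
	assert(sourceWidth >= 0);
	int x = 0;
	// Four source pixels at a time: zipping a vector with itself duplicates each lane.
	for (; x + 4 <= sourceWidth; x += 4) {
		U32x4 pixels;
		std::memcpy(pixels.lane, source + x, sizeof(pixels.lane));
		U32x4 low = zipLow(pixels, pixels);   // {p0, p0, p1, p1}
		U32x4 high = zipHigh(pixels, pixels); // {p2, p2, p3, p3}
		std::memcpy(target + x * 2, low.lane, sizeof(low.lane));
		std::memcpy(target + x * 2 + 4, high.lane, sizeof(high.lane));
	}
	// Remaining pixels when the width is not a multiple of four.
	for (; x < sourceWidth; x++) {
		target[x * 2] = source[x];
		target[x * 2 + 1] = source[x];
	}
}
```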

It should not be possible for 2X upscaling to be much slower than 3X upscaling written using the same design pattern, so it might be a bug in the newer version of the g++ compiler that came with the fresh install. Such a bug would be visible when inspecting the generated assembler code. In the worst case, one will have to implement automatic performance tuning by running a benchmark with different implementations of image filters at different resolutions when starting a game, but it is much cleaner if one implementation is fast overall on most processors without any glitches in the optimization.
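The worst-case automatic tuning could be sketched roughly as below: time each candidate implementation a few times at startup and keep the index of the fastest one. `pickFastest` is a hypothetical helper, not part of DFPSR, and a real version would have to run the filters on representative image sizes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs each candidate 'repetitions' times and returns the index
// of the candidate with the shortest single run. Taking the minimum instead of the
// average reduces the influence of interruptions from other processes.
std::size_t pickFastest(const std::vector<std::function<void()>> &candidates, int repetitions = 5) {
	assert(!candidates.empty() && repetitions > 0);
	using Clock = std::chrono::steady_clock;
	std::size_t bestIndex = 0;
	Clock::duration bestTime = Clock::duration::max();
	for (std::size_t i = 0; i < candidates.size(); i++) {
		Clock::duration fastestRun = Clock::duration::max();
		for (int r = 0; r < repetitions; r++) {
			Clock::time_point start = Clock::now();
			candidates[i]();
			Clock::duration elapsed = Clock::now() - start;
			if (elapsed < fastestRun) { fastestRun = elapsed; }
		}
		if (fastestRun < bestTime) {
			bestTime = fastestRun;
			bestIndex = i;
		}
	}
	return bestIndex;
}
```

Each candidate would wrap one upscaling implementation applied to a test image, and the returned index selects which one to call for the rest of the session.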

Dawoodoz commented 3 months ago

The same strange performance happened on both Linux and MS-Windows.

Dawoodoz commented 2 months ago

I ran "g++ -S -fverbose-asm -g -O2 draw.cpp" from the Source/DFPSR/Image folder to generate the draw.s assembler file, which is equivalent to draw.cpp but with everything from the included headers compiled in. The -S flag is the important argument: it makes the compiler stop after generating the assembler file.

Dawoodoz commented 2 months ago

Inspecting the assembler code revealed that no code had been generated for the 2x2 upscaling, because the old "#ifdef USE_SIMD_EXTRA" still remained around the 2x2 upscaling. So no SIMD extensions are needed for 2x2 upscaling after all, which makes sense when no calculations are involved.
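A minimal sketch of 2x2 upscaling without any SIMD, only to illustrate that it is pure copying. The function name `upscale2x2` and the layout (32-bit pixels, strides counted in pixels) are assumptions and not the actual code in draw.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 2x2 upscaling of 32-bit pixels. Strides are counted in pixels.
// The target must have room for sourceHeight * 2 rows of sourceWidth * 2 pixels.
void upscale2x2(const uint32_t *source, int sourceWidth, int sourceHeight, int sourceStride,
                uint32_t *target, int targetStride) {
	assert(sourceWidth >= 0 && sourceHeight >= 0);
	assert(sourceStride >= sourceWidth && targetStride >= sourceWidth * 2);
	for (int y = 0; y < sourceHeight; y++) {
		const uint32_t *sourceRow = source + static_cast<std::size_t>(y) * sourceStride;
		uint32_t *upperRow = target + static_cast<std::size_t>(y) * 2 * targetStride;
		uint32_t *lowerRow = upperRow + targetStride;
		// Duplicate each pixel horizontally into the upper target row.
		for (int x = 0; x < sourceWidth; x++) {
			upperRow[x * 2] = sourceRow[x];
			upperRow[x * 2 + 1] = sourceRow[x];
		}
		// Duplicate vertically by copying the finished row to the row below.
		std::memcpy(lowerRow, upperRow, sizeof(uint32_t) * static_cast<std::size_t>(sourceWidth) * 2);
	}
}
```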

Dawoodoz commented 2 months ago

Fixed