DeadSix27 / waifu2x-converter-cpp

Improved fork of Waifu2X C++ using OpenCL and OpenCV
MIT License

Weird performance characteristics on PPC #201

Closed koachan closed 5 years ago

koachan commented 5 years ago

So, after a compiler upgrade, I noticed that the AltiVec implementation has weird performance characteristics when compiled with different GCC versions. For some reason, if I compile the code with GCC 8, the program is almost twice as fast as the version compiled with GCC 7 or 9.

Also, for completeness' sake, I modified the handler to use plain #defines instead of struct/loop wrappers (see the altivec-unwrapped branch). With this implementation, all the compilers I tested give roughly the same performance.
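To illustrate the difference between the two styles, here is a minimal, portable sketch. It is not the code from either branch: `Vec4`, `madd_wrapped` and `MADD_UNWRAPPED` are hypothetical names, and plain float arrays stand in for the AltiVec vector types so that it builds on any platform. The wrapped form relies on the compiler to unroll and vectorize a per-lane loop inside a struct helper, while the unwrapped form expands to straight-line code at each use site.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 4-lane SIMD register.
// The real implementation uses AltiVec vector types instead.
struct Vec4 {
    float lane[4];
};

// "Wrapped" style: the operation goes through a struct and a per-lane loop.
// How well this performs depends on the compiler unrolling the loop and
// keeping the struct in registers.
static inline Vec4 madd_wrapped(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
    Vec4 r;
    for (std::size_t i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    }
    return r;
}

// "Unwrapped" style: a plain #define that expands to straight-line code,
// leaving no loop or struct abstraction for the optimizer to see through.
#define MADD_UNWRAPPED(r, a, b, c)            \
    do {                                      \
        (r)[0] = (a)[0] * (b)[0] + (c)[0];    \
        (r)[1] = (a)[1] * (b)[1] + (c)[1];    \
        (r)[2] = (a)[2] * (b)[2] + (c)[2];    \
        (r)[3] = (a)[3] * (b)[3] + (c)[3];    \
    } while (0)
```

Both forms compute the same values; only the shape of the code handed to the optimizer differs, which is presumably why the wrapped version is more sensitive to the compiler version.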

Below is the result of my testing:

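| GCC version | altivec-unwrapped (GFLOPS) | master, "wrapped" (GFLOPS) |
|-------------|----------------------------|----------------------------|
| 7.4.0       | 4.11                       | 4.19                       |
| 8.3.0       | 4.36                       | 8.53                       |
| 9.2.1       | 4.46                       | 4.43                       |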

(All numbers are taken from the GFLOPS given in the final message.)

All the resulting files have the same hash, so I don't think that the compiler broke anything during optimization.

06d0386092bbedc945327c13cf872bfa72b95458ae3e802ce45fa14c0cac4722  gcc7-unwrapped.png
06d0386092bbedc945327c13cf872bfa72b95458ae3e802ce45fa14c0cac4722  gcc7-wrapped.png
06d0386092bbedc945327c13cf872bfa72b95458ae3e802ce45fa14c0cac4722  gcc8-unwrapped.png
06d0386092bbedc945327c13cf872bfa72b95458ae3e802ce45fa14c0cac4722  gcc8-wrapped.png
06d0386092bbedc945327c13cf872bfa72b95458ae3e802ce45fa14c0cac4722  gcc9-unwrapped.png
06d0386092bbedc945327c13cf872bfa72b95458ae3e802ce45fa14c0cac4722  gcc9-wrapped.png

What's happening here? What can I do to make the output of the other GCC versions just as fast? Any help or direction from someone knowledgeable in C++ and/or GCC internals would be much appreciated.

Note:

Files:

GCC 7:

GCC 8:

GCC 9:

The source image:

koachan commented 5 years ago

So, it looks like the aggressive optimization flags and unroll settings I use cause GCC 9 to generate worse code. Changing them to something less aggressive seems to fix this issue.

See also #202.
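For reference, one way to keep unrolling under control is to request it per loop instead of through global flags such as `-funroll-loops`. The sketch below is only an illustration under that assumption: `dot` is a hypothetical kernel, not a function from this project, and the unroll factor of 4 is arbitrary. It uses `#pragma GCC unroll`, which is available from GCC 8 onwards, so the pragma is guarded and older compilers simply build the plain loop.

```cpp
#include <cstddef>

// Hypothetical kernel used only to show a per-loop unroll hint.
float dot(const float *a, const float *b, std::size_t n) {
    float sum = 0.0f;
    // Ask GCC (8 or newer) to unroll this loop at most 4 times.
    // Other compilers and older GCC versions skip the pragma entirely.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 8)
#pragma GCC unroll 4
#endif
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

A hint like this only affects the loop it is attached to, so the rest of the program can be built with more conservative flags.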