BigUglySpider / EmuLibs

Selection of libraries designed to be used with Emu projects. This was originally a Math library only, but has since been changed to hold all Emu libraries to enable consistency in changes to dependencies (such as EmuCore modifications).
https://biguglyspider.github.io/math

(Scalar) Quaternion from Euler w/Radian input performance is odd #64

Closed BigUglySpider closed 2 years ago

BigUglySpider commented 2 years ago

In speed tests as of commit f473c9f0854951092364b32640642d09176461b3, an unusual property has been found with the scalar Quaternion performance in release builds:

This is especially unusual given two things:

  1. All values used in the calculation are made local to the function via static_casts to local variables.
    • Of course, the compiler may optimise these copies away, so cache locality (or rather, a lack of it) is a likely culprit.
  2. When inputting radians, the function has less to do, since it works in radians internally.
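For reference, the shape of function being described can be sketched roughly as below. This is only an illustration, not the EmuLibs implementation: `Quat`, `quat_from_euler`, the `InRadians` template flag, and the X-Y-Z (roll, pitch, yaw) angle convention are all assumptions made for the example. It shows the two points above: inputs are copied into locals with static_cast, and the radian path skips the degree-to-radian multiply.

```cpp
#include <cmath>

// Hypothetical minimal quaternion type (not the EmuLibs Quaternion).
struct Quat { float x, y, z, w; };

// Hypothetical scalar conversion: Euler angles about X, Y, Z to a quaternion.
// InRadians = true means the inputs are already in radians.
template<bool InRadians, typename T>
Quat quat_from_euler(const T& in_x, const T& in_y, const T& in_z)
{
    // Copy every input into a local via static_cast, as described above.
    // The compiler is free to elide these copies in an optimised build.
    float x = static_cast<float>(in_x);
    float y = static_cast<float>(in_y);
    float z = static_cast<float>(in_z);

    // Degree input costs three extra multiplies; radian input skips them.
    if constexpr (!InRadians)
    {
        constexpr float deg_to_rad = 3.14159265358979323846f / 180.0f;
        x *= deg_to_rad;
        y *= deg_to_rad;
        z *= deg_to_rad;
    }

    // Half-angle sines and cosines.
    const float cx = std::cos(x * 0.5f), sx = std::sin(x * 0.5f);
    const float cy = std::cos(y * 0.5f), sy = std::sin(y * 0.5f);
    const float cz = std::cos(z * 0.5f), sz = std::sin(z * 0.5f);

    // Standard roll (X), pitch (Y), yaw (Z) composition; the order is an assumption.
    return Quat{
        sx * cy * cz - cx * sy * sz,
        cx * sy * cz + sx * cy * sz,
        cx * cy * sz - sx * sy * cz,
        cx * cy * cz + sx * sy * sz
    };
}
```

On paper the `InRadians = true` path does strictly less arithmetic, which is why it is surprising for it to be the slower one.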

BigUglySpider commented 2 years ago

Update

This is most likely an issue with cache locality, as suggested in the initial post.

When the conversion to radians is done outside of the function and the result is passed as radian input, the speed once again matches that of passing the data directly and indicating that it is in degrees.

This suggests that the problem is not necessarily with Quaternion itself, but more likely with the test, due to the cache placement of values whose copies are being elided by the compiler in release builds.
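A rough sketch of that comparison is below, for anyone wanting to reproduce it. It is not the actual EmuLibs speed test: `Quat`, `quat_from_euler`, `time_conversion`, and `to_radians` are hypothetical names, and the conversion is a simplified stand-in. The point is only that the same angles are timed once as degree input and once as radian input converted outside the timed function.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal quaternion type (not the EmuLibs Quaternion).
struct Quat { float x, y, z, w; };

// Simplified stand-in conversion; X-Y-Z (roll, pitch, yaw) order is an assumption.
template<bool InRadians>
Quat quat_from_euler(float x, float y, float z)
{
    if constexpr (!InRadians)
    {
        constexpr float deg_to_rad = 3.14159265358979323846f / 180.0f;
        x *= deg_to_rad; y *= deg_to_rad; z *= deg_to_rad;
    }
    const float cx = std::cos(x * 0.5f), sx = std::sin(x * 0.5f);
    const float cy = std::cos(y * 0.5f), sy = std::sin(y * 0.5f);
    const float cz = std::cos(z * 0.5f), sz = std::sin(z * 0.5f);
    return Quat{ sx * cy * cz - cx * sy * sz, cx * sy * cz + sx * cy * sz,
                 cx * cy * sz - sx * sy * cz, cx * cy * cz + sx * sy * sz };
}

// Elapsed time plus a checksum; the checksum keeps the loop from being optimised out.
struct Timing { double nanoseconds; float checksum; };

// Times one pass over a flat array of angles laid out as x0, y0, z0, x1, y1, z1, ...
template<bool InRadians>
Timing time_conversion(const std::vector<float>& angles)
{
    float checksum = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i + 2 < angles.size(); i += 3)
    {
        const Quat q = quat_from_euler<InRadians>(angles[i], angles[i + 1], angles[i + 2]);
        checksum += q.x + q.y + q.z + q.w;
    }
    const auto stop = std::chrono::steady_clock::now();
    return Timing{ std::chrono::duration<double, std::nano>(stop - start).count(), checksum };
}

// Converts degrees to radians outside the timed function, as in the update above.
std::vector<float> to_radians(const std::vector<float>& degrees)
{
    constexpr float deg_to_rad = 3.14159265358979323846f / 180.0f;
    std::vector<float> radians;
    radians.reserve(degrees.size());
    for (const float d : degrees) { radians.push_back(d * deg_to_rad); }
    return radians;
}
```

Calling `time_conversion<false>(degrees)` and `time_conversion<true>(to_radians(degrees))` on the same data gives the two timings to compare. A single pass like this is noisy, so a real test would repeat it many times and build with optimisations enabled.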

Not much can be done about this for the time being due to other priorities. However, this is not a final answer and should be kept in mind; reviewing the disassembly of the release-build Quaternion-from-Euler conversion function should be considered at some point.

Nonetheless, this issue is being closed for now.

BigUglySpider commented 1 year ago

Notably, this issue does not occur on the new testing hardware (AMD Ryzen 9 5950X). It may be an Intel thing, an i5-specific thing, or an i5-8400-specific thing; not too sure.

It is unlikely to have been caused by the previous hardware's smaller cache, since the bandwidth in this operation is nowhere near large enough for that to be likely.