Hardware accelerated floating point functions

Kuratius commented 3 months ago

https://github.com/Kuratius/sm64/commit/ace0e206ebab999da123168f7ff2dec482f34351

I tried implementing hardware-accelerated fdiv and sqrtf functions, and I also tried writing an fmul implementation that should be somewhat faster than the default. There's also a rounding implementation that avoids float adds: https://github.com/Kuratius/sm64/commit/515fc97a55f15f83fc8531c9e7f63c186237860b
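For readers who have not seen the trick, here is a rough sketch of rounding a float to an integer without a float add, which is expensive on a soft float target. It is not the code from the commit above. The function name, the ties-away-from-zero behaviour, and the saturation on out-of-range input are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round to nearest int32, ties away from zero,
 * using only integer operations on the IEEE 754 bit pattern. */
static int32_t round_to_int_sketch(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);              /* reinterpret the float bits */

    int negative = (int)(bits >> 31);
    int32_t exp = (int32_t)((bits >> 23) & 0xFF) - 127; /* unbiased exponent */
    uint32_t mant = (bits & 0x7FFFFFu) | 0x800000u;     /* add the implicit 1 */

    if (exp > 30)                                /* too large, inf or NaN: */
        return negative ? INT32_MIN : INT32_MAX; /* saturate (assumption)  */
    if (exp < -1)                                /* |x| < 0.5, incl. zero  */
        return 0;

    int32_t result;
    if (exp >= 23) {
        result = (int32_t)(mant << (exp - 23));  /* no fractional bits left */
    } else {
        int shift = 23 - exp;                    /* 1..24 fractional bits  */
        /* Adding half an integer step here replaces the usual x + 0.5f. */
        result = (int32_t)((mant + (1u << (shift - 1))) >> shift);
    }
    return negative ? -result : result;
}
```

The usual `(int)(x + 0.5f)` costs one soft float add per call. Here the half is added to the mantissa as an integer instead.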

Note that fdiv and sqrtf start the hardware divider or square root unit and then do the rest of the float handling while it runs, instead of blocking on the result. That may mean that handling of NaNs and other invalid values could be added without changing the performance of these functions.
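A rough sketch of that ordering for fdiv, not the code from the commit. `hw_div_start` and `hw_div_result` are hypothetical stand-ins that model the hardware divider with an ordinary 64-bit division so the sketch runs anywhere. The result is truncated rather than rounded. As in the real functions, zero, denormals, NaNs, infs, overflow and underflow are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the hardware divider: on the real hardware
 * the start would be a register write and the result a later read. */
static uint64_t div_numer;
static uint32_t div_denom;
static void hw_div_start(uint64_t n, uint32_t d) { div_numer = n; div_denom = d; }
static uint32_t hw_div_result(void) { return (uint32_t)(div_numer / div_denom); }

static float fdiv_sketch(float a, float b)
{
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    /* 24-bit mantissas with the implicit leading 1 restored. */
    uint32_t ma = (ia & 0x7FFFFFu) | 0x800000u;
    uint32_t mb = (ib & 0x7FFFFFu) | 0x800000u;

    /* 1. Start the mantissa division first. */
    hw_div_start((uint64_t)ma << 24, mb);

    /* 2. Work that can overlap with the divider: sign and exponent.
     *    Checks for NaN/inf/zero could also live here. */
    uint32_t sign = (ia ^ ib) & 0x80000000u;
    int32_t exp = (int32_t)((ia >> 23) & 0xFF)
                - (int32_t)((ib >> 23) & 0xFF) + 127;

    /* 3. Only now fetch the quotient, which lies in [2^23, 2^25). */
    uint32_t q = hw_div_result();
    if (q >= (1u << 24))
        q >>= 1;         /* mantissa ratio >= 1: drop the extra bit */
    else
        exp -= 1;        /* mantissa ratio < 1: already 24 bits     */

    uint32_t out = sign | ((uint32_t)exp << 23) | (q & 0x7FFFFFu);
    float r;
    memcpy(&r, &out, sizeof r);
    return r;
}
```

Step 2 sits between starting the divider and reading the quotient, so any extra checks placed there run during time that would otherwise be spent waiting.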

Also note that they do not currently handle invalid values such as NaNs, overflows, underflows, and infs properly. I think most of the compiler option reshuffling isn't necessary; only the --use-blx, --wrap and -u flags are actually necessary to get it to use these functions instead of GCC's default soft float implementation.
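For context on --wrap: with GNU ld, `--wrap=sym` makes undefined references to `sym` resolve to `__wrap_sym`, and `-u sym` forces a symbol to be treated as undefined so the linker pulls it in. A minimal sketch of the replacement side, assuming the wrapped symbol is the ARM EABI helper `__aeabi_fdiv` (this thread does not say which symbols the commit wraps). The body is a placeholder standing in for the hardware path.

```c
/* Assumed link flags, for illustration only:
 *   -Wl,--wrap=__aeabi_fdiv
 * After that, every float division GCC lowers to a call to
 * __aeabi_fdiv ends up in the function below instead of libgcc. */
float __wrap___aeabi_fdiv(float a, float b)
{
    /* Placeholder body: divide in double so that this function does not
     * itself emit a call to __aeabi_fdiv, which would recurse once the
     * wrap is active. A real replacement would use the hardware divider. */
    return (float)((double)a / (double)b);
}
```

The wrapper must not contain a plain float division itself, because on a soft float target the compiler would turn that into another call to the wrapped symbol.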

I noticed some minor graphical issues (like 1 frame every few seconds that looks wrong) when using these, but it's hard to tell what causes that: whether NaN handling is required or something else is going wrong. Probably don't add these to the project by default until that has been tested further. This should also be benchmarked in more detail.

I'm opening this issue so that the discussion about this isn't hidden away in the discord.

Kuratius commented 3 months ago

In div.c, the line if ((exponent<=(1<<23))){ should, strictly speaking, probably use < instead of <=, or just compare with <=0.

Kuratius commented 2 months ago

https://github.com/blocksds/libnds/commit/747890e0aec2fb78b01086f1632ae7b97ac2f807 The hw sqrtf has been merged into blocksds and it now also has NaN support.

Hydr8gon / sm64

Hardware accelerated floating point functions #39