the main reason the original is so slow, even with the correct compiler flags, is extra branching that isn't present in the c++ version. it was easier for me to rewrite the whole thing than to make a light diff, but the results speak for themselves:
without the extra math optimizations i couldn't help putting in (-d:intpow -d:quake --passC:"-march=native") i get 55ms vs 74ms for c++, and with them it goes down to 37ms.
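the comment doesn't show what `-d:intpow` and `-d:quake` actually switch on, so here is a guess at the kind of thing such defines usually gate. this is a hypothetical sketch, not the rewrite's code: `ipow`, `powf`, `fastInvSqrt` and `invSqrt` are made-up names, `intpow` is assumed to swap a libm `pow` call for integer exponentiation by squaring, and `quake` is assumed to mean the quake III fast inverse square root. the point is that `when defined(...)` picks the path at compile time, so neither variant costs a runtime branch.

```nim
import std/math

# hypothetical: what -d:intpow might enable.
# exponentiation by squaring for small non-negative integer exponents,
# avoiding a call to libm pow().
proc ipow(x: float, n: Natural): float =
  result = 1.0
  var base = x
  var e = int(n)
  while e > 0:
    if (e and 1) == 1:
      result *= base
    base *= base
    e = e shr 1

proc powf(x: float, n: Natural): float {.inline.} =
  when defined(intpow):
    ipow(x, n)           # chosen at compile time, no runtime branch
  else:
    pow(x, n.float)

# hypothetical: what -d:quake might enable.
# the quake III inverse square root trick plus one newton-raphson step.
proc fastInvSqrt(x: float32): float32 =
  let half = 0.5'f32 * x
  var i = cast[int32](x)               # reinterpret the float bits as an int
  i = 0x5f3759df'i32 - (i shr 1)       # magic-constant initial guess
  result = cast[float32](i)
  result = result * (1.5'f32 - half * result * result)  # one newton step

proc invSqrt(x: float32): float32 {.inline.} =
  when defined(quake):
    fastInvSqrt(x)
  else:
    1'f32 / sqrt(x)
```

compiling with e.g. `nim c -d:release -d:intpow -d:quake --passC:"-march=native" file.nim` would select the fast paths; the trade-off is that the inverse square root is only approximate (roughly 0.2% relative error after one newton step).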