Context
Currently fsqrt.d is one of the slowest instructions that can be called from userspace: it is roughly 3x slower than other floating-point instructions, so there is room to improve it. For dapps running untrusted RISC-V code, this instruction could be abused to intentionally slow down dapp validation.
Measurements:
I added the speed of other instructions as a reference. fsqrt.d also seems to take a large number of microarchitecture cycles.
EDIT: It seems that fdiv.d runs a loop of 128 iterations in the uarch because of our 128-bit implementation; we could optimize that as well.
Possible solutions
Our current implementation uses Newton's method to find the square root, with many iterations. "Berkeley SoftFloat" seems to get away without for loops by using a fast inverse square root, possibly inspired by Quake's famous fast inverse square root. We could investigate how this is done; removing for loops would be ideal for running in the microarchitecture.
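To make the loop-free idea concrete, here is a minimal Python sketch. It is not our implementation and not SoftFloat's code: it only illustrates the control flow of "bit-trick seed plus a fixed number of unrolled Newton steps". The names rsqrt_seed and sqrt_loop_free are hypothetical, the magic constant is the commonly cited 64-bit analogue of the Quake constant, and the sketch uses host floating point for brevity, whereas a real soft-float version would do the same steps in integer fixed-point arithmetic. It handles only positive normal inputs and is not correctly rounded, which fsqrt.d requires.

```python
import struct

# Commonly cited 64-bit analogue of the Quake constant (assumption: used here
# only as a seed generator, not taken from our code base or from SoftFloat).
MAGIC = 0x5FE6EB50C7B537A9


def rsqrt_seed(x: float) -> float:
    """Rough estimate of 1/sqrt(x) for positive normal x, via the bit trick."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits = (MAGIC - (bits >> 1)) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def sqrt_loop_free(x: float) -> float:
    """Approximate sqrt(x) with a fixed, unrolled number of Newton steps.

    Illustration only: no NaN, infinity, negative, or subnormal handling,
    and the result is not guaranteed to be correctly rounded.
    """
    if x == 0.0:
        return 0.0
    y = rsqrt_seed(x)
    half_x = 0.5 * x
    # Each step refines y toward 1/sqrt(x) and roughly squares the relative
    # error, so four steps take the rough seed to about double precision.
    y = y * (1.5 - half_x * y * y)
    y = y * (1.5 - half_x * y * y)
    y = y * (1.5 - half_x * y * y)
    y = y * (1.5 - half_x * y * y)
    # sqrt(x) = x * (1/sqrt(x))
    return x * y
```

Because the number of steps is fixed, the cost is the same for every input, which is the property that matters for the uarch. Producing a correctly rounded result would still need a final remainder check and adjustment on top of this.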