Open claudeha opened 6 years ago
The branching logic should add only a few nanoseconds of overhead, maybe 20-30 ns at most? It probably depends on how the branches get laid out, and I'd guess that perf-sensitive math taking under 100 ns is unlikely to be a user of rounded anyway. Branch prediction will likely help, and if we could arrange the check so that the initial "unsafe" route is the pipelined one, the overhead of doing the safe-style call would dominate the cost of a branch misprediction (at least when branch prediction isn't firing correctly).
So I think the overhead of comparing against an estimated flop count and branching on which FFI call to make shouldn't be that bad, unless instruction cache behavior differs somehow. There are probably some good experiments to be done here.
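One such experiment could look roughly like the sketch below. It is not taken from rounded: it imports a cheap libm function (`fabs`) twice, once `unsafe` and once `safe`, as a stand-in for an MPFR routine, and times a loop of calls with `getMonotonicTimeNSec` from `GHC.Clock`. The helper names `perCall` and `nanosPerCall` are made up for illustration, and a proper benchmark harness would give more trustworthy numbers than this loop.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Monad (replicateM_)
import Data.Word (Word64)
import Foreign.C.Types (CDouble (..))
import GHC.Clock (getMonotonicTimeNSec)

-- Stand-in for an MPFR routine: the same cheap C function imported both ways.
foreign import ccall unsafe "math.h fabs" c_fabs_unsafe :: CDouble -> IO CDouble
foreign import ccall safe   "math.h fabs" c_fabs_safe   :: CDouble -> IO CDouble

-- Average nanoseconds per call, given total elapsed nanoseconds and call count.
perCall :: Word64 -> Int -> Double
perCall elapsed n = fromIntegral elapsed / fromIntegral n

-- Run an action n times and report the average wall-clock time per run.
nanosPerCall :: Int -> IO a -> IO Double
nanosPerCall n act = do
  t0 <- getMonotonicTimeNSec
  replicateM_ n act
  t1 <- getMonotonicTimeNSec
  pure (perCall (t1 - t0) n)

main :: IO ()
main = do
  let n = 1000000
  u <- nanosPerCall n (c_fabs_unsafe (-1))
  s <- nanosPerCall n (c_fabs_safe (-1))
  putStrLn ("unsafe: " ++ show u ++ " ns/call")
  putStrLn ("safe:   " ++ show s ++ " ns/call")
```

The difference between the two figures gives a rough estimate of the extra cost of a safe call, which is the budget any runtime heuristic has to stay under. Results will vary with the GHC version, the `-threaded` flag and the machine.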
Unsafe FFI is faster but calls longer than about 10µs should be avoided as they can block GC, stopping the world in the threaded runtime.
The `Numeric.MPFR.Raw` module currently reexports `Numeric.MPFR.Raw.Safe` with safe ccall imports, but it could estimate the cost at runtime (using functions of precision as heuristics) and do an unsafe call if it's likely to complete in time. The heuristics must be cheaper than the overheads of a safe FFI call (which is how much?), otherwise it would be better to just do safe FFI calls all the time.
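A minimal sketch of that dispatch, under stated assumptions: libm's `sin` stands in for an MPFR function so the example links without MPFR, the cost model in `estimateNanos` is invented purely for illustration (a real one would need to be measured per operation), and the names `CallKind`, `chooseCall` and `sinDispatch` are hypothetical rather than part of rounded. Only the 10 µs budget comes from the description above.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..))

-- Stand-in for an MPFR routine: the same C function imported both ways.
foreign import ccall unsafe "math.h sin" c_sin_unsafe :: CDouble -> IO CDouble
foreign import ccall safe   "math.h sin" c_sin_safe   :: CDouble -> IO CDouble

data CallKind = UnsafeCall | SafeCall
  deriving (Eq, Show)

-- Hypothetical cost model (an assumption, not measured): a fixed 50 ns
-- plus a term quadratic in the number of 64-bit limbs of the precision.
estimateNanos :: Int -> Int
estimateNanos precBits = 50 + limbs * limbs
  where
    limbs = (precBits + 63) `div` 64

-- Budget for an unsafe call: the roughly 10 µs limit mentioned above.
budgetNanos :: Int
budgetNanos = 10000

-- Pick the cheap unsafe call only when the estimate fits the budget.
chooseCall :: Int -> CallKind
chooseCall precBits
  | estimateNanos precBits <= budgetNanos = UnsafeCall
  | otherwise                             = SafeCall

-- Dispatch to one of the two imports based on the precision in bits.
sinDispatch :: Int -> CDouble -> IO CDouble
sinDispatch precBits x = case chooseCall precBits of
  UnsafeCall -> c_sin_unsafe x
  SafeCall   -> c_sin_safe x

main :: IO ()
main = do
  print (chooseCall 53)       -- UnsafeCall
  print (chooseCall 1000000)  -- SafeCall
  y <- sinDispatch 53 0
  print y
```

The estimate here is one integer division, one multiplication and one comparison, so it should cost far less than a safe call. Whether a realistic per-function heuristic stays that cheap is the open question.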