I ran a full benchmark of many different RK4 implementations built with these techniques. The results are in; each measurement is the median time (in ns) taken to map N points with the classical Runge-Kutta method.
The implementations that beat the standard, serial method are:
manually vectorized, in-place: ~1.8x speedup
ILP with @turbo, unrolled function (normal and in-place): ~2.5x speedup
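
For readers who want something concrete to compare against, here is a minimal sketch of what a serial baseline and an in-place loop could look like. It is not the benchmarked code: `f`, `rk4_step`, `rk4_map` and `rk4_map!` are hypothetical names, the scalar ODE is an assumed example, and Base's `@simd` stands in for `@turbo` so the snippet runs without any packages.

```julia
# Hypothetical right-hand side u' = f(u, t); chosen only for illustration.
f(u, t) = -u

# One classical (4th-order) Runge-Kutta step of size h.
function rk4_step(f, u, t, h)
    k1 = f(u, t)
    k2 = f(u + h / 2 * k1, t + h / 2)
    k3 = f(u + h / 2 * k2, t + h / 2)
    k4 = f(u + h * k3, t + h)
    return u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
end

# Serial baseline: maps every point and allocates a new output array.
rk4_map(f, us, t, h) = [rk4_step(f, u, t, h) for u in us]

# In-place variant: overwrites `us`; @simd only permits the compiler to
# vectorize the loop, it does not guarantee it (assumption: stand-in for @turbo).
function rk4_map!(f, us, t, h)
    @inbounds @simd for i in eachindex(us)
        us[i] = rk4_step(f, us[i], t, h)
    end
    return us
end
```

Both functions should give identical results on the same input; any timing difference between them would have to be measured separately, for example with BenchmarkTools.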