JBorrow closed this issue 5 years ago.
Well, only because I had not thought about it before. Could you please confirm that this makes py-sphviewer a factor of 5 faster? I will be glad to add it as a default compiler flag.
It provides a 5x speedup on the following benchmark for me:
```python
from numpy import ones_like, float32, zeros
from numpy.random import rand, seed
from time import time

number_of_particles = 100_000
res = 1024
seed(1234)

print("Generating particles")
x = rand(number_of_particles).astype(float32)
y = rand(number_of_particles).astype(float32)
h = rand(number_of_particles).astype(float32) * 0.2
m = ones_like(h)
print("Finished generating particles")

from sphviewer.tools import QuickView

print("Running pySPHViewer")
coordinates = zeros((number_of_particles, 3))
coordinates[:, 0] = x
coordinates[:, 1] = y
h = 1.778_002 * h  # The kernel_gamma we use.

t = time()
qv = QuickView(
    coordinates,
    hsml=h,
    mass=m,
    xsize=res,
    ysize=res,
    r="infinity",
    plot=False,
    logscale=False,
).get_image()
print(f"pySPHViewer took {time() - t} on this problem.")
```
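A single `time()` measurement around one call can be noisy (first-call overhead, other processes on the machine). As a hedged sketch, a small helper that repeats the call and reports the best and median wall-clock times could make the with/without-flag comparison more robust; `time_call` is a hypothetical name, not part of py-sphviewer.

```python
from statistics import median
from time import perf_counter


def time_call(func, repeats=5):
    """Call func() `repeats` times; return (best, median) wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        func()
        timings.append(perf_counter() - start)
    # The minimum is least affected by background noise; the median shows typical cost.
    return min(timings), median(timings)


# Usage with the benchmark above (requires py-sphviewer to be installed):
# best, typical = time_call(lambda: QuickView(
#     coordinates, hsml=h, mass=m, xsize=res, ysize=res,
#     r="infinity", plot=False, logscale=False,
# ).get_image())
```

Running the same script against a build with and one without the flag, and comparing the best times, gives a fairer speed-up ratio than a single run each.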
We may also want to look at the kernel function implementation: as far as I can see, the distance is square-rooted only to be squared again, and replacing `pow(x, 3)` with `x * x * x` should also give some speed-up.
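To illustrate the suggestion, here is a Python sketch of the two styles; it does not reproduce py-sphviewer's actual C kernel. It assumes a standard 2D cubic spline (M4) kernel with support radius `h` and normalisation 40/(7πh²); `kernel_pow` and `kernel_mul` are hypothetical names. The second version rejects particles using squared distances, so the square root is only taken inside the support, and it cubes with plain products instead of `pow`.

```python
import math

# Assumed 2D cubic spline normalisation (divided by h**2 at evaluation time).
NORM_2D = 40.0 / (7.0 * math.pi)


def kernel_pow(dx, dy, h):
    """Naive form: always take the square root, then use pow() for powers."""
    q = math.sqrt(dx * dx + dy * dy) / h
    if q < 0.5:
        w = 1.0 - 6.0 * math.pow(q, 2) + 6.0 * math.pow(q, 3)
    elif q < 1.0:
        w = 2.0 * math.pow(1.0 - q, 3)
    else:
        w = 0.0
    return NORM_2D / (h * h) * w


def kernel_mul(dx, dy, h):
    """Cheaper form: compare squared distances first, then use plain products."""
    r2 = dx * dx + dy * dy
    h2 = h * h
    if r2 >= h2:  # outside the support: no sqrt needed at all
        return 0.0
    q2 = r2 / h2
    if q2 < 0.25:  # equivalent to q < 0.5, tested without a sqrt
        q = math.sqrt(q2)
        w = 1.0 - 6.0 * q2 + 6.0 * q2 * q
    else:
        t = 1.0 - math.sqrt(q2)
        w = 2.0 * t * t * t
    return NORM_2D / h2 * w
```

Both functions agree to rounding error; in Python the difference in speed is small, but in the C extension avoiding `sqrt` for the many particles outside the support and replacing `pow` with multiplications is where the saving would come from.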
@JBorrow, be careful though: `--fast-math` behaves differently on different compilers. It also allows some unsafe optimisations and violations of the IEEE 754 floating-point standard, so it may not always be safe to use.
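One concrete reason for caution: with GCC and Clang, `-ffast-math` permits the compiler to reassociate floating-point arithmetic, and floating-point addition is not associative. A minimal Python illustration follows; Python always evaluates in source order, so this only shows how a reordering can change a result, not what any particular compiler will actually emit.

```python
# Same three numbers, summed in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)

# Cancellation: adding a small value to a large one can lose it entirely,
# so the order of operations decides whether the 1.0 survives.
big = 1e16
print((big + 1.0) - big)
print((big - big) + 1.0)
```

For image rendering, differences of this size are usually invisible, which is why checking the actual output, rather than assuming, is the sensible test.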
@JBorrow: given what @MatthieuSchaller says, I would like to look at the ratio of two images, one created with `--fast-math` and one without it. I guess that for visualization it does not matter much how precise the answer is, but I would like to confirm that the differences are really small.
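A sketch of that check, assuming each build saves its image (for example with `np.save`) so both can be loaded in one script, since the compile flag cannot be toggled at run time. `compare_images` and the `floor` parameter are hypothetical; pixels where the reference is (near) zero are excluded from the ratio and counted separately, because the ratio is undefined there.

```python
import numpy as np


def compare_images(img_ref, img_fast, floor=0.0):
    """Summarise the pixel-wise ratio img_fast / img_ref."""
    ref = np.asarray(img_ref, dtype=np.float64)
    fast = np.asarray(img_fast, dtype=np.float64)
    if ref.shape != fast.shape:
        raise ValueError("images must have the same shape")

    lit = np.abs(ref) > floor  # pixels where the ratio is meaningful
    dev = np.abs(fast[lit] / ref[lit] - 1.0)
    return {
        "max_rel_dev": float(dev.max()) if dev.size else 0.0,
        "median_rel_dev": float(np.median(dev)) if dev.size else 0.0,
        # pixels empty in the reference image but not in the fast-math image
        "unmatched_pixels": int(np.count_nonzero(np.abs(fast[~lit]) > floor)),
    }


# Hypothetical usage, with file names chosen for illustration only:
# stats = compare_images(np.load("image_strict.npy"), np.load("image_fast_math.npy"))
```

A maximum relative deviation near single-precision round-off (around 1e-7 for `float32` inputs) and no unmatched pixels would support using the flag by default.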
I checked the code and the answer doesn't change, and I get a speedup of a factor of 3! Thanks, @JBorrow. The flag will be used by default.
Using `-ffast-math` in the extra compile arguments speeds up the code by about 5x; is there any reason why this isn't used?
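For reference, a minimal sketch of how such a flag is typically passed through setuptools' `Extension(extra_compile_args=...)`. The module name, source path, and the `SPHVIEWER_NO_FAST_MATH` opt-out variable are all hypothetical placeholders, not py-sphviewer's real build configuration.

```python
import os

from setuptools import Extension

# Hypothetical opt-out switch: set SPHVIEWER_NO_FAST_MATH=1 to build without
# -ffast-math, e.g. to produce a strict IEEE 754 reference image.
compile_args = []
if not os.environ.get("SPHVIEWER_NO_FAST_MATH"):
    compile_args.append("-ffast-math")

# Placeholder module name and source file; adapt to the real extension layout.
render_ext = Extension(
    "sphviewer.extensions.render",
    sources=["sphviewer/extensions/render.c"],
    extra_compile_args=compile_args,
)

# setup(..., ext_modules=[render_ext]) would then pass these flags to the C compiler.
print(render_ext.extra_compile_args)
```

GCC and Clang accept `-ffast-math`, but MSVC does not (its nearest equivalent is `/fp:fast`), so a portable build script would need to pick the flag per compiler.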