MilesCranmer / SymbolicRegression.jl

Distributed High-Performance Symbolic Regression in Julia
https://ai.damtp.cam.ac.uk/symbolicregression/
Apache License 2.0

[Performance] Single evaluation results #73

Closed MilesCranmer closed 2 years ago

MilesCranmer commented 2 years ago

Here is the speed of SymbolicRegression.jl in evaluating a single expression with 48 nodes, over development history since v0.5.0:

v0.5.0   11.709 μs (58 allocations: 18.64 KiB)
v0.5.1   11.625 μs (58 allocations: 18.64 KiB)
v0.5.2   11.750 μs (58 allocations: 19.03 KiB)
v0.5.3   11.792 μs (58 allocations: 19.03 KiB)
v0.5.4   11.792 μs (58 allocations: 19.03 KiB)
v0.5.5   11.834 μs (58 allocations: 19.03 KiB)
v0.5.6   11.750 μs (58 allocations: 19.03 KiB)
v0.5.7   11.625 μs (58 allocations: 19.03 KiB)
v0.5.8   11.708 μs (58 allocations: 19.03 KiB)
v0.5.9   11.958 μs (58 allocations: 19.03 KiB)
v0.5.10   11.666 μs (58 allocations: 18.64 KiB)
v0.5.11   11.625 μs (58 allocations: 18.64 KiB)
v0.5.12   11.625 μs (58 allocations: 18.64 KiB)
v0.5.13   11.791 μs (58 allocations: 19.42 KiB)
v0.5.14   11.833 μs (58 allocations: 19.42 KiB)
v0.5.15   11.708 μs (58 allocations: 19.42 KiB)
v0.5.16   11.750 μs (58 allocations: 19.42 KiB)
v0.6.0   11.500 μs (58 allocations: 19.42 KiB)
v0.6.1   11.750 μs (58 allocations: 19.81 KiB)
v0.6.2   11.666 μs (58 allocations: 19.81 KiB)
v0.6.3   11.666 μs (58 allocations: 19.81 KiB)
v0.6.4   14.583 μs (58 allocations: 19.81 KiB)
v0.6.5   14.583 μs (58 allocations: 19.81 KiB)
v0.6.6   14.375 μs (58 allocations: 19.81 KiB)
v0.6.7   14.625 μs (58 allocations: 19.81 KiB)
v0.6.8   14.542 μs (58 allocations: 19.81 KiB)
v0.6.9   14.625 μs (58 allocations: 19.81 KiB)
v0.6.10   14.416 μs (58 allocations: 19.81 KiB)
v0.7.0   14.500 μs (58 allocations: 20.20 KiB)
v0.7.1   14.625 μs (58 allocations: 20.20 KiB)
v0.7.2   14.583 μs (58 allocations: 20.20 KiB)
v0.7.3   14.458 μs (58 allocations: 20.20 KiB)
v0.7.4   14.541 μs (58 allocations: 20.20 KiB)
v0.7.5   14.625 μs (58 allocations: 20.20 KiB)
v0.7.6   14.417 μs (58 allocations: 20.20 KiB)
v0.7.7   14.458 μs (58 allocations: 20.20 KiB)
v0.7.8   14.458 μs (58 allocations: 20.20 KiB)
v0.7.9   14.541 μs (58 allocations: 20.59 KiB)
v0.7.10   14.416 μs (58 allocations: 21.38 KiB)
v0.7.11   14.500 μs (58 allocations: 21.38 KiB)
v0.7.12   14.458 μs (58 allocations: 21.38 KiB)

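As can be seen, a major performance regression happened from v0.6.3 to v0.6.4: the evaluation time jumped from about 11.7 μs to about 14.6 μs, roughly 25% slower, with no change in allocations. The change can be seen here: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.3...v0.6.4.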

This was a necessary change to deal with NaNs and Infs, but I'm not sure it should impact performance that badly...

It looks like checking for NaNs/Infs within the SIMD loop is a major issue for the compiler. I will try moving the NaN/Inf checks out of the loop to see whether that recovers the performance.
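As a rough illustration of the idea, here is a minimal sketch with two hypothetical kernels (the function names and the division operator are assumptions chosen for the example, not the package's actual code). The first checks each value inside the loop, which adds a data-dependent early exit to every iteration; the second keeps the loop branch-free and checks the finished output in a separate pass.

```julia
# Hypothetical kernels for illustration only; not SymbolicRegression.jl code.

# Variant A: validity check inside the hot loop. The early return is a
# data-dependent branch, which tends to get in the way of vectorization.
function eval_check_inside!(out::Vector{Float64}, x::Vector{Float64}, y::Vector{Float64})
    @inbounds for i in eachindex(out, x, y)
        v = x[i] / y[i]
        if !isfinite(v)
            return false
        end
        out[i] = v
    end
    return true
end

# Variant B: branch-free SIMD loop, followed by one pass that detects NaN/Inf.
function eval_check_outside!(out::Vector{Float64}, x::Vector{Float64}, y::Vector{Float64})
    @inbounds @simd for i in eachindex(out, x, y)
        out[i] = x[i] / y[i]
    end
    return all(isfinite, out)
end

x = rand(1000) .+ 1.0
y = rand(1000) .+ 1.0
out_a = similar(x)
out_b = similar(x)
ok_a = eval_check_inside!(out_a, x, y)
ok_b = eval_check_outside!(out_b, x, y)

# A zero denominator forces an Inf in the output; both variants should report it.
y_bad = copy(y)
y_bad[500] = 0.0
bad_a = eval_check_inside!(similar(x), x, y_bad)
bad_b = eval_check_outside!(similar(x), x, y_bad)
```

Variant B always processes the whole array even when an early element is invalid, so which variant wins in practice would need to be benchmarked on the real workload.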

To reproduce these numbers, run:

git tag > tags.txt

# (Manually edit tags.txt to remove the tags before v0.5.0)

# Collect data:
for x in $(cat tags.txt); do git checkout "$x" > /dev/null 2>&1 && echo -n "${x} " && julia --project=. -O3 single_eval.jl; done >> benchmark_results.txt

# Sort and parse (requires vim-stream)
cat benchmark_results.txt | grep -v HEAD | vims -l 'xf.r f.r ' | sort -k1n -k2n -k3n | vims -l 'Iv\<esc>f r.f r.' |vims -l 'fndf '
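If vim-stream is not available, the sort-and-parse step could also be done in Julia itself. The following is a minimal sketch, assuming each result line looks like the rows in the table above (tag, time in μs, allocation summary); `parse_results` is a hypothetical helper, and the sample lines, including the commit hash, are placeholders. Sorting on `VersionNumber` puts v0.5.10 after v0.5.9, which a plain string sort would not.

```julia
# Parse lines such as "v0.6.4   14.583 μs (58 allocations: 19.81 KiB)" and sort
# them by version. The time is assumed to be in μs, as in the table above.
# Lines that do not match (for example stray git messages) are skipped.
function parse_results(lines)
    pattern = r"^v(\d+\.\d+\.\d+)\s+(\d+(?:\.\d+)?)\s"
    rows = Tuple{VersionNumber,Float64}[]
    for line in lines
        m = match(pattern, strip(line))
        m === nothing && continue
        push!(rows, (VersionNumber(String(m[1])), parse(Float64, m[2])))
    end
    return sort!(rows; by = first)
end

# Placeholder input; in practice this would be readlines("benchmark_results.txt").
sample = [
    "v0.6.4   14.583 μs (58 allocations: 19.81 KiB)",
    "HEAD is now at abc1234 placeholder commit message",
    "v0.5.10   11.666 μs (58 allocations: 18.64 KiB)",
    "v0.5.9   11.958 μs (58 allocations: 19.03 KiB)",
    "v0.6.3   11.666 μs (58 allocations: 19.81 KiB)",
]
rows = parse_results(sample)

# Flag any release that is more than 10% slower than the one before it.
regressions = [
    (rows[i - 1][1], rows[i][1]) for i in 2:length(rows) if rows[i][2] > 1.1 * rows[i - 1][2]
]
```

The 10% threshold is an arbitrary choice for the example; the run-to-run noise in the table above is well under that.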

MilesCranmer commented 2 years ago

Okay, sped things up. The new time is:

nodes=48   10.917 μs (58 allocations: 21.38 KiB)

which is awesome.

Changes included:

I also tried using LoopVectorization's @turbo for the other expression evaluations in EvaluationEquation.jl, but it seems to slow them down compared to plain @inbounds @simd. Not sure why.

The full file changes can be viewed here: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/ac19f1f..e1f1127

I experimented with turning off the kernel fusing, but it is actually super important for the speed. So for future speedups it would be good to allow for more complex kernel fusions or output LRU caching.
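To make the kernel fusion point concrete, here is a minimal sketch of what fusing a unary operator over a binary operator can look like. It assumes "kernel fusing" means evaluating a small subtree in a single loop rather than one loop per node; `eval_unfused` and `eval_fused` are hypothetical names, not functions from the package.

```julia
# Hypothetical illustration of fusing unary(binary(x, y)) into one loop.

# Unfused: evaluate the binary node into a temporary array, then apply the
# unary node in a second pass over memory.
function eval_unfused(unary::F, binary::G, x::Vector{Float64}, y::Vector{Float64}) where {F,G}
    tmp = similar(x)
    @inbounds @simd for i in eachindex(tmp, x, y)
        tmp[i] = binary(x[i], y[i])
    end
    out = similar(x)
    @inbounds @simd for i in eachindex(out, tmp)
        out[i] = unary(tmp[i])
    end
    return out
end

# Fused: apply both operators per element in a single loop, with no temporary.
function eval_fused(unary::F, binary::G, x::Vector{Float64}, y::Vector{Float64}) where {F,G}
    out = similar(x)
    @inbounds @simd for i in eachindex(out, x, y)
        out[i] = unary(binary(x[i], y[i]))
    end
    return out
end

x = rand(1000)
y = rand(1000)
unfused = eval_unfused(cos, *, x, y)
fused = eval_fused(cos, *, x, y)
```

The fused version allocates one array instead of two and reads each input once, which is one plausible reason fusion matters so much for short expressions like the 48-node benchmark.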