Okay, I sped things up. The new time is:
nodes=48 10.917 μs (58 allocations: 21.38 KiB)
which is awesome.
Changes included:
I also tried using LoopVectorization's `@turbo` for the other expression evaluations in EvaluationEquation.jl, but it seems to slow them down compared to simply using `@inbounds @simd`; I'm not sure why.
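For reference, here's a minimal sketch of the kind of elementwise kernel being compared, using `cos` as a stand-in unary operator (the real kernels in the file are generated per-operator and look a bit different):

```julia
using LoopVectorization  # provides @turbo

# Plain Julia SIMD version: @inbounds removes bounds checks, @simd allows vectorization.
function eval_unary_simd!(out::Vector{Float32}, x::Vector{Float32})
    @inbounds @simd for i in eachindex(out, x)
        out[i] = cos(x[i])
    end
    return out
end

# LoopVectorization version: @turbo generates its own vectorized loop.
function eval_unary_turbo!(out::Vector{Float32}, x::Vector{Float32})
    @turbo for i in eachindex(out, x)
        out[i] = cos(x[i])
    end
    return out
end
```

You can compare the two with BenchmarkTools' `@btime` on preallocated vectors.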
The full set of file changes can be viewed here: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/ac19f1f..e1f1127
I experimented with turning off the kernel fusing, but it turns out to be really important for speed. So for future speedups, it would be good to allow for more complex kernel fusions, or to add LRU caching of outputs.
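To illustrate what I mean by kernel fusing, here is a sketch of the leaf-leaf case only (names and structure are illustrative, not the actual generated kernels):

```julia
# Fused: a binary node whose children are both leaves (a feature column and a
# constant) is evaluated in a single loop, with no temporaries for the children.
function fused_binary_leaf_kernel!(out::Vector{T}, op::F, X::AbstractMatrix{T},
                                   feature::Int, c::T) where {T,F}
    @inbounds @simd for j in axes(X, 2)
        out[j] = op(X[feature, j], c)
    end
    return out
end

# Unfused equivalent: each child is materialized first, so there is an extra
# allocation and an extra pass over the data.
function unfused_binary_leaf_kernel(op::F, X::AbstractMatrix{T},
                                    feature::Int, c::T) where {T,F}
    left = X[feature, :]   # intermediate array for the left child
    return op.(left, c)    # second pass to apply the operator
end
```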
Here is the speed of SymbolicRegression.jl at evaluating a single expression with 48 nodes, over its development history since v0.5.0:
As can be seen, a major performance regression occurred between v0.6.3 and v0.6.4. The change can be seen here: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.3...v0.6.4.
This was a necessary change to deal with NaNs and Infs, but I'm not sure it should impact performance that badly...
It looks like checking for NaNs/Infs within the SIMD loop is a major issue for the compiler. I will check whether moving the NaN/Inf checks out of the loop improves performance.
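Roughly, the two variants I want to compare look like this (sketch only, not the actual kernels):

```julia
# Variant A: check inside the hot loop. The data-dependent early return keeps
# the compiler from vectorizing the loop cleanly.
function eval_op_checked!(out::Vector{T}, x::Vector{T}, op::F) where {T,F}
    @inbounds for i in eachindex(out, x)
        out[i] = op(x[i])
        isfinite(out[i]) || return false   # bail out on NaN/Inf
    end
    return true
end

# Variant B: keep the hot loop branch-free and SIMD-friendly, then verify
# finiteness in a cheap second pass over the output.
function eval_op_then_check!(out::Vector{T}, x::Vector{T}, op::F) where {T,F}
    @inbounds @simd for i in eachindex(out, x)
        out[i] = op(x[i])
    end
    return all(isfinite, out)   # isfinite is false for both NaN and Inf
end
```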
(run this with:
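(The exact command got cut off above; roughly, the benchmark looks like the following. Note this is written against the current API, so `Options`, `Node`, and `eval_tree_array` are today's names rather than the 0.6.x interface the timings were measured with, and the tree here is a small stand-in rather than the 48-node expression.)

```julia
using SymbolicRegression: Options, Node, eval_tree_array
using BenchmarkTools

# Operators must be declared in Options before building the tree.
options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos, exp])
x1 = Node(; feature=1)
x2 = Node(; feature=2)

# Small stand-in expression; the timings above used a 48-node tree.
tree = cos(x1 * 3.2f0) + exp(x2) / (x1 - 0.8f0)

X = randn(Float32, 2, 1000)
@btime eval_tree_array($tree, $X, $options)
```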