SymbolicML / DynamicExpressions.jl

Ridiculously fast symbolic expressions
https://symbolicml.org/DynamicExpressions.jl/dev
Apache License 2.0

Native GPU support #65

Open MilesCranmer opened 4 months ago

MilesCranmer commented 4 months ago

This PR adds native GPU support, implemented as a single CUDA kernel that evaluates an expression directly on the GPU!

This also allows one to evaluate multiple trees at once (which can help save time in the CUDA kernel).
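To illustrate the general idea of evaluating several trees in one pass without recursion, here is a minimal CPU sketch in Python. It is not the kernel from this PR: the nested-tuple tree format, the opcode encoding, and the names `flatten`, `eval_program`, and `eval_trees` are all hypothetical. Each tree is flattened into a postfix program, and each program is then evaluated with an explicit stack. This is the kind of loop that could be mapped onto GPU threads, for example one thread per (tree, row) pair.

```python
import math

# Hypothetical opcode encoding for the flattened (postfix) program.
CONST, FEATURE, UNARY, BINARY = 0, 1, 2, 3

# Hypothetical operator tables, indexed by the payload stored in each node.
UNARY_OPS = [math.cos, math.exp]
BINARY_OPS = [lambda a, b: a + b, lambda a, b: a * b, lambda a, b: a - b]


def flatten(tree, program=None):
    """Convert a nested-tuple tree into a postfix list of (kind, payload)."""
    if program is None:
        program = []
    tag = tree[0]
    if tag == "const":
        program.append((CONST, tree[1]))
    elif tag == "x":
        program.append((FEATURE, tree[1]))
    elif tag == "unary":
        flatten(tree[2], program)          # child first, operator last
        program.append((UNARY, tree[1]))
    elif tag == "binary":
        flatten(tree[2], program)          # left child
        flatten(tree[3], program)          # right child
        program.append((BINARY, tree[1]))
    else:
        raise ValueError(f"unknown node tag: {tag!r}")
    return program


def eval_program(program, row):
    """Evaluate one postfix program on one data row using an explicit stack."""
    stack = []
    for kind, payload in program:
        if kind == CONST:
            stack.append(payload)
        elif kind == FEATURE:
            stack.append(row[payload])
        elif kind == UNARY:
            stack.append(UNARY_OPS[payload](stack.pop()))
        else:  # BINARY: the right operand was pushed last
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY_OPS[payload](a, b))
    return stack[0]


def eval_trees(trees, X):
    """Evaluate every tree on every row of X; returns one output list per tree."""
    programs = [flatten(t) for t in trees]
    return [[eval_program(p, row) for row in X] for p in programs]


# Two example trees: x0 + 2.0 and cos(x0 * x1).
trees = [
    ("binary", 0, ("x", 0), ("const", 2.0)),
    ("unary", 0, ("binary", 1, ("x", 0), ("x", 1))),
]
X = [[1.0, 2.0], [3.0, 4.0]]  # two rows, two features
result = eval_trees(trees, X)
```

Here `result[0]` holds the outputs of the first tree for both rows and `result[1]` those of the second. Flattening up front means the evaluation loop needs no recursion or pointer chasing, which is one reason a layout like this suits a single kernel launch covering many trees.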

[Image: graphviz diagram]

TODO:

github-actions[bot] commented 4 months ago

Benchmark Results

| Benchmark | master | cb2d055423b415... | t[master]/t[cb2d055423b415...] |
|---|---|---|---|
| eval/ComplexF32/evaluation | 7.4 ± 0.51 ms | 7.42 ± 0.5 ms | 0.997 |
| eval/ComplexF64/evaluation | 9.63 ± 0.75 ms | 9.67 ± 0.67 ms | 0.995 |
| eval/Float32/derivative | 10.7 ± 1.8 ms | 10.8 ± 1.6 ms | 0.991 |
| eval/Float32/derivative_turbo | 10.8 ± 1.7 ms | 10.7 ± 1.6 ms | 1.01 |
| eval/Float32/evaluation | 2.72 ± 0.23 ms | 2.73 ± 0.22 ms | 0.997 |
| eval/Float32/evaluation_bumper | 0.572 ± 0.013 ms | 0.572 ± 0.013 ms | 1 |
| eval/Float32/evaluation_turbo | 0.699 ± 0.028 ms | 0.711 ± 0.031 ms | 0.983 |
| eval/Float32/evaluation_turbo_bumper | 0.577 ± 0.013 ms | 0.572 ± 0.013 ms | 1.01 |
| eval/Float64/derivative | 14.3 ± 0.63 ms | 14.1 ± 0.59 ms | 1.02 |
| eval/Float64/derivative_turbo | 14.3 ± 0.53 ms | 14.1 ± 0.64 ms | 1.01 |
| eval/Float64/evaluation | 2.91 ± 0.23 ms | 2.93 ± 0.25 ms | 0.994 |
| eval/Float64/evaluation_bumper | 1.31 ± 0.046 ms | 1.29 ± 0.044 ms | 1.01 |
| eval/Float64/evaluation_turbo | 1.18 ± 0.057 ms | 1.18 ± 0.063 ms | 1.01 |
| eval/Float64/evaluation_turbo_bumper | 1.3 ± 0.046 ms | 1.29 ± 0.044 ms | 1.01 |
| utils/combine_operators/break_sharing | 0.0415 ± 0.0013 ms | 0.0415 ± 0.0013 ms | 1 |
| utils/convert/break_sharing | 28.1 ± 0.95 μs | 28.3 ± 0.93 μs | 0.991 |
| utils/convert/preserve_sharing | 0.126 ± 0.0028 ms | 0.127 ± 0.0031 ms | 0.993 |
| utils/copy/break_sharing | 28.8 ± 1 μs | 29.1 ± 1 μs | 0.993 |
| utils/copy/preserve_sharing | 0.126 ± 0.0031 ms | 0.127 ± 0.003 ms | 0.997 |
| utils/count_constants/break_sharing | 10.2 ± 0.15 μs | 11.1 ± 0.18 μs | 0.916 |
| utils/count_constants/preserve_sharing | 0.112 ± 0.0022 ms | 0.111 ± 0.0024 ms | 1.01 |
| utils/count_depth/break_sharing | 17.4 ± 0.38 μs | 17.4 ± 0.37 μs | 1 |
| utils/count_nodes/break_sharing | 9.85 ± 0.15 μs | 10.1 ± 0.15 μs | 0.971 |
| utils/count_nodes/preserve_sharing | 0.114 ± 0.0021 ms | 0.114 ± 0.0024 ms | 1 |
| utils/get_set_constants!/break_sharing | 0.0535 ± 0.00078 ms | 0.0522 ± 0.00083 ms | 1.03 |
| utils/get_set_constants!/preserve_sharing | 0.322 ± 0.0061 ms | 0.322 ± 0.007 ms | 1 |
| utils/has_constants/break_sharing | 4.69 ± 0.22 μs | 4.44 ± 0.21 μs | 1.05 |
| utils/has_operators/break_sharing | 2.09 ± 0.021 μs | 1.93 ± 0.019 μs | 1.08 |
| utils/hash/break_sharing | 30.1 ± 0.45 μs | 30 ± 0.46 μs | 1 |
| utils/hash/preserve_sharing | 0.132 ± 0.0024 ms | 0.132 ± 0.0026 ms | 1 |
| utils/index_constants/break_sharing | 27.4 ± 0.73 μs | 27.8 ± 0.79 μs | 0.986 |
| utils/index_constants/preserve_sharing | 0.127 ± 0.0029 ms | 0.126 ± 0.0028 ms | 1.01 |
| utils/is_constant/break_sharing | 4.46 ± 0.22 μs | 4.42 ± 0.22 μs | 1.01 |
| utils/simplify_tree/break_sharing | 0.171 ± 0.015 ms | 0.179 ± 0.016 ms | 0.954 |
| utils/simplify_tree/preserve_sharing | 0.293 ± 0.017 ms | 0.289 ± 0.017 ms | 1.02 |
| utils/string_tree/break_sharing | 0.496 ± 0.012 ms | 0.498 ± 0.013 ms | 0.997 |
| utils/string_tree/preserve_sharing | 0.639 ± 0.016 ms | 0.641 ± 0.017 ms | 0.997 |
| time_to_load | 0.191 ± 0.0023 s | 0.201 ± 0.0033 s | 0.95 |

coveralls commented 3 months ago

Pull Request Test Coverage Report for Build 8042273246

Details


| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 7996123220: | 0.3% |
| Covered Lines: | 1754 |
| Relevant Lines: | 1847 |

💛 - Coveralls