SymbolicML / DynamicExpressions.jl

Ridiculously fast symbolic expressions
https://ai.damtp.cam.ac.uk/dynamicexpressions
Apache License 2.0

Native GPU support #65

Open MilesCranmer opened 9 months ago

MilesCranmer commented 9 months ago

This PR adds native GPU support: a single CUDA kernel that evaluates an expression directly on the GPU!

This also allows one to evaluate multiple trees at once (which can save time in the CUDA kernel).
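To make the idea of evaluating several trees in one pass concrete, here is a minimal CPU-side sketch in Python. It is not the kernel from this PR: the instruction encoding, the operator tables, and the names `eval_postfix` and `eval_many` are all hypothetical, chosen only for illustration. It assumes each tree has been flattened into a postfix instruction list, so that evaluation needs no recursion or pointer chasing.

```python
import math

# Hypothetical operator tables; an index into each list stands in for an operator enum.
UNARY = [math.cos, math.exp]
BINARY = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b]


def eval_postfix(program, row):
    """Evaluate one flattened tree (postfix instruction list) on one data row."""
    stack = []
    for kind, value in program:
        if kind == "const":        # push a constant
            stack.append(float(value))
        elif kind == "feature":    # push feature number `value` of this row
            stack.append(row[value])
        elif kind == "unary":      # apply UNARY[value] to the top of the stack
            stack.append(UNARY[value](stack.pop()))
        elif kind == "binary":     # apply BINARY[value] to the top two entries
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[value](left, right))
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")
    if len(stack) != 1:
        raise ValueError("malformed program: stack should hold exactly one value")
    return stack[0]


def eval_many(programs, rows):
    """Evaluate every tree on every row.

    On a GPU, each (tree, row) pair could be handled by one thread of a
    single launch; here the same work is a plain nested loop.
    """
    return [[eval_postfix(program, row) for row in rows] for program in programs]


# tree_a encodes x0 * 2 + 1; tree_b encodes cos(x1) - x0.
tree_a = [("feature", 0), ("const", 2.0), ("binary", 2), ("const", 1.0), ("binary", 0)]
tree_b = [("feature", 1), ("unary", 0), ("feature", 0), ("binary", 1)]
rows = [[1.0, 0.0], [3.0, 0.0]]

print(eval_many([tree_a, tree_b], rows))  # [[3.0, 7.0], [0.0, -2.0]]
```

Batching the trees into one call is what lets a GPU version amortize per-launch overhead across many expressions, under the assumption that the flattened programs are all resident on the device.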

[Figure: graphviz diagram]

TODO:

github-actions[bot] commented 9 months ago

Benchmark Results

| Benchmark | master | 9f49619e658053... | master/9f49619e658053... |
|---|---|---|---|
| eval/ComplexF32/evaluation | 7.48 ± 0.48 ms | 7.54 ± 0.44 ms | 0.993 |
| eval/ComplexF64/evaluation | 9.83 ± 0.73 ms | 9.85 ± 0.74 ms | 0.998 |
| eval/Float32/derivative | 10.9 ± 1.8 ms | 10.9 ± 2 ms | 1 |
| eval/Float32/derivative_turbo | 11.3 ± 2.5 ms | 11.1 ± 2.4 ms | 1.01 |
| eval/Float32/evaluation | 2.78 ± 0.21 ms | 2.8 ± 0.22 ms | 0.993 |
| eval/Float32/evaluation_bumper | 0.582 ± 0.015 ms | 0.588 ± 0.015 ms | 0.99 |
| eval/Float32/evaluation_turbo | 0.721 ± 0.038 ms | 0.718 ± 0.038 ms | 1 |
| eval/Float32/evaluation_turbo_bumper | 0.581 ± 0.013 ms | 0.586 ± 0.015 ms | 0.991 |
| eval/Float64/derivative | 14.8 ± 0.9 ms | 15.2 ± 1.1 ms | 0.973 |
| eval/Float64/derivative_turbo | 15 ± 0.84 ms | 15.3 ± 1 ms | 0.985 |
| eval/Float64/evaluation | 2.99 ± 0.25 ms | 2.99 ± 0.24 ms | 0.999 |
| eval/Float64/evaluation_bumper | 1.3 ± 0.046 ms | 1.3 ± 0.046 ms | 1 |
| eval/Float64/evaluation_turbo | 1.25 ± 0.073 ms | 1.24 ± 0.075 ms | 1.01 |
| eval/Float64/evaluation_turbo_bumper | 1.31 ± 0.048 ms | 1.29 ± 0.047 ms | 1.01 |
| utils/combine_operators/break_sharing | 0.0387 ± 0.00066 ms | 0.0394 ± 0.00075 ms | 0.984 |
| utils/convert/break_sharing | 23.3 ± 1.1 μs | 22.9 ± 1.1 μs | 1.02 |
| utils/convert/preserve_sharing | 0.125 ± 0.0035 ms | 0.125 ± 0.0035 ms | 1 |
| utils/copy/break_sharing | 23.8 ± 1.1 μs | 23.7 ± 1.1 μs | 1.01 |
| utils/copy/preserve_sharing | 0.128 ± 0.0043 ms | 0.127 ± 0.0043 ms | 1.01 |
| utils/count_constant_nodes/break_sharing | 9.05 ± 0.13 μs | 9.07 ± 0.14 μs | 0.998 |
| utils/count_constant_nodes/preserve_sharing | 0.111 ± 0.0028 ms | 0.112 ± 0.0034 ms | 0.988 |
| utils/count_depth/break_sharing | 12.9 ± 0.33 μs | 13.2 ± 0.36 μs | 0.976 |
| utils/count_nodes/break_sharing | 8.41 ± 0.12 μs | 8.39 ± 0.13 μs | 1 |
| utils/count_nodes/preserve_sharing | 0.111 ± 0.0032 ms | 0.114 ± 0.004 ms | 0.974 |
| utils/get_set_constants!/break_sharing | 0.0349 ± 0.0014 ms | 0.0345 ± 0.0015 ms | 1.01 |
| utils/get_set_constants!/preserve_sharing | 0.226 ± 0.0059 ms | 0.231 ± 0.0064 ms | 0.979 |
| utils/get_set_constants_parametric | 0.0496 ± 0.0026 ms | 0.0489 ± 0.0026 ms | 1.01 |
| utils/has_constants/break_sharing | 4.27 ± 0.066 μs | 4.18 ± 0.065 μs | 1.02 |
| utils/has_operators/break_sharing | 1.96 ± 0.036 μs | 1.94 ± 0.024 μs | 1.01 |
| utils/hash/break_sharing | 25.4 ± 0.54 μs | 25.5 ± 0.48 μs | 0.999 |
| utils/hash/preserve_sharing | 0.133 ± 0.004 ms | 0.136 ± 0.0042 ms | 0.982 |
| utils/index_constant_nodes/break_sharing | 22.7 ± 0.79 μs | 23.3 ± 0.8 μs | 0.975 |
| utils/index_constant_nodes/preserve_sharing | 0.127 ± 0.0042 ms | 0.128 ± 0.004 ms | 0.988 |
| utils/is_constant/break_sharing | 4.18 ± 0.07 μs | 4.13 ± 0.06 μs | 1.01 |
| utils/simplify_tree/break_sharing | 0.168 ± 0.002 ms | 0.174 ± 0.0016 ms | 0.967 |
| utils/simplify_tree/preserve_sharing | 0.295 ± 0.0059 ms | 0.29 ± 0.0061 ms | 1.02 |
| utils/string_tree/break_sharing | 0.407 ± 0.025 ms | 0.407 ± 0.018 ms | 1 |
| utils/string_tree/preserve_sharing | 0.545 ± 0.027 ms | 0.547 ± 0.021 ms | 0.998 |
| time_to_load | 0.223 ± 0.0015 s | 0.235 ± 0.0028 s | 0.948 |

coveralls commented 8 months ago

Pull Request Test Coverage Report for Build 8042273246

Details


| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 7996123220: | 0.3% |
| Covered Lines: | 1754 |
| Relevant Lines: | 1847 |

💛 - Coveralls