Open MilesCranmer opened 9 months ago
Benchmark | master | 9f49619e658053... | master/9f49619e658053... |
---|---|---|---|
eval/ComplexF32/evaluation | 7.48 ± 0.48 ms | 7.54 ± 0.44 ms | 0.993 |
eval/ComplexF64/evaluation | 9.83 ± 0.73 ms | 9.85 ± 0.74 ms | 0.998 |
eval/Float32/derivative | 10.9 ± 1.8 ms | 10.9 ± 2 ms | 1 |
eval/Float32/derivative_turbo | 11.3 ± 2.5 ms | 11.1 ± 2.4 ms | 1.01 |
eval/Float32/evaluation | 2.78 ± 0.21 ms | 2.8 ± 0.22 ms | 0.993 |
eval/Float32/evaluation_bumper | 0.582 ± 0.015 ms | 0.588 ± 0.015 ms | 0.99 |
eval/Float32/evaluation_turbo | 0.721 ± 0.038 ms | 0.718 ± 0.038 ms | 1 |
eval/Float32/evaluation_turbo_bumper | 0.581 ± 0.013 ms | 0.586 ± 0.015 ms | 0.991 |
eval/Float64/derivative | 14.8 ± 0.9 ms | 15.2 ± 1.1 ms | 0.973 |
eval/Float64/derivative_turbo | 15 ± 0.84 ms | 15.3 ± 1 ms | 0.985 |
eval/Float64/evaluation | 2.99 ± 0.25 ms | 2.99 ± 0.24 ms | 0.999 |
eval/Float64/evaluation_bumper | 1.3 ± 0.046 ms | 1.3 ± 0.046 ms | 1 |
eval/Float64/evaluation_turbo | 1.25 ± 0.073 ms | 1.24 ± 0.075 ms | 1.01 |
eval/Float64/evaluation_turbo_bumper | 1.31 ± 0.048 ms | 1.29 ± 0.047 ms | 1.01 |
utils/combine_operators/break_sharing | 0.0387 ± 0.00066 ms | 0.0394 ± 0.00075 ms | 0.984 |
utils/convert/break_sharing | 23.3 ± 1.1 μs | 22.9 ± 1.1 μs | 1.02 |
utils/convert/preserve_sharing | 0.125 ± 0.0035 ms | 0.125 ± 0.0035 ms | 1 |
utils/copy/break_sharing | 23.8 ± 1.1 μs | 23.7 ± 1.1 μs | 1.01 |
utils/copy/preserve_sharing | 0.128 ± 0.0043 ms | 0.127 ± 0.0043 ms | 1.01 |
utils/count_constant_nodes/break_sharing | 9.05 ± 0.13 μs | 9.07 ± 0.14 μs | 0.998 |
utils/count_constant_nodes/preserve_sharing | 0.111 ± 0.0028 ms | 0.112 ± 0.0034 ms | 0.988 |
utils/count_depth/break_sharing | 12.9 ± 0.33 μs | 13.2 ± 0.36 μs | 0.976 |
utils/count_nodes/break_sharing | 8.41 ± 0.12 μs | 8.39 ± 0.13 μs | 1 |
utils/count_nodes/preserve_sharing | 0.111 ± 0.0032 ms | 0.114 ± 0.004 ms | 0.974 |
utils/get_set_constants!/break_sharing | 0.0349 ± 0.0014 ms | 0.0345 ± 0.0015 ms | 1.01 |
utils/get_set_constants!/preserve_sharing | 0.226 ± 0.0059 ms | 0.231 ± 0.0064 ms | 0.979 |
utils/get_set_constants_parametric | 0.0496 ± 0.0026 ms | 0.0489 ± 0.0026 ms | 1.01 |
utils/has_constants/break_sharing | 4.27 ± 0.066 μs | 4.18 ± 0.065 μs | 1.02 |
utils/has_operators/break_sharing | 1.96 ± 0.036 μs | 1.94 ± 0.024 μs | 1.01 |
utils/hash/break_sharing | 25.4 ± 0.54 μs | 25.5 ± 0.48 μs | 0.999 |
utils/hash/preserve_sharing | 0.133 ± 0.004 ms | 0.136 ± 0.0042 ms | 0.982 |
utils/index_constant_nodes/break_sharing | 22.7 ± 0.79 μs | 23.3 ± 0.8 μs | 0.975 |
utils/index_constant_nodes/preserve_sharing | 0.127 ± 0.0042 ms | 0.128 ± 0.004 ms | 0.988 |
utils/is_constant/break_sharing | 4.18 ± 0.07 μs | 4.13 ± 0.06 μs | 1.01 |
utils/simplify_tree/break_sharing | 0.168 ± 0.002 ms | 0.174 ± 0.0016 ms | 0.967 |
utils/simplify_tree/preserve_sharing | 0.295 ± 0.0059 ms | 0.29 ± 0.0061 ms | 1.02 |
utils/string_tree/break_sharing | 0.407 ± 0.025 ms | 0.407 ± 0.018 ms | 1 |
utils/string_tree/preserve_sharing | 0.545 ± 0.027 ms | 0.547 ± 0.021 ms | 0.998 |
time_to_load | 0.223 ± 0.0015 s | 0.235 ± 0.0028 s | 0.948 |
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % | ||
---|---|---|---|---|---|
ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% | ||
<!-- | Total: | 135 | 137 | 98.54% | --> |
Totals | |
---|---|
Change from base Build 7996123220: | 0.3% |
Covered Lines: | 1754 |
Relevant Lines: | 1847 |
This PR adds native GPU support, implemented as a single CUDA kernel that evaluates an expression directly on the GPU!
It also allows one to evaluate multiple trees at once (which can save time in the CUDA kernel).
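To make the idea concrete, here is a minimal CPU-only sketch in plain Python. It is not the kernel from this PR and does not use the package's API. It assumes one plausible design: each tree is flattened into a post-order instruction list, several trees are packed into one shared program, and each data row is processed independently (the part a GPU would run as one thread per row). The names `flatten`, `eval_row`, `eval_trees`, and the nested-tuple tree format are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical operator tables, for illustration only.
UNARY = {"cos": math.cos, "exp": math.exp}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def flatten(tree, program):
    """Append `tree` to `program` in post-order and return the root's slot.

    Leaves are ("x", feature_index) or ("const", value).
    Operator nodes are (name, child) or (name, left, right).
    """
    kind = tree[0]
    if kind in ("x", "const"):
        program.append((kind, tree[1]))
    elif len(tree) == 2:
        child = flatten(tree[1], program)
        program.append((kind, child))
    else:
        left = flatten(tree[1], program)
        right = flatten(tree[2], program)
        program.append((kind, left, right))
    return len(program) - 1


def eval_row(program, row):
    """Walk the flat program once for a single data row.

    Operands always sit at lower slots than the instruction that uses them,
    so a single forward pass with no recursion is enough.
    """
    buf = [0.0] * len(program)
    for i, instr in enumerate(program):
        kind = instr[0]
        if kind == "x":
            buf[i] = row[instr[1]]
        elif kind == "const":
            buf[i] = instr[1]
        elif len(instr) == 2:
            buf[i] = UNARY[kind](buf[instr[1]])
        else:
            buf[i] = BINARY[kind](buf[instr[1]], buf[instr[2]])
    return buf


def eval_trees(trees, X):
    """Evaluate several trees over all rows of X using one shared program."""
    program, roots = [], []
    for tree in trees:
        roots.append(flatten(tree, program))
    out = []
    for row in X:  # on a GPU, this loop would be the parallel dimension
        buf = eval_row(program, row)
        out.append([buf[r] for r in roots])
    return out


trees = [
    ("+", ("x", 0), ("const", 1.0)),  # x0 + 1
    ("*", ("x", 0), ("x", 1)),        # x0 * x1
]
X = [[1.0, 2.0], [3.0, 4.0]]
result = eval_trees(trees, X)  # one output row per data row, one column per tree
```

Packing both trees into one program means the data is traversed once for the whole batch, which is the kind of saving the description above alludes to.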
TODO:
- Check whether `CUDA.@captured` helps at all
- `@sync` anywhere