Open MilesCranmer opened 4 months ago
Benchmark | master | cb2d055423b415... | t[master]/t[cb2d055423b415...] |
---|---|---|---|
eval/ComplexF32/evaluation | 7.4 ± 0.51 ms | 7.42 ± 0.5 ms | 0.997 |
eval/ComplexF64/evaluation | 9.63 ± 0.75 ms | 9.67 ± 0.67 ms | 0.995 |
eval/Float32/derivative | 10.7 ± 1.8 ms | 10.8 ± 1.6 ms | 0.991 |
eval/Float32/derivative_turbo | 10.8 ± 1.7 ms | 10.7 ± 1.6 ms | 1.01 |
eval/Float32/evaluation | 2.72 ± 0.23 ms | 2.73 ± 0.22 ms | 0.997 |
eval/Float32/evaluation_bumper | 0.572 ± 0.013 ms | 0.572 ± 0.013 ms | 1 |
eval/Float32/evaluation_turbo | 0.699 ± 0.028 ms | 0.711 ± 0.031 ms | 0.983 |
eval/Float32/evaluation_turbo_bumper | 0.577 ± 0.013 ms | 0.572 ± 0.013 ms | 1.01 |
eval/Float64/derivative | 14.3 ± 0.63 ms | 14.1 ± 0.59 ms | 1.02 |
eval/Float64/derivative_turbo | 14.3 ± 0.53 ms | 14.1 ± 0.64 ms | 1.01 |
eval/Float64/evaluation | 2.91 ± 0.23 ms | 2.93 ± 0.25 ms | 0.994 |
eval/Float64/evaluation_bumper | 1.31 ± 0.046 ms | 1.29 ± 0.044 ms | 1.01 |
eval/Float64/evaluation_turbo | 1.18 ± 0.057 ms | 1.18 ± 0.063 ms | 1.01 |
eval/Float64/evaluation_turbo_bumper | 1.3 ± 0.046 ms | 1.29 ± 0.044 ms | 1.01 |
utils/combine_operators/break_sharing | 0.0415 ± 0.0013 ms | 0.0415 ± 0.0013 ms | 1 |
utils/convert/break_sharing | 28.1 ± 0.95 μs | 28.3 ± 0.93 μs | 0.991 |
utils/convert/preserve_sharing | 0.126 ± 0.0028 ms | 0.127 ± 0.0031 ms | 0.993 |
utils/copy/break_sharing | 28.8 ± 1 μs | 29.1 ± 1 μs | 0.993 |
utils/copy/preserve_sharing | 0.126 ± 0.0031 ms | 0.127 ± 0.003 ms | 0.997 |
utils/count_constants/break_sharing | 10.2 ± 0.15 μs | 11.1 ± 0.18 μs | 0.916 |
utils/count_constants/preserve_sharing | 0.112 ± 0.0022 ms | 0.111 ± 0.0024 ms | 1.01 |
utils/count_depth/break_sharing | 17.4 ± 0.38 μs | 17.4 ± 0.37 μs | 1 |
utils/count_nodes/break_sharing | 9.85 ± 0.15 μs | 10.1 ± 0.15 μs | 0.971 |
utils/count_nodes/preserve_sharing | 0.114 ± 0.0021 ms | 0.114 ± 0.0024 ms | 1 |
utils/get_set_constants!/break_sharing | 0.0535 ± 0.00078 ms | 0.0522 ± 0.00083 ms | 1.03 |
utils/get_set_constants!/preserve_sharing | 0.322 ± 0.0061 ms | 0.322 ± 0.007 ms | 1 |
utils/has_constants/break_sharing | 4.69 ± 0.22 μs | 4.44 ± 0.21 μs | 1.05 |
utils/has_operators/break_sharing | 2.09 ± 0.021 μs | 1.93 ± 0.019 μs | 1.08 |
utils/hash/break_sharing | 30.1 ± 0.45 μs | 30 ± 0.46 μs | 1 |
utils/hash/preserve_sharing | 0.132 ± 0.0024 ms | 0.132 ± 0.0026 ms | 1 |
utils/index_constants/break_sharing | 27.4 ± 0.73 μs | 27.8 ± 0.79 μs | 0.986 |
utils/index_constants/preserve_sharing | 0.127 ± 0.0029 ms | 0.126 ± 0.0028 ms | 1.01 |
utils/is_constant/break_sharing | 4.46 ± 0.22 μs | 4.42 ± 0.22 μs | 1.01 |
utils/simplify_tree/break_sharing | 0.171 ± 0.015 ms | 0.179 ± 0.016 ms | 0.954 |
utils/simplify_tree/preserve_sharing | 0.293 ± 0.017 ms | 0.289 ± 0.017 ms | 1.02 |
utils/string_tree/break_sharing | 0.496 ± 0.012 ms | 0.498 ± 0.013 ms | 0.997 |
utils/string_tree/preserve_sharing | 0.639 ± 0.016 ms | 0.641 ± 0.017 ms | 0.997 |
time_to_load | 0.191 ± 0.0023 s | 0.201 ± 0.0033 s | 0.95 |
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
---|---|---|---|
ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% |
<!-- | Total: | 135 | 137 | 98.54% | --> |
Totals | |
---|---|
Change from base Build 7996123220: | 0.3% |
Covered Lines: | 1754 |
Relevant Lines: | 1847 |
This PR adds native GPU support: a single CUDA kernel evaluates an expression directly on the GPU!
It also allows one to evaluate multiple trees at once, which can save time in the CUDA kernel.
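
To make "multiple trees at once" concrete, here is a rough, language-neutral sketch in plain Python. It is not the code in this PR and does not use the package's API. It assumes each tree has been flattened to a postfix program, and the names `eval_postfix`, `eval_many`, `BINARY`, and `UNARY` are all hypothetical. Each (tree, row) pair stands in for the work one GPU thread might do, and the single `eval_many` call stands in for a single kernel launch.

```python
import math

# Hypothetical operator tables for illustration (not the package's OperatorEnum).
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"cos": math.cos}


def eval_postfix(program, row):
    """Evaluate one flattened tree on one data row (one thread's worth of work)."""
    stack = []
    for kind, value in program:
        if kind == "const":
            stack.append(value)
        elif kind == "feature":
            stack.append(row[value])  # value is a 0-based feature index
        elif kind == "unary":
            stack.append(UNARY[value](stack.pop()))
        else:  # "binary": pop the right operand first, then the left
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[value](left, right))
    return stack[0]


def eval_many(programs, rows):
    """Evaluate every tree on every row in one call (stand-in for one launch)."""
    return [[eval_postfix(p, r) for r in rows] for p in programs]


# x0 * cos(x1 - 3.2) and x0 + x1, written as postfix programs.
tree_a = [("feature", 0), ("feature", 1), ("const", 3.2),
          ("binary", "-"), ("unary", "cos"), ("binary", "*")]
tree_b = [("feature", 0), ("feature", 1), ("binary", "+")]
rows = [(1.0, 3.2), (2.0, 0.5)]

results = eval_many([tree_a, tree_b], rows)  # results[i][j]: tree i on row j
```

On a real GPU, the batching would pay off by amortizing per-launch overhead across trees. That is one plausible reading of the "save time" remark above, not something the PR text spells out.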
TODO:
- Does `CUDA.@captured` help at all?
- `@sync` anywhere?