Closed sunxd3 closed 5 months ago
Example Name | Category | Median Time | Minimum Time | Maximum Time | Memory Usage |
---|---|---|---|---|---|
surgical_realistic | AD logdensity_and_gradient | 191.465 μs | 174.143 μs | 10.651 ms | 93.25 KiB |
surgical_realistic | AD compiled logdensity_and_gradient | 9.678 μs | 9.448 μs | 30.316 μs | 208 bytes |
surgical_realistic | logdensity | 95.087 μs | 88.273 μs | 8.033 ms | 32.88 KiB |
surgical_realistic | AD logdensity | 94.811 μs | 87.883 μs | 7.995 ms | 32.88 KiB |
surgical_realistic | AD compiled logdensity | 94.916 μs | 87.973 μs | 8.169 ms | 32.88 KiB |
pumps | AD logdensity_and_gradient | 150.499 μs | 137.154 μs | 13.024 ms | 66.42 KiB |
pumps | AD compiled logdensity_and_gradient | 6.845 μs | 6.760 μs | 15.655 μs | 192 bytes |
pumps | logdensity | 79.347 μs | 71.563 μs | 9.439 ms | 25.08 KiB |
pumps | AD logdensity | 78.896 μs | 71.984 μs | 8.696 ms | 25.08 KiB |
pumps | AD compiled logdensity | 78.967 μs | 71.914 μs | 8.704 ms | 25.08 KiB |
dogs | AD logdensity_and_gradient | 3.299 ms | 3.193 ms | 15.532 ms | 1.90 MiB |
dogs | AD compiled logdensity_and_gradient | 175.440 μs | 167.370 μs | 888.628 μs | 112 bytes |
dogs | logdensity | 2.536 ms | 2.451 ms | 12.721 ms | 1.15 MiB |
dogs | AD logdensity | 2.547 ms | 2.457 ms | 19.634 ms | 1.15 MiB |
dogs | AD compiled logdensity | 2.541 ms | 2.450 ms | 12.816 ms | 1.15 MiB |
magnesium | AD logdensity_and_gradient | 3.106 ms | 2.963 ms | 17.369 ms | 1.51 MiB |
magnesium | AD compiled logdensity_and_gradient | 98.031 μs | 96.499 μs | 149.717 μs | 1.02 KiB |
magnesium | logdensity | 1.525 ms | 1.466 ms | 16.078 ms | 783.89 KiB |
magnesium | AD logdensity | 1.527 ms | 1.480 ms | 11.304 ms | 783.89 KiB |
magnesium | AD compiled logdensity | 1.526 ms | 1.471 ms | 10.921 ms | 783.89 KiB |
surgical_simple | AD logdensity_and_gradient | 117.037 μs | 113.231 μs | 8.303 ms | 87.03 KiB |
surgical_simple | AD compiled logdensity_and_gradient | 9.427 μs | 9.227 μs | 31.318 μs | 192 bytes |
surgical_simple | logdensity | 48.179 μs | 46.497 μs | 8.791 ms | 17.41 KiB |
surgical_simple | AD logdensity | 48.575 μs | 46.516 μs | 9.057 ms | 17.41 KiB |
surgical_simple | AD compiled logdensity | 48.620 μs | 46.586 μs | 8.628 ms | 17.41 KiB |
salm | AD logdensity_and_gradient | 339.646 μs | 322.579 μs | 9.951 ms | 147.38 KiB |
salm | AD compiled logdensity_and_gradient | 15.830 μs | 15.308 μs | 56.365 μs | 272 bytes |
salm | logdensity | 208.617 μs | 199.651 μs | 10.179 ms | 69.78 KiB |
salm | AD logdensity | 208.837 μs | 199.840 μs | 10.221 ms | 69.78 KiB |
salm | AD compiled logdensity | 208.938 μs | 199.380 μs | 10.360 ms | 69.78 KiB |
stacks | AD logdensity_and_gradient | 390.259 μs | 374.184 μs | 10.611 ms | 169.33 KiB |
stacks | AD compiled logdensity_and_gradient | 15.449 μs | 14.948 μs | 37.640 μs | 144 bytes |
stacks | logdensity | 260.924 μs | 243.842 μs | 10.425 ms | 84.17 KiB |
stacks | AD logdensity | 260.704 μs | 242.981 μs | 10.552 ms | 84.17 KiB |
stacks | AD compiled logdensity | 260.724 μs | 242.309 μs | 10.495 ms | 84.17 KiB |
bones | AD logdensity_and_gradient | 6.871 ms | 6.563 ms | 17.753 ms | 3.45 MiB |
bones | AD compiled logdensity_and_gradient | 201.033 μs | 192.077 μs | 313.251 μs | 368 bytes |
bones | logdensity | 5.596 ms | 5.438 ms | 15.979 ms | 2.20 MiB |
bones | AD logdensity | 5.581 ms | 5.440 ms | 15.655 ms | 2.20 MiB |
bones | AD compiled logdensity | 5.569 ms | 5.429 ms | 15.774 ms | 2.20 MiB |
leukfr | AD logdensity_and_gradient | 5.968 ms | 5.808 ms | 22.971 ms | 4.10 MiB |
leukfr | AD compiled logdensity_and_gradient | 269.971 μs | 260.854 μs | 471.497 μs | 432 bytes |
leukfr | logdensity | 3.890 ms | 3.738 ms | 14.788 ms | 2.75 MiB |
leukfr | AD logdensity | 3.885 ms | 3.753 ms | 15.391 ms | 2.75 MiB |
leukfr | AD compiled logdensity | 3.900 ms | 3.756 ms | 14.705 ms | 2.75 MiB |
lsat | AD logdensity_and_gradient | 185.747 ms | 158.125 ms | 207.403 ms | 174.23 MiB |
lsat | AD compiled logdensity_and_gradient | 2.195 ms | 1.787 ms | 4.517 ms | 8.03 KiB |
lsat | logdensity | 152.901 ms | 144.593 ms | 163.197 ms | 168.65 MiB |
lsat | AD logdensity | 156.358 ms | 147.681 ms | 166.945 ms | 168.65 MiB |
lsat | AD compiled logdensity | 154.701 ms | 145.677 ms | 163.956 ms | 168.65 MiB |
seeds | AD logdensity_and_gradient | 465.860 μs | 443.553 μs | 10.277 ms | 210.06 KiB |
seeds | AD compiled logdensity_and_gradient | 25.607 μs | 24.906 μs | 51.235 μs | 304 bytes |
seeds | logdensity | 274.139 μs | 250.494 μs | 11.979 ms | 85.39 KiB |
seeds | AD logdensity | 276.113 μs | 251.316 μs | 11.793 ms | 85.39 KiB |
seeds | AD compiled logdensity | 275.677 μs | 250.615 μs | 11.987 ms | 85.39 KiB |
blockers | AD logdensity_and_gradient | 803.912 μs | 769.017 μs | 10.333 ms | 352.06 KiB |
blockers | AD compiled logdensity_and_gradient | 32.501 μs | 31.207 μs | 67.525 μs | 480 bytes |
blockers | logdensity | 540.643 μs | 512.751 μs | 11.238 ms | 150.11 KiB |
blockers | AD logdensity | 541.224 μs | 511.769 μs | 14.029 ms | 150.11 KiB |
blockers | AD compiled logdensity | 541.084 μs | 510.548 μs | 11.308 ms | 150.11 KiB |
equiv | AD logdensity_and_gradient | 337.256 μs | 325.313 μs | 10.015 ms | 162.06 KiB |
equiv | AD compiled logdensity_and_gradient | 16.251 μs | 15.869 μs | 41.217 μs | 208 bytes |
equiv | logdensity | 212.264 μs | 202.386 μs | 10.587 ms | 78.14 KiB |
equiv | AD logdensity | 211.012 μs | 202.356 μs | 10.972 ms | 78.14 KiB |
equiv | AD compiled logdensity | 210.912 μs | 202.646 μs | 10.859 ms | 78.14 KiB |
rats | AD logdensity_and_gradient | 2.082 ms | 2.013 ms | 14.656 ms | 1.11 MiB |
rats | AD compiled logdensity_and_gradient | 103.342 μs | 99.024 μs | 163.864 μs | 608 bytes |
rats | logdensity | 1.469 ms | 1.426 ms | 11.697 ms | 697.08 KiB |
rats | AD logdensity | 1.468 ms | 1.426 ms | 15.999 ms | 697.08 KiB |
rats | AD compiled logdensity | 1.465 ms | 1.424 ms | 11.814 ms | 697.08 KiB |
mice | AD logdensity_and_gradient | 751.395 μs | 713.344 μs | 10.778 ms | 376.92 KiB |
mice | AD compiled logdensity_and_gradient | 41.327 μs | 40.426 μs | 67.745 μs | 256 bytes |
mice | logdensity | 227.682 μs | 218.114 μs | 10.123 ms | 129.52 KiB |
mice | AD logdensity | 228.458 μs | 218.896 μs | 10.588 ms | 129.52 KiB |
mice | AD compiled logdensity | 228.083 μs | 218.325 μs | 10.569 ms | 129.52 KiB |
leuk | AD logdensity_and_gradient | 4.747 ms | 4.562 ms | 17.405 ms | 2.92 MiB |
leuk | AD compiled logdensity_and_gradient | 204.018 μs | 201.854 μs | 280.090 μs | 240 bytes |
leuk | logdensity | 2.947 ms | 2.760 ms | 13.897 ms | 1.72 MiB |
leuk | AD logdensity | 2.950 ms | 2.784 ms | 13.572 ms | 1.72 MiB |
leuk | AD compiled logdensity | 2.949 ms | 2.772 ms | 13.284 ms | 1.72 MiB |
oxford | AD logdensity_and_gradient | 6.669 ms | 6.400 ms | 18.361 ms | 3.17 MiB |
oxford | AD compiled logdensity_and_gradient | 242.570 μs | 239.494 μs | 371.439 μs | 2.02 KiB |
oxford | logdensity | 5.242 ms | 4.823 ms | 17.679 ms | 2.03 MiB |
oxford | AD logdensity | 5.155 ms | 4.820 ms | 17.632 ms | 2.03 MiB |
oxford | AD compiled logdensity | 5.134 ms | 4.826 ms | 17.410 ms | 2.03 MiB |
epil | AD logdensity_and_gradient | 10.638 ms | 10.143 ms | 24.348 ms | 7.75 MiB |
epil | AD compiled logdensity_and_gradient | 354.117 μs | 341.574 μs | 636.572 μs | 4.41 KiB |
epil | logdensity | 8.027 ms | 7.824 ms | 19.025 ms | 6.14 MiB |
epil | AD logdensity | 7.989 ms | 7.790 ms | 18.108 ms | 6.14 MiB |
epil | AD compiled logdensity | 7.979 ms | 7.791 ms | 22.664 ms | 6.14 MiB |
dyes | AD logdensity_and_gradient | 271.444 μs | 261.526 μs | 9.889 ms | 141.08 KiB |
dyes | AD compiled logdensity_and_gradient | 20.799 μs | 20.007 μs | 58.759 μs | 400 bytes |
dyes | logdensity | 154.376 μs | 148.025 μs | 10.537 ms | 60.53 KiB |
dyes | AD logdensity | 154.392 μs | 148.165 μs | 10.489 ms | 60.53 KiB |
dyes | AD compiled logdensity | 154.387 μs | 148.135 μs | 11.174 ms | 60.53 KiB |
kidney | AD logdensity_and_gradient | 1.791 ms | 1.715 ms | 10.905 ms | 854.17 KiB |
kidney | AD compiled logdensity_and_gradient | 77.303 μs | 76.382 μs | 149.968 μs | 608 bytes |
kidney | logdensity | 923.003 μs | 895.933 μs | 10.784 ms | 422.08 KiB |
kidney | AD logdensity | 923.063 μs | 894.881 μs | 15.369 ms | 422.08 KiB |
kidney | AD compiled logdensity | 922.051 μs | 893.078 μs | 10.742 ms | 422.08 KiB |
Compared to before, the runs without ReverseDiff's compiled tape are marginally faster.
ReverseDiff's compiled tape allocates much less memory thanks to its various linear algebra optimisations, which might explain the performance difference. This runtime difference matters more for small models or models with many scalar operations, and is probably less critical for models with extensive deterministic computations such as GPs and DiffEqs.
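To put rough numbers on that difference, here is a small Python sketch that divides the median `AD logdensity_and_gradient` time by the median `AD compiled logdensity_and_gradient` time for three models from the table. Only the timings come from the table; the dictionary keys `ad` and `ad_compiled` and the helper `speedup` are made-up names for illustration, and the ratios say nothing about models outside this benchmark.

```python
# Median times for logdensity_and_gradient, copied from the benchmark table
# and converted to microseconds (the lsat row is reported in ms there).
MEDIAN_US = {
    "surgical_realistic": {"ad": 191.465, "ad_compiled": 9.678},
    "pumps": {"ad": 150.499, "ad_compiled": 6.845},
    "lsat": {"ad": 185_747.0, "ad_compiled": 2_195.0},
}


def speedup(times):
    """Ratio of the uncompiled median time to the compiled-tape median time."""
    return times["ad"] / times["ad_compiled"]


# Hypothetical summary structure: model name -> speedup factor.
SPEEDUPS = {name: speedup(times) for name, times in MEDIAN_US.items()}

for name, ratio in SPEEDUPS.items():
    print(f"{name}: compiled tape is roughly {ratio:.1f}x faster (median)")
```

For these three rows the compiled tape comes out roughly 20x or more faster on the median gradient time.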
Overall, getting rid of `bugs_eval` is a good improvement! Thanks @sunxd3
That's true.
Also, the models in Examples are quite simple, so node function execution was not the performance bottleneck. Still, there is some improvement.
Pull Request Test Coverage Report for Build 8706317666