This PR moves shim function generation to lowering time, which limits the generated shim functions to
only those actually used in kernels. This is expected to reduce I/O and RTC time.
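As a rough illustration of the idea (not the actual Numbast implementation), the sketch below defers shim generation until a function is lowered. `LazyShimRegistry`, `declare`, `lower`, and `make_shim_for` are hypothetical names, and the shim body is a placeholder string rather than real generated source.

```python
# Hypothetical sketch: none of these names are Numbast's actual API.
from typing import Callable, Dict, List


class LazyShimRegistry:
    """Produces shim source only for functions that are actually lowered."""

    def __init__(self) -> None:
        self._generators: Dict[str, Callable[[], str]] = {}
        self._emitted: Dict[str, str] = {}

    def declare(self, name: str, make_shim: Callable[[], str]) -> None:
        # Declaring a binding is cheap: no shim source is generated yet.
        self._generators[name] = make_shim

    def lower(self, name: str) -> None:
        # Stand-in for the point where kernel lowering reaches a call to `name`.
        # The shim is generated once, on first use.
        if name not in self._emitted:
            self._emitted[name] = self._generators[name]()

    def shim_sources(self) -> List[str]:
        # Only shims for lowered functions would be handed to the runtime compiler.
        return list(self._emitted.values())


def make_shim_for(name: str) -> Callable[[], str]:
    # Placeholder generator; real shim source would be C++ text.
    return lambda: f"// shim for {name}"


registry = LazyShimRegistry()
for i in range(100):
    registry.declare(f"func_{i}", make_shim_for(f"func_{i}"))

# A kernel that calls only one of the 100 declared functions.
registry.lower("func_3")
registry.lower("func_3")  # repeated use does not regenerate the shim

print(len(registry.shim_sources()))  # 1
```

With 100 declarations and a single call site, only one shim reaches the compile step, which is the effect the benchmark below is meant to measure.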
This PR also adds a simulated scenario that generates N function declarations for Numbast, so the optimization can be benchmarked. The benchmark shows that with ~100 function declarations, this PR reduces the best-case kernel launch time by about 40% (Min of 67.4 ms on this branch vs. 114.4 ms on main).
Closes #15
Benchmark data
bench-a is this branch, bench-m is main.
------------------------------------------------------------------------ benchmark 'test_rtc[1000]': 2 tests ------------------------------------------------------------------------
Name (time in ms)              Min              Max              Mean             StdDev           Median           IQR              Outliers  OPS             Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[1000] (0001_bench-a)  389.9006 (1.0)   401.2640 (1.0)   396.4518 (1.0)   5.5378 (1.0)     399.8293 (1.0)   9.8640 (1.98)    1;0       2.5224 (1.0)    5       1
test_rtc[1000] (0002_bench-m)  863.5706 (2.21)  879.0632 (2.19)  867.9739 (2.19)  6.2954 (1.14)    866.1069 (2.17)  4.9712 (1.0)     1;1       1.1521 (0.46)   5       1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------ benchmark 'test_rtc[100]': 2 tests -------------------------------------------------------------------------
Name (time in ms)              Min              Max              Mean             StdDev           Median           IQR              Outliers  OPS             Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[100] (0001_bench-a)   67.4049 (1.0)    93.5881 (1.0)    73.1375 (1.0)    7.8246 (1.96)    68.5189 (1.0)    9.1765 (3.11)    1;1       13.6729 (1.0)   12      1
test_rtc[100] (0002_bench-m)   114.4357 (1.70)  123.5623 (1.32)  116.4549 (1.59)  3.9909 (1.0)     114.5094 (1.67)  2.9486 (1.0)     1;1       8.5870 (0.63)   5       1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------- benchmark 'test_rtc[10]': 2 tests -------------------------------------------------------------------------
Name (time in ms)              Min              Max              Mean             StdDev           Median           IQR              Outliers  OPS             Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[10] (0001_bench-a)    30.6027 (1.0)    45.7817 (1.27)   32.1638 (1.0)    3.3724 (9.30)    30.7253 (1.0)    0.5989 (1.40)    4;5       31.0909 (1.0)   25      1
test_rtc[10] (0002_bench-m)    34.7538 (1.14)   36.1526 (1.0)    35.0487 (1.09)   0.3625 (1.0)     34.8931 (1.14)   0.4279 (1.0)     2;1       28.5317 (0.92)  17      1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------- benchmark 'test_rtc[1]': 2 tests --------------------------------------------------------------------------
Name (time in ms)              Min              Max              Mean             StdDev           Median           IQR              Outliers  OPS             Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[1] (0001_bench-a)     26.9139 (1.0)    27.8245 (1.0)    27.4018 (1.01)   0.3614 (1.0)     27.5244 (1.01)   0.6815 (1.23)    5;0       36.4939 (0.99)  10      1
test_rtc[1] (0002_bench-m)     26.9212 (1.00)   28.0164 (1.01)   27.2604 (1.0)    0.3704 (1.02)    27.1367 (1.0)    0.5529 (1.0)     2;0       36.6833 (1.0)   10      1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
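For reference, the percentage reductions can be recomputed from the Min column of the tables above. This is only a small arithmetic sketch; the dictionary layout and the `reduction_pct` helper are illustrative and not part of the benchmark suite.

```python
# Min times in ms, copied from the benchmark tables: N -> (bench-a, bench-m).
min_ms = {
    1: (26.9139, 26.9212),
    10: (30.6027, 34.7538),
    100: (67.4049, 114.4357),
    1000: (389.9006, 863.5706),
}


def reduction_pct(branch_ms: float, main_ms: float) -> float:
    """Percentage by which the branch is faster than main."""
    return (1.0 - branch_ms / main_ms) * 100.0


for n, (branch_ms, main_ms) in min_ms.items():
    print(f"N={n}: {reduction_pct(branch_ms, main_ms):.1f}% lower best-case time")
```

The N=100 row gives roughly 41%, consistent with the ~40% figure quoted above, and the gap widens to roughly 55% at N=1000.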