NVIDIA / numbast

Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
Apache License 2.0

Generate shim functions at lowering time #44

Closed isVoid closed 4 months ago

isVoid commented 4 months ago

This PR moves shim function generation to lowering time, which limits the generated shim functions to those actually used in kernels. This is expected to reduce I/O and RTC time.
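To illustrate the idea, here is a minimal pure-Python sketch contrasting eager shim generation (one shim per declaration) with generation deferred to the point where a call is lowered. It is not Numbast's actual implementation: `make_shim`, `LazyShims`, `lower_call`, and the shim signature in `SHIM_TEMPLATE` are hypothetical names made up for this example, and no Numba APIs are used.

```python
# Hypothetical shim template; the real shim signature in Numbast may differ.
SHIM_TEMPLATE = (
    'extern "C" __device__ int {shim}(int *ret, int a) '
    "{{ *ret = {func}(a); return 0; }}"
)


def make_shim(func_name):
    """Render the (assumed) C++ shim source for one declared function."""
    return SHIM_TEMPLATE.format(shim=f"_shim_{func_name}", func=func_name)


class LazyShims:
    """Collects shim source only for functions that are actually lowered."""

    def __init__(self):
        self._sources = {}

    def lower_call(self, func_name):
        # Generate the shim the first time a call to func_name is lowered;
        # later calls to the same function reuse the cached source.
        if func_name not in self._sources:
            self._sources[func_name] = make_shim(func_name)
        return f"_shim_{func_name}"

    def source(self):
        """Concatenated shim source that would be handed to the compiler."""
        return "\n".join(self._sources.values())


# 100 declarations, but the simulated kernel only calls two of them.
declared = [f"func{i}" for i in range(100)]

# Eager: every declaration gets a shim, whether or not a kernel uses it.
eager_source = "\n".join(make_shim(name) for name in declared)

# Lazy: shims appear only as calls are lowered.
lazy = LazyShims()
for used in ("func3", "func7", "func3"):
    lazy.lower_call(used)
lazy_source = lazy.source()

print(len(eager_source.splitlines()), "shims generated eagerly")
print(len(lazy_source.splitlines()), "shims generated at lowering time")
```

In this sketch the amount of source that has to be written out and compiled scales with the number of functions a kernel uses rather than the number of declarations, which is where the I/O and RTC savings are expected to come from.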

This PR also adds a simulated scenario that generates N function declarations to benchmark the optimization in Numbast. The benchmark shows that with ~100 function declarations, this PR reduces the best-case kernel launch time by about 40%.

Closes #15

Benchmark data: bench-a is this branch, bench-m is main.

------------------------------------------------------------------------------- benchmark 'test_rtc[1000]': 2 tests --------------------------------------------------------------------------------
Name (time in ms)                      Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[1000] (0001_bench-a)     389.9006 (1.0)      401.2640 (1.0)      396.4518 (1.0)      5.5378 (1.0)      399.8293 (1.0)      9.8640 (1.98)          1;0  2.5224 (1.0)           5           1
test_rtc[1000] (0002_bench-m)     863.5706 (2.21)     879.0632 (2.19)     867.9739 (2.19)     6.2954 (1.14)     866.1069 (2.17)     4.9712 (1.0)           1;1  1.1521 (0.46)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------- benchmark 'test_rtc[100]': 2 tests --------------------------------------------------------------------------------
Name (time in ms)                     Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[100] (0001_bench-a)      67.4049 (1.0)       93.5881 (1.0)       73.1375 (1.0)      7.8246 (1.96)      68.5189 (1.0)      9.1765 (3.11)          1;1  13.6729 (1.0)          12           1
test_rtc[100] (0002_bench-m)     114.4357 (1.70)     123.5623 (1.32)     116.4549 (1.59)     3.9909 (1.0)      114.5094 (1.67)     2.9486 (1.0)           1;1   8.5870 (0.63)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------ benchmark 'test_rtc[10]': 2 tests ------------------------------------------------------------------------------
Name (time in ms)                   Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[10] (0001_bench-a)     30.6027 (1.0)      45.7817 (1.27)     32.1638 (1.0)      3.3724 (9.30)     30.7253 (1.0)      0.5989 (1.40)          4;5  31.0909 (1.0)          25           1
test_rtc[10] (0002_bench-m)     34.7538 (1.14)     36.1526 (1.0)      35.0487 (1.09)     0.3625 (1.0)      34.8931 (1.14)     0.4279 (1.0)           2;1  28.5317 (0.92)         17           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------ benchmark 'test_rtc[1]': 2 tests ------------------------------------------------------------------------------
Name (time in ms)                  Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rtc[1] (0001_bench-a)     26.9139 (1.0)      27.8245 (1.0)      27.4018 (1.01)     0.3614 (1.0)      27.5244 (1.01)     0.6815 (1.23)          5;0  36.4939 (0.99)         10           1
test_rtc[1] (0002_bench-m)     26.9212 (1.00)     28.0164 (1.01)     27.2604 (1.0)      0.3704 (1.02)     27.1367 (1.0)      0.5529 (1.0)           2;0  36.6833 (1.0)          10           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
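
As a quick check on the numbers above, the relative reduction in best-case (Min) time can be computed directly from the tables. The snippet below is only an arithmetic sketch: the values are copied from the Min column, and `reduction_pct` is a helper defined here for illustration.

```python
# Min column (ms) copied from the benchmark tables: (bench-a, bench-m)
min_ms = {
    1000: (389.9006, 863.5706),
    100: (67.4049, 114.4357),
    10: (30.6027, 34.7538),
    1: (26.9139, 26.9212),
}


def reduction_pct(branch_ms, main_ms):
    """Percentage by which the branch time is lower than the main time."""
    return (1.0 - branch_ms / main_ms) * 100.0


reductions = {n: reduction_pct(a, m) for n, (a, m) in min_ms.items()}

for n, pct in reductions.items():
    print(f"test_rtc[{n}]: {pct:.1f}% lower minimum time on this branch")
```

For N = 100 this gives roughly 41%, consistent with the ~40% quoted in the description. The gap widens at N = 1000 and is negligible at N = 1.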