Closed enricozb closed 4 months ago
Perf run for 6fc6cd9:
compiled
========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           3.47s       5.40s
                cuda        0.23s       0.24s
--------------------------------------------------------------
sum_rec         c           1.46s       1.44s
                cuda        0.15s       0.13s
--------------------------------------------------------------
sum_tree        c           0.13s       0.12s
                cuda        0.10s       0.10s
--------------------------------------------------------------
tuples          c           3.99s       3.32s
                cuda        timeout     timeout
--------------------------------------------------------------
interpreted
===========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           3.54s       3.54s
                cuda        0.25s       0.24s
                rust        timeout     timeout
--------------------------------------------------------------
sum_rec         c           2.50s       3.54s
                cuda        0.15s       0.14s
                rust        13.96s      13.51s
--------------------------------------------------------------
sum_tree        c           0.19s       0.43s
                cuda        0.09s       0.09s
                rust        0.88s       0.88s
--------------------------------------------------------------
tuples          c           5.41s       3.63s
                cuda        timeout     timeout
                rust        3.79s       3.79s
--------------------------------------------------------------
Perf run for 05f1cc7:
compiled
========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           3.70s       4.21s
                cuda        0.24s       0.24s
--------------------------------------------------------------
sum_rec         c           1.38s       1.44s
                cuda        0.15s       0.15s
--------------------------------------------------------------
sum_tree        c           0.12s       0.12s
                cuda        0.09s       0.09s
--------------------------------------------------------------
tuples          c           2.88s       4.01s
                cuda        timeout     timeout
--------------------------------------------------------------
interpreted
===========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           4.08s       5.33s
                cuda        0.24s       0.23s
                rust        timeout     timeout
--------------------------------------------------------------
sum_rec         c           1.73s       1.72s
                cuda        0.14s       0.13s
                rust        13.51s      13.62s
--------------------------------------------------------------
sum_tree        c           0.31s       0.20s
                cuda        0.09s       0.09s
                rust        0.88s       0.88s
--------------------------------------------------------------
tuples          c           3.53s       2.09s
                cuda        timeout     timeout
                rust        3.79s       3.81s
--------------------------------------------------------------
Perf run for 56a1dcb:
compiled
========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           3.24s       3.70s
                cuda        0.24s       0.24s
--------------------------------------------------------------
sum_rec         c           1.42s       1.38s
                cuda        0.14s       0.14s
--------------------------------------------------------------
sum_tree        c           0.11s       0.12s
                cuda        0.10s       0.10s
--------------------------------------------------------------
tuples          c           2.95s       2.90s
                cuda        timeout     timeout
--------------------------------------------------------------
interpreted
===========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           5.74s       3.47s
                cuda        0.24s       0.24s
                rust        timeout     timeout
--------------------------------------------------------------
sum_rec         c           1.68s       1.76s
                cuda        0.14s       0.13s
                rust        13.34s      13.57s
--------------------------------------------------------------
sum_tree        c           0.36s       0.34s
                cuda        0.09s       0.09s
                rust        0.87s       0.88s
--------------------------------------------------------------
tuples          c           4.99s       5.37s
                cuda        timeout     timeout
                rust        3.79s       3.80s
--------------------------------------------------------------
Perf run for 0fc5635:
compiled
========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           5.53s       4.28s
                cuda        0.24s       0.23s
--------------------------------------------------------------
sum_rec         c           1.42s       1.42s
                cuda        0.14s       0.14s
--------------------------------------------------------------
sum_tree        c           0.12s       0.13s
                cuda        0.11s       0.10s
--------------------------------------------------------------
tuples          c           3.72s       4.16s
                cuda        timeout     timeout
--------------------------------------------------------------
interpreted
===========
file            runtime     main        (local)
==============================================================
sort_bitonic    c           6.48s       4.42s
                cuda        0.24s       0.24s
                rust        timeout     timeout
--------------------------------------------------------------
sum_rec         c           1.83s       2.03s
                cuda        0.14s       0.13s
                rust        13.69s      14.10s
--------------------------------------------------------------
sum_tree        c           0.25s       0.17s
                cuda        0.08s       0.08s
                rust        0.83s       0.84s
--------------------------------------------------------------
tuples          c           2.52s       2.51s
                cuda        timeout     timeout
                rust        3.76s       3.82s
--------------------------------------------------------------
Overview
Adds `DL_OPEN`, `DL_CALL`, and `DL_CLOSE` IO functions.
C Runtime
Example usage looks like this: A user first defines some C functions they want to invoke through HVM at runtime:
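A minimal sketch of such a file, named `my-funcs.c` here to match the build command in this description. The three typedefs are stand-in assumptions so the snippet compiles on its own; a real file would take `Port`, `Net`, and `Book` from `hvm.h` instead. The body of `my_func` (returning its argument unchanged) is purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: stand-in declarations so this sketch is self-contained.
 * In a real build these types come from hvm.h, and their actual
 * definitions may differ from what is shown here. */
typedef uint32_t Port;
typedef struct Net Net;
typedef struct Book Book;

/* Hypothetical FFI function with the required shape
 * Port (my_func)(Net*, Book*, Port).
 * It ignores the runtime handles and returns its argument unchanged. */
Port my_func(Net* net, Book* book, Port arg) {
  (void)net;  /* unused in this sketch */
  (void)book; /* unused in this sketch */
  return arg;
}
```

Because the function only echoes its argument, it can be used to check that loading and calling work before any real logic is added.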
Functions must have the signature `Port (my_func)(Net*, Book*, Port)`. This file must be compiled as a shared library (`.so`), for example with `gcc -shared my-funcs.c -o my-funcs.so`. The file can then be loaded and its symbols accessed. In Bend this looks like:
C Compiled Mode
When compiling a generated HVM C file, you must use the `-rdynamic` flag so that the shared library can access symbols from the main binary. For example:
CUDA Runtime
The FFI is a little different here; the above C file would look like this instead:
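A minimal sketch of the CUDA-runtime variant, written as plain C so it stays self-contained. The typedefs are stand-in assumptions; a real file would take `Port` and `GNet` from `hvm.cuh`. The only difference this sketch shows is the two-argument shape, with a single `GNet*` in place of `Net*` and `Book*`. The function body is again illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: stand-in declarations so this sketch is self-contained.
 * In a real build these types come from hvm.cuh, and their actual
 * definitions may differ from what is shown here. */
typedef uint32_t Port;
typedef struct GNet GNet;

/* Hypothetical FFI function with the required shape
 * Port (my_func)(GNet*, Port).
 * It ignores the global net handle and returns its argument unchanged. */
Port my_func(GNet* gnet, Port arg) {
  (void)gnet; /* unused in this sketch */
  return arg;
}
```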
And functions must have the signature `Port (my_func)(GNet*, Port)`.
CUDA Compiled Mode
When compiling a generated HVM C file, you must pass the `-rdynamic` flag to the host compiler so that the shared library can access symbols from the main binary. For example:
HVM FFI API
Not everything is exposed to users at the moment. We currently expose `readback_str`, `readback_tup`, `readback_bytes`, and `inject_bytes`.
Users of the C runtime should see `hvm.h`; users of the CUDA runtime should see `hvm.cuh`.