ChezJrk / Teg

A differentiable programming language with an integration primitive that soundly handles interactions among the derivative, integral, and discontinuities.

CUDA backend? #23

Open yyuting opened 2 years ago

yyuting commented 2 years ago

Hi, thanks for open sourcing! I wonder if there is an easy way to compile the program to CUDA kernels. CUDA doesn't seem to be one of the backends supported by evaluate(), yet the triangulation and perlin_noise applications are evaluated in CUDA. Thanks a lot!

martinjm97 commented 2 years ago

Hi @yyuting,

Thanks for taking a look at the repo! I believe the C code that we generate is CUDA C. You can call it with:

evaluate(expr, backend='C')

@saipraveenb25 should know more about this.

Best, Jesse

yyuting commented 2 years ago

Hi Jesse, thanks a lot for your answer! I'm following tests/rasterize.py, and the C backend is much slower than numpy on my machine (C: 90 s; numpy: 5 s). After looking at the code, I realized that the example creates a separate integral for every pixel and evaluates them sequentially, so it makes sense that the pipeline is not efficient for CUDA. Is it possible to evaluate one integral at a grid of pixel locations in parallel? If so, could you please point me to those examples? Thank you.

Best,
Yuting

saipraveenb25 commented 2 years ago

Hi @yyuting,

There are actually two C-based backends for eval(): C_PyBind and C.

C_PyBind generates C code for the expression, compiles it into a Python module the first time you eval the expression, and imports it back into Python using pybind. This process is fairly expensive the first time, but if you want to run the expression for a large number of inputs it's fast, because the code is compiled only once.

C also generates C code and compiles it into a binary (only once). It then uses the command line to execute the binary for different inputs, without recompiling. Note that this only works if you don't re-create the expression: build the expression once, with Var() variables for the parameters you want to vary, and use the bindings argument on eval() to assign values to those variables. See tests/plot.py for a more up-to-date rasterization example that uses this idea.
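The compile-once/bind-per-call pattern described above can be sketched in plain Python. This is only an illustration of the idea, not Teg's actual API: the names CompiledExpr, compile_expr, and the "area" expression are all hypothetical stand-ins for the compiled binary, the build step, and a Teg expression with a free Var() parameter.

```python
class CompiledExpr:
    """Hypothetical stand-in for a compiled binary: built once, run many times."""
    compile_count = 0  # class-level counter showing how often "compilation" happens

    def __init__(self, fn):
        CompiledExpr.compile_count += 1  # the expensive step, paid exactly once
        self.fn = fn

    def run(self, bindings):
        # Re-running only substitutes new parameter values; nothing is rebuilt.
        return self.fn(**bindings)


_cache = {}

def compile_expr(key, fn):
    # Reuse the compiled artifact if this expression was already built;
    # re-creating the expression (a new key) would force a recompile.
    if key not in _cache:
        _cache[key] = CompiledExpr(fn)
    return _cache[key]


# Build the "expression" once, with a free parameter t (analogous to a Teg Var()).
area = compile_expr("area", lambda t: min(max(t, 0.0), 1.0))

# Evaluate at many parameter values via bindings; the binary is never rebuilt.
results = [area.run({"t": i / 10} ) for i in range(11)]
print(CompiledExpr.compile_count)  # 1
```

The same structure explains why re-creating the expression per pixel (as in tests/rasterize.py) is slow: each new expression pays the compile cost again, whereas binding new values to an existing expression does not.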

We don't exactly have a CUDA backend (i.e., eval() does not use CUDA). However, you can generate a CUDA C header file that can be called from your own CUDA kernel using python3 -m teg --compile -t CUDA_C <arguments>. The teg_applications repository has examples that generate and use CUDA code (specifically the graphics/triangulation and graphics/perlin_noise examples). This makefile contains the shell commands that auto-generate these headers as part of the build process.