comp-imaging / ProxImaL

A domain-specific language for image optimization.
MIT License

Upgrade to Halide 10 with CUDA autoscheduler #54

Closed: antonysigma closed this 3 years ago

antonysigma commented 3 years ago

Upgrade to Halide 10 with CUDA autoscheduler

Upgrade the Halide toolchain from 8.0 to 10.0, which in turn unlocks
the following features:

 - (completed) to emit GPU-accelerated image processing kernels;
 - (completed) to optimize and fuse CUDA kernels with the Li2018 autoscheduler;
 - (completed) to support either x86_64 or ARM environments
   (use case of Halide on ARM: Nvidia Jetson SoB);
 - (TBD) to derive and accelerate the analytic first-order partial derivative
   of any proximal function, i.e. autodiff;
 - (TBD) to derive and accelerate the output-weighted first-order partial
   derivative of any linear operation.
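
To make the autodiff item concrete, here is a minimal pure-Python sketch of what an analytic derivative of a proximal function looks like, using the L1 proximal operator (soft-thresholding) as the example. The names `prox_l1`, `prox_l1_grad` and `finite_difference` are hypothetical and do not come from the ProxImaL or Halide APIs; a Halide-based version would presumably be derived by the toolchain rather than written by hand.

```python
def prox_l1(v, theta):
    """Soft-thresholding: the proximal operator of theta * |x| at a scalar v."""
    if v > theta:
        return v - theta
    if v < -theta:
        return v + theta
    return 0.0


def prox_l1_grad(v, theta):
    """Analytic derivative of prox_l1 with respect to v (away from |v| == theta)."""
    return 1.0 if abs(v) > theta else 0.0


def finite_difference(f, v, eps=1e-6):
    """Central finite difference, used only to cross-check the analytic form."""
    return (f(v + eps) - f(v - eps)) / (2.0 * eps)


if __name__ == "__main__":
    theta = 0.5
    for v in (-2.0, -0.1, 0.3, 1.7):
        numeric = finite_difference(lambda t: prox_l1(t, theta), v)
        print(v, prox_l1_grad(v, theta), round(numeric, 6))
```

The finite-difference check is only there to confirm that the hand-written derivative agrees with the function it claims to differentiate.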

The only catch is that the Halide-provided FFT routine is not
compatible with the Li2018 CUDA autoscheduler. Developers are
encouraged to contribute cuFFT interfaces to the ProxImaL project.

Likewise, the affine transforms warp() and warpT() use
cyclic image boundary conditions; so far the Li2018 autoscheduler
toolchain is unable to optimize such operations in CUDA.
Developers who wish to use this feature can contact the
Halide team for help.
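
For readers unfamiliar with the term, a cyclic boundary condition means that samples shifted past one edge of the image re-enter from the opposite edge. The sketch below illustrates this in 1-D with plain Python, assuming an integer translation. `shift_cyclic` and `shift_cyclic_adjoint` are hypothetical stand-ins, not the actual warp() and warpT() implementations, which handle general affine transforms.

```python
def shift_cyclic(x, s):
    """Translate a 1-D signal by s samples; indices wrap around the ends."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def shift_cyclic_adjoint(y, s):
    """Adjoint (transpose) of shift_cyclic: translate by -s with wrap-around."""
    n = len(y)
    return [y[(i + s) % n] for i in range(n)]


def dot(a, b):
    """Plain inner product of two equally sized sequences."""
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [0.5, -1.0, 2.0, 0.0]
    print(shift_cyclic(x, 1))
    # Adjoint test: <A x, y> should equal <x, A^T y>.
    print(dot(shift_cyclic(x, 1), y), dot(x, shift_cyclic_adjoint(y, 1)))
```

The modulo indexing is the wrap-around access pattern in question; the adjoint test at the end is the usual way to check that a forward/transpose pair is consistent.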

Halide: support platforms beyond linux-x86

This change uses the Python extension module to generate the
correct filename of the Halide-optimized Python bindings.

For example, a Python 3.6 interpreter on 64-bit Linux will look for
the following extension:

proximal/halide/build/prox_L1.cpython-36m-x86_64-linux-gnu.so

when the line `import proximal.halide.build.prox_L1` is encountered.

Similarly, on the ARM architecture the interpreter will look for the following file

proximal/halide/build/prox_L1.cpython-36m-aarch64-linux-gnu.so

to import the corresponding Python module.
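
The platform-specific suffix in these filenames can be queried from the running interpreter. The following is a minimal sketch using the standard-library `sysconfig` and `importlib.machinery` modules. It is an assumption that the change relies on an equivalent mechanism, and the helper name `extension_filename` is hypothetical.

```python
import importlib.machinery
import sysconfig


def extension_filename(module_name):
    """Return the most specific extension-module filename for this interpreter."""
    # EXT_SUFFIX encodes the interpreter ABI tag and platform,
    # e.g. ".cpython-36m-x86_64-linux-gnu.so" for Python 3.6 on 64-bit Linux.
    # Fall back to the import system's preferred suffix if it is unset.
    suffix = (sysconfig.get_config_var("EXT_SUFFIX")
              or importlib.machinery.EXTENSION_SUFFIXES[0])
    return module_name + suffix


if __name__ == "__main__":
    print(extension_filename("prox_L1"))
    # Every suffix the import system will accept for extension modules.
    print(importlib.machinery.EXTENSION_SUFFIXES)
```

Computing the name this way avoids hard-coding the architecture or the Python version in the build scripts.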

---

TODO: Embed the hyperlink to the Halide toolchain for Windows

https://github.com/halide/Halide/tree/v10.0.0

in a new file:

proximal/halide/subprojects/halide-windows.wrap

antonysigma commented 3 years ago

Hi @SteveDiamond,

I found a few working code fragments left on my Nvidia Jetson Xavier device, so I was wondering whether you would like to merge them as well.

It should also resolve #31.

BTW, is it possible to "rebase and merge" all PRs from now on? It would help me a lot to keep a linear history of the master branch.

Regards, Antony

SteveDiamond commented 3 years ago

Thanks for the PR @antonysigma! Yeah, we can rebase and merge. Can you describe how we're testing this new commit? Ideally the whole codebase, including all the Halide and CUDA stuff, should be getting tested in CI.

antonysigma commented 3 years ago

Yes, CI can validate the Halide build steps that generate CUDA code by pulling the Nvidia Docker image. Here is one example: https://hub.docker.com/r/nvidia/cuda. It cannot execute CUDA code though, unless you rent a physical GPU on TravisCI out of your own pocket.

I can put more thought into it over the next 4 weeks; I am not in a hurry to have the PR merged.

By the way, are you aware of the following announcement for open-source projects? How many credits does the ProxImaL repo have left?

https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing

SteveDiamond commented 3 years ago

Thanks for the update about Travis; we'll have to get cvxpy off it as well. OK, great, let's take some time to consider all this. Testing is super important.

SteveDiamond commented 3 years ago

Yay, GitHub Actions is working now! Thanks @antonysigma! Should I merge this in?

antonysigma commented 3 years ago

> Yay, GitHub Actions is working now! Thanks @antonysigma! Should I merge this in?

Absolutely! I also took the opportunity to trigger GitHub Actions for all types of pull requests.