(Adding the task dependencies for my own reminder.)
[x] Wait for the Halide 16.0 release.
[x] Refactor the Halide::BoundaryConditions calls to use the new APIs;
[x] Similarly, refactor Generator::* related code to use Halide 16.0 APIs;
[x] In algorithms/ladmm.py, ensure all NumPy matrices are Fortran-ordered by default; this avoids the repeated C-order-to-F-order conversion overhead in the (L-)ADMM iterations;
[x] Similarly, ensure the Halide-accelerated linear operators, e.g. A_mask.cpython.so, write to the output buffers in F-order rather than to orphan buffers that are immediately destroyed. This should fix the convergence-failure bugs whenever implem='Halide' is defined.
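A minimal NumPy sketch of the two conventions above (the array names are illustrative, not taken from ladmm.py): allocate state in Fortran order up front, and write results through `out=` so the caller's preallocated buffer is actually filled instead of being shadowed by a fresh allocation.

```python
import numpy as np

# Keep ADMM state column-major so repeated linear-algebra calls
# need no C-order -> F-order copies. (Illustrative shapes/names.)
rng = np.random.default_rng(0)
A = np.asfortranarray(rng.standard_normal((4, 3)))
x = np.ones((3, 1), order="F")
y = np.zeros((4, 1), order="F")  # preallocated output buffer

assert A.flags.f_contiguous and y.flags.f_contiguous

# Write into the existing buffer via `out=`. Rebinding instead
# (y = A @ x) would allocate a brand-new C-order array and leave the
# caller's buffer untouched -- the "orphan buffer" failure mode.
np.matmul(A, x, out=y)
```

The same rule applies to any extension module that fills an output buffer: the fix is to write through the buffer the caller handed in, not to return a newly allocated one.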
[ ] Migrate to C++20; this should cut the compile time in half thanks to the new C++ Concepts feature;
[ ] Simplify ladmm-iter-gen.cpp with the broadcast operator Halide::_.
[ ] Replace the Li2018 autoscheduler with Anderson2021: the latter utilizes the GPU cache and the shared memory in the SM far better.

References:
https://github.com/halide/Halide/pull/6856
https://github.com/halide/Halide/issues/7459