In the pure Halide/C++ codegen module, when the Halide-accelerated lin_op and prox_fn run on the GPU, mark the input data as "dirty" on the host so that it is transferred to the device before use. Explicitly copy the output data back to the host once the (L-)ADMM iterations finish. Verify the change with the total-variation denoising example.
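
A minimal sketch of the transfer pattern described above, written in plain C++ so it builds without the Halide runtime. `MockBuffer`, `gpu_stage`, and `run_iterations` are hypothetical names used only for illustration and are not part of the codegen module; the arithmetic inside `gpu_stage` is a placeholder, not a real lin_op or prox_fn. In Halide itself, the corresponding calls on `Halide::Runtime::Buffer` would be `set_host_dirty()` and `copy_to_host()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer that keeps a host copy and a device copy.
struct MockBuffer {
    std::vector<float> host;
    std::vector<float> device;
    bool host_dirty = false;    // host copy is newer than the device copy
    bool device_dirty = false;  // device copy is newer than the host copy
    int uploads = 0;            // host -> device transfers performed
    int downloads = 0;          // device -> host transfers performed

    explicit MockBuffer(std::size_t n) : host(n, 0.0f), device(n, 0.0f) {}

    // Flag the host data as modified so the next device use uploads it.
    void set_host_dirty() { host_dirty = true; }

    // Upload only when the host copy is actually newer.
    void sync_to_device() {
        if (host_dirty) {
            device = host;
            host_dirty = false;
            ++uploads;
        }
    }

    // Download only when the device copy is actually newer.
    void copy_to_host() {
        if (device_dirty) {
            host = device;
            device_dirty = false;
            ++downloads;
        }
    }
};

// Placeholder for a GPU-side stage: reads `in` on the device and writes
// `out` on the device, without touching either host copy.
void gpu_stage(MockBuffer& in, MockBuffer& out) {
    in.sync_to_device();
    for (std::size_t i = 0; i < in.device.size(); ++i) {
        out.device[i] = 0.5f * in.device[i];
    }
    out.device_dirty = true;
}

// Mark the input dirty once, keep intermediates on the device during the
// iterations, and copy the result back to the host once at the end.
void run_iterations(MockBuffer& x, MockBuffer& z, int iters) {
    x.set_host_dirty();
    for (int k = 0; k < iters; ++k) {
        gpu_stage(x, z);
        gpu_stage(z, x);
    }
    x.copy_to_host();
}
```

The point of the design is that the input crosses the host/device boundary once in each direction, regardless of the iteration count, instead of being copied around every stage.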