TorchScript implementation of Push/Pull + spline coefficients

I have reimplemented all the low-level push/pull utilities in TorchScript. TorchScript is a strongly typed subset of Python + PyTorch that gets just-in-time compiled. Its main advantage is the ability to fuse sequences of voxel-wise operations into a single CUDA kernel.
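As a rough illustration of what "voxel-wise operations in TorchScript" look like, here is a minimal sketch of a scripted function that computes first-order (linear) interpolation weights. The function name `linear_weights` is hypothetical and this is not the code from this PR; whether the operations actually get fused depends on the PyTorch version and device.

```python
import torch


@torch.jit.script
def linear_weights(x: torch.Tensor):
    """Left-neighbour index and the two linear interpolation weights.

    Hypothetical helper, for illustration only. Each line is an
    element-wise operation, which is the kind of sequence the
    TorchScript JIT may fuse into a single kernel.
    """
    x0 = x.floor()      # coordinate of the left neighbour
    w1 = x - x0         # weight of the right neighbour
    w0 = 1. - w1        # weight of the left neighbour
    return x0.long(), w0, w1


coords = torch.tensor([0.25, 1.5, 2.0])
index, w_left, w_right = linear_weights(coords)
```

The type annotation on `x` is what makes the function acceptable to the TorchScript compiler; the return type is inferred.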

It means that we now have a version of nitorch that works without needing to compile any C++/CUDA code. It is, however, considerably slower than the C++/CUDA version. Here is what I get when pulling a [192, 192, 192] image with 1st order splines:

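| C++ (CPU) | TorchScript (CPU) | CUDA (GPU) | TorchScript (GPU) |
|-----------|-------------------|------------|-------------------|
| 0.1 s     | 1 s               | 0.7 ms     | 5 ms              |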

To install nitorch without compiling the C++/CUDA code, use:

    NI_COMPILED_BACKEND="TS" python setup.py install|develop

By default, NI_COMPILED_BACKEND="C" and the C++/CUDA extensions are compiled. It is also possible to set NI_COMPILED_BACKEND="MONAI" to try MONAI's version of push/pull, but last time I checked, it was not available in the pip version of MONAI.

When nitorch is used, it first tries to load the compiled C++/CUDA components, then MONAI, then the TorchScript implementation. This means that if you installed in develop mode and the compiled code is still lying around, it will be used, even if you later ran NI_COMPILED_BACKEND="TS" python setup.py develop.
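The fallback order described above can be sketched as a generic "try each backend in turn" loader. This is only an illustration of the pattern, not nitorch's actual loading code: the function `load_backend`, its `forced` parameter, and the module names in the example are all hypothetical, and standard-library modules stand in for the real backends.

```python
import importlib


def load_backend(backends, forced=None):
    """Return (name, module) for the first importable backend.

    `backends` maps a backend name to a module path, in priority order.
    If `forced` is given, only that backend is tried.
    Hypothetical helper, for illustration only.
    """
    order = [forced] if forced else list(backends)
    for name in order:
        try:
            return name, importlib.import_module(backends[name])
        except ImportError:
            continue  # this backend is unavailable, try the next one
    raise ImportError("no usable backend among: " + ", ".join(order))


# Stand-in module names: the first one does not exist, so the loader
# falls through to the second one.
demo_backends = {"C": "_missing_compiled_extension", "TS": "json"}
chosen_name, chosen_module = load_backend(demo_backends)
```

With this pattern, a leftover compiled module is picked up whenever it is importable, which is why an explicit override is needed to select a lower-priority backend.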

To force nitorch to use the TorchScript code, set the environment variable NI_COMPILED_BACKEND="TS" before importing nitorch.
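From within Python, this could look like the sketch below. The only thing taken from the description above is the variable name and value; the `try`/`except` guard is an addition so the snippet also runs where nitorch is not installed.

```python
import os

# Must be set before the first `import nitorch` in this process.
os.environ["NI_COMPILED_BACKEND"] = "TS"

try:
    import nitorch
except ImportError:
    nitorch = None  # nitorch is not installed in this environment
```

Setting the variable in the shell (for example by prefixing the `python` command with it) has the same effect without touching the script.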

I have also ported bsplinc from SPM. It is a prefiltering step that returns interpolating spline coefficients, and it is needed to perform higher-order (> 1) resampling. You can use either ni.spatial.spline_coeff or ni.spatial.spline_coeff_nd. It is only implemented for the boundary conditions dft and dct1.
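To show why a prefilter is needed, here is a minimal NumPy sketch for cubic splines with a periodic boundary (the case called dft above). It is not the algorithm ported from SPM and does not use nitorch; it inverts the B-spline sampling kernel directly in the Fourier domain, and the function names are hypothetical.

```python
import numpy as np


def cubic_spline_coeff_periodic(samples):
    """Interpolating cubic B-spline coefficients along the last axis.

    Assumes a periodic boundary and at least 2 samples along that axis.
    Hypothetical helper, for illustration only.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[-1]
    # Cubic B-spline evaluated at the integers: B(0) = 2/3, B(+-1) = 1/6.
    kernel = np.zeros(n)
    kernel[0] = 2 / 3
    kernel[1] += 1 / 6
    kernel[-1] += 1 / 6
    # Sampling the spline is a circular convolution of the coefficients
    # with this kernel, so it is undone by a division in Fourier space.
    spectrum = np.fft.fft(samples, axis=-1) / np.fft.fft(kernel)
    return np.fft.ifft(spectrum, axis=-1).real


def evaluate_at_integers(coeff):
    """Evaluate the cubic spline defined by `coeff` at the sample points."""
    left = np.roll(coeff, 1, axis=-1)
    right = np.roll(coeff, -1, axis=-1)
    return (left + 4 * coeff + right) / 6


signal = np.random.default_rng(0).normal(size=(4, 16))
coeff = cubic_spline_coeff_periodic(signal)
recovered = evaluate_at_integers(coeff)
```

Using the raw samples as coefficients would smooth the signal, whereas the prefiltered coefficients reproduce the samples exactly at the grid points. For first-order splines the kernel is a single 1, so no prefilter is required.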

Filtering a [192, 192, 192] volume takes about 170 ms on the CPU and 20 ms on the GPU.

@brudfors

balbasty/nitorch#56