I have reimplemented all the low-level push/pull utilities in TorchScript. TorchScript is a strongly typed subset of Python + PyTorch that gets just-in-time compiled. Its main advantage is the ability to fuse sequences of voxel-wise operations into a single CUDA kernel.
It means that we now have a version of nitorch that works without needing to compile any C++/CUDA code. It is, however, considerably slower than the C++/CUDA version. Here's what I get when pulling a [192, 192, 192] image with 1st order splines:
| C | TorchScript CPU | CUDA | TorchScript GPU |
|---|---|---|---|
| 0.1 s | 1 s | 0.7 ms | 5 ms |
To install nitorch without compiling the C++/CUDA code, use:
NI_COMPILED_BACKEND="TS" python setup.py install|develop
By default, NI_COMPILED_BACKEND="C" and the C++/CUDA extensions are compiled.
It is also possible to use NI_COMPILED_BACKEND="MONAI" to try MONAI's version of push/pull, but last time I checked, it was not available in the pip version of MONAI.
When nitorch is imported, it first tries to load the C components, then MONAI, then the TorchScript implementation. This means that if you have used the develop mode and still have the compiled code lying around, the compiled code will be used, even if you ran NI_COMPILED_BACKEND="TS" python setup.py develop afterwards.
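The fallback order described above can be sketched as follows. This is only an illustration of the try-in-order pattern, not nitorch's actual loader: `load_first_available` and the three `load_*` functions are hypothetical stand-ins, and here the C and MONAI loaders are made to fail on purpose.

```python
def load_first_available(loaders):
    """Return (name, backend) for the first loader that does not raise ImportError."""
    for name, loader in loaders:
        try:
            return name, loader()
        except ImportError:
            continue  # this backend is not installed; try the next one
    raise ImportError("no push/pull backend could be loaded")


# Hypothetical loaders standing in for the three backends.
def load_c():
    raise ImportError("compiled C++/CUDA extension not found")


def load_monai():
    raise ImportError("MONAI push/pull not found")


def load_torchscript():
    return "torchscript push/pull"


name, backend = load_first_available(
    [("C", load_c), ("MONAI", load_monai), ("TS", load_torchscript)]
)
print(name)
```

With this pattern, a leftover compiled extension always wins because it is tried first, which is why a stale develop build keeps being picked up.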
To force nitorch to use the TorchScript code, a solution is to set the environment variable NI_COMPILED_BACKEND="TS" before importing nitorch.
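A minimal sketch of doing this from Python rather than from the shell. The only assumption is that the variable is read when nitorch is first imported, so it has to be set earlier in the same process; the `try`/`except` is only there so that the snippet also runs on a machine without nitorch.

```python
import os

# Must happen before the first `import nitorch` in this process.
os.environ["NI_COMPILED_BACKEND"] = "TS"

try:
    import nitorch  # noqa: F401
    nitorch_available = True
except ImportError:
    nitorch_available = False
```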
I have also ported bsplinc from SPM. It is a prefilter that returns interpolating spline coefficients, which are needed to perform higher-order (> 1) resampling. You can use either ni.spatial.spline_coeff or ni.spatial.spline_coeff_nd. It is only implemented for the boundary conditions dft and dct1.
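To illustrate what "interpolating spline coefficients" means, here is a small pure-Python sketch for a 1D cubic spline, assuming that dft denotes periodic (circular) boundaries. It is not the bsplinc algorithm and not nitorch code: it simply inverts the sampled cubic B-spline kernel [1, 4, 1] / 6 in the Fourier domain, and `cubic_spline_coeff_circular` and `evaluate_at_samples` are names made up for this example.

```python
import cmath
import math


def cubic_spline_coeff_circular(signal):
    """Interpolating cubic B-spline coefficients of a periodic 1D signal."""
    n = len(signal)
    # Forward DFT of the samples (naive O(n^2) version, fine for a demo).
    spectrum = [
        sum(signal[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
    # Frequency response of the sampled kernel [1, 4, 1] / 6 (always >= 1/3).
    response = [(4 + 2 * math.cos(2 * math.pi * k / n)) / 6 for k in range(n)]
    # Dividing by the response inverts the circular convolution.
    filtered = [s / r for s, r in zip(spectrum, response)]
    # Inverse DFT; the result is real because the input is real.
    return [
        (sum(filtered[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n).real
        for j in range(n)
    ]


def evaluate_at_samples(coeff):
    """Evaluate the cubic spline at the sample locations (circular boundary)."""
    n = len(coeff)
    return [(coeff[i - 1] + 4 * coeff[i] + coeff[(i + 1) % n]) / 6 for i in range(n)]


signal = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0]
coeff = cubic_spline_coeff_circular(signal)
recovered = evaluate_at_samples(coeff)
```

Evaluating the spline built from `coeff` at the sample locations gives back `signal` up to rounding error, which is the property that makes the coefficients interpolating. Feeding the raw samples in as coefficients would instead smooth the data.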
Filtering a [192, 192, 192] volume takes about 170 ms on the CPU and 20 ms on the GPU.
@brudfors