This pull request makes it possible to build and run the causal-conv1d package on AMD GPUs.
The pull request includes:
Adjustments to the CUDA kernel code to make it hipify-friendly (i.e. to let the PyTorch hipification routine automatically generate the .hip source code when the package is built on ROCm). The original CUDA code was kept intact, except for minor changes that do not affect behavior (e.g. adding 'typename' where the HIP compiler fails to infer that a name is a type). All ROCm-specific adjustments are wrapped in #ifdef USE_ROCM.
Minor adjustments to the Python code, mostly in setup.py. Again, we aimed to make the code behave exactly the same on CUDA machines.
Minor adjustments to the readme file (ROCm 6.0 requires a quick patch for the build to work).
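For reviewers unfamiliar with the two kinds of kernel-code adjustments listed above, here is a minimal sketch of what they look like. It is not taken from the diff: `KernelTraits`, `kLanesPerGroup`, and `bytes_per_group` are hypothetical names, and the constant values are placeholders chosen only so that the two branches differ.

```cpp
#include <cstddef>

// Hypothetical traits struct, for illustration only (not the package's own).
template <int kNThreads_, typename input_t_>
struct KernelTraits {
    using input_t = input_t_;
    static constexpr int kNThreads = kNThreads_;
};

// ROCm-specific values live behind USE_ROCM, so the CUDA build is unchanged.
// The numbers below are illustrative placeholders, not values from the PR.
#ifdef USE_ROCM
constexpr int kLanesPerGroup = 64;
#else
constexpr int kLanesPerGroup = 32;
#endif

template <typename Ktraits>
constexpr std::size_t bytes_per_group() {
    // Explicit 'typename' tells the compiler that the dependent name
    // Ktraits::input_t is a type. It is valid on every compiler and does
    // not change behavior.
    using input_t = typename Ktraits::input_t;
    return sizeof(input_t) * kLanesPerGroup;
}
```

Without `USE_ROCM` defined, the `#else` branch is compiled, which is how the CUDA path stays identical to the original.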
The pull request does not include:
Machinery to release ROCm-specific package builds (we have not touched the GitHub workflow file yet).