NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

filtered_lrelu called with parameters that have no optimized CUDA kernel, using generic fallback #622

Open ulucsahin opened 8 months ago

ulucsahin commented 8 months ago

Describe the bug

The call `_plugin.filtered_lrelu(x, fu, fd, b, si, up, down, px0, px1, py0, py1, sx, sy, gain, slope, clamp, flip_filter, write_signs)` returns the return code -1, so the op falls back to the generic implementation. I want it to use the optimized kernels, not the generic fallback. What might be causing the -1 return code?
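For context, the op is invoked from Python via `torch_utils/ops/filtered_lrelu.py`. One way to surface the exact call that triggers the fallback is to promote the warning from the issue title into an error, so the traceback points at the offending call. A minimal sketch, assuming a CUDA device, the stylegan3 repo on `PYTHONPATH`, and typical StyleGAN3-style parameters (the 12-tap filters and padding are assumptions, substitute your model's values):

    import warnings
    import torch
    from torch_utils.ops import filtered_lrelu  # stylegan3 repo module

    # Promote the fallback warning into an error so the traceback shows
    # which call's parameters have no optimized kernel.
    warnings.filterwarnings(
        'error',
        message=r'filtered_lrelu called with parameters that have no optimized CUDA kernel.*')

    # Typical parameters: float32 input, 2x up/down, 12-tap separable filters.
    x = torch.randn(1, 4, 64, 64, device='cuda', dtype=torch.float32)
    fu = torch.ones(12, device='cuda') / 12
    fd = torch.ones(12, device='cuda') / 12
    y = filtered_lrelu.filtered_lrelu(x, fu=fu, fd=fd, up=2, down=2, padding=6)
    print(y.shape, y.dtype)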

I believe the code enters this block:

    if (!test_spec.exec)
    {
        // No kernel found - return empty tensors and indicate missing kernel with return code of -1.
        return std::make_tuple(torch::Tensor(), torch::Tensor(), -1);
    }

But I am not sure why this happens.
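My reading of that block is that -1 simply means no precompiled kernel specialization matched the requested combination of dtype, up/down factors, and filter sizes. One way to narrow down which parameter is responsible is to sweep the combinations and watch for the fallback warning; a sketch under the same assumptions as above (6 filter taps per up/down factor is an assumption, substitute the filters your model actually uses):

    import itertools
    import warnings
    import torch
    from torch_utils.ops import filtered_lrelu

    # Sweep dtype and up/down factors; report which combinations fall back
    # to the generic implementation.
    for dtype, up, down in itertools.product(
            [torch.float16, torch.float32], [1, 2, 4], [1, 2, 4]):
        x = torch.randn(1, 4, 32, 32, device='cuda', dtype=dtype)
        fu = torch.ones(6 * up, device='cuda') / (6 * up)
        fd = torch.ones(6 * down, device='cuda') / (6 * down)
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter('always')
            filtered_lrelu.filtered_lrelu(x, fu=fu, fd=fd, up=up, down=down, padding=8)
        fellback = any('generic fallback' in str(w.message) for w in caught)
        print(f'{dtype}, up={up}, down={down}: fallback={fellback}')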


ulucsahin commented 8 months ago

I disabled that code block in the C++ source and, as expected, got the error: `internal error - CUDA kernel not found`.
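While investigating, it may also help to confirm that the generic fallback at least matches the pure-PyTorch reference path; `impl='ref'` is the reference-implementation selector in `torch_utils/ops/filtered_lrelu.py` as far as I can tell, so this sketch assumes that interface:

    import torch
    from torch_utils.ops import filtered_lrelu

    x = torch.randn(1, 4, 64, 64, device='cuda')
    fu = torch.ones(12, device='cuda') / 12
    fd = torch.ones(12, device='cuda') / 12

    # 'cuda' may internally take the generic fallback; 'ref' is pure PyTorch.
    y_cuda = filtered_lrelu.filtered_lrelu(x, fu=fu, fd=fd, up=2, down=2, padding=6, impl='cuda')
    y_ref = filtered_lrelu.filtered_lrelu(x, fu=fu, fd=fd, up=2, down=2, padding=6, impl='ref')
    print('max abs diff:', (y_cuda - y_ref).abs().max().item())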