Assertion `(B*Q) % block_multiplier == 0' failed.

JC8549 commented 10 months ago

Dear author: Hello, I have encountered the following error. Could you tell me what might be causing it? I have been investigating for a long time but have not been able to resolve it. Error info:

python: /tmp/pip-install-g285lflt/dcnv4_0c9a40fbaa094f858763d45f8220c7e2/src/cuda/dcnv4_im2col_cuda.cuh:301: void _dcnv4_im2col_cuda(cudaStream_t, const scalar_t, const scalar_t, scalar_t, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, at::opmath_type, int, int, int, int) [with scalar_t = float; stride_type = ulonglong4; int d_stride = 8; cudaStream_t = CUstream_st; at::opmath_type = float]: Assertion `(B*Q) % block_multiplier == 0' failed.

zhiqi-li commented 10 months ago

Can you provide more details? Under what circumstances did you encounter this error?

HarryBarry123 commented 9 months ago

I ran into the same problem.

WuJuli commented 8 months ago

I think it is because you have to set the kernel size to 3

yanhaoerer commented 6 months ago

> I ran into the same problem.

Have you solved this problem? I also encountered the same problem.

JulioZhao97 commented 3 months ago

Same here

JulioZhao97 commented 3 months ago

> Can you provide more details? Under what circumstances did you encounter this error?

> I think it is because you have to set the kernel size to 3

Setting kernel_size to 1 gives this error.

My code is as follows:

# Imports added for completeness; DCNv4 and autopad come from the poster's
# environment and their import lines are not shown in the thread.
import torch
import torch.nn as nn


class DilatedBlock(nn.Module):
    """Standard bottleneck with dilated convolution."""

    def __init__(self, c, dilation, k, fuse="sum"):
        """Initializes the block with c channels, the given dilation, and kernel size k."""
        super().__init__()
        self.k = k

        # different conv type
        self.dcv = DCNv4(channels=c, kernel_size=k, stride=1, group=1, padding=autopad(k, None, dilation), dilation=dilation)

    def forward(self, x):
        """Applies DCNv4 to an NCHW tensor and returns a tensor of the same shape."""
        # !HACK: not implemented on CPU
        if x.device == torch.device("cpu"):
            return x
        B, C, H, W = x.shape
        # Flatten NCHW to (B, H*W, C), the layout passed to DCNv4 here.
        _input = x.view(B, C, H * W).permute(0, 2, 1).contiguous()
        dx = self.dcv(_input, shape=(H, W))
        # Restore the NCHW layout.
        dx = dx.permute(0, 2, 1).contiguous().view(B, C, H, W)
        return dx

The shape of x is torch.Size([1, 96, 200, 200]) and the shape of _input is torch.Size([1, 40000, 96]).
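For anyone trying to narrow this down, the arithmetic behind the assertion can be checked without a GPU. The sketch below assumes that B is the batch size and Q is the number of spatial positions (H*W here, since stride is 1), which matches the (B, H*W, C) input above but is not confirmed by the maintainers in this thread. The helper names and the list of candidate multipliers are hypothetical; the value of block_multiplier that the kernel actually picks is not shown anywhere in the thread.

```python
def dcnv4_query_count(batch, height, width):
    """Return B*Q, assuming Q = H*W spatial positions (an assumption, see above)."""
    return batch * height * width


def compatible_multipliers(batch, height, width,
                           candidates=(1, 2, 4, 8, 16, 32, 64, 128, 256)):
    """Return the candidate multipliers m for which (B*Q) % m == 0.

    The candidate list is hypothetical and only illustrates the check;
    it is not taken from the DCNv4 source.
    """
    total = dcnv4_query_count(batch, height, width)
    return [m for m in candidates if total % m == 0]


# Shapes reported above: x is [1, 96, 200, 200], so B = 1 and Q = 200 * 200.
total = dcnv4_query_count(1, 200, 200)
ok = compatible_multipliers(1, 200, 200)
print(total, ok)  # 40000 [1, 2, 4, 8, 16, 32, 64]
```

Since 40000 = 64 * 625, any multiplier of 128 or more would trip the assertion for this input, while 64 or less would not. Printing the real block_multiplier from the extension would be needed to confirm which case applies.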

jiangruocheng commented 2 months ago

We also ran into a similar problem. Image width: 5472, height: 3648. Demo: ~/mmdeploy/csrc/mmdeploy/backend_ops/torchscript/ops/modulated_deform_conv_v4/dcnv4_im2col_cuda.cuh:301: void _dcnv4_im2col_cuda(cudaStream_t, const scalar_t, const scalar_t, scalar_t, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, at::opmath_type, int, int, int, int) [with scalar_t = float; stride_type = ulonglong4; int d_stride = 8; cudaStream_t = CUstream_st; at::opmath_type = float]: Assertion `(B*Q) % block_multiplier == 0' failed. Aborted (core dumped)

OpenGVLab / DCNv4

Assertion `(B*Q) % block_multiplier == 0' failed. #5