OpenGVLab / InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
https://arxiv.org/abs/2211.05778
MIT License

[DCNv4 ERROR] CUDA op error when using internimage-L and internimage-LX with DCNv4, while internimage-B works well with DCNv4 #280

Open dou3516 opened 7 months ago

dou3516 commented 7 months ago

A CUDA op error occurs when using internimage-L and internimage-LX with DCNv4, whereas internimage-B works well with DCNv4. What is wrong?

Environment:

- DCNv4: built from https://github.com/OpenGVLab/DCNv4/tree/main/DCNv4_op/make.sh
- DCNv3: built from https://github.com/OpenGVLab/InternImage/tree/master/segmentation/ops_dcnv3/make.sh

internimage-L config:

    backbone=dict(
        _delete_=True,
        type='InternImage',
        core_op='DCNv3',
        channels=160,
        depths=[5, 5, 22, 5],
        groups=[10, 20, 40, 80],
        mlp_ratio=4.,
        drop_path_rate=0.5, 
        norm_layer='LN',
        layer_scale=1.0,
        offset_scale=2.0,
        post_norm=True,
        with_cp=False,
        out_indices=(0, 1, 2, 3),
        dcn_output_bias=True,  # dcnv4
        mlp_fc2_bias=True,  # dcnv4
        dw_kernel_size=3,  # dcnv4
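        use_dcn_v4_op=use_dcn_v4_op,  # dcnv4; variable defined elsewhere in the config file (not shown)
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),  # `pretrained` is also defined elsewhere (not shown)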
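The error log below reports `blockDim=(16, 80, 1)`. The sketch that follows relates that shape to the config above, assuming (a hypothesis, not confirmed from the DCNv4 source) that stage `i` has `channels * 2**i` channels and that one DCNv4 block uses (channels per group) × (groups) threads. The helper name `stage_block_shapes` is hypothetical.

```python
# Values copied from the internimage-L config above.
CHANNELS = 160
GROUPS = [10, 20, 40, 80]
MAX_THREADS_PER_BLOCK = 1024  # CUDA per-block thread limit (compute capability >= 2.0)


def stage_block_shapes(channels, groups):
    """Hypothetical mapping: stage i has channels * 2**i channels, and one
    DCNv4 block is assumed to use (channels per group) x (groups) threads."""
    shapes = []
    for i, g in enumerate(groups):
        stage_channels = channels * 2**i
        group_channels = stage_channels // g
        shapes.append((i, group_channels, g, group_channels * g))
    return shapes


for stage, group_channels, g, threads in stage_block_shapes(CHANNELS, GROUPS):
    status = "ok" if threads <= MAX_THREADS_PER_BLOCK else "exceeds the limit"
    print(f"stage {stage}: blockDim=({group_channels}, {g}, 1) -> {threads} threads, {status}")
```

Under this assumption the thread count equals the stage's channel width, so only the last stage (16 × 80 = 1280) would exceed 1024, which would match the logged block shape. Verify the real mapping in the DCNv4 CUDA source before relying on it.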

error log:

    error in dcnv4_im2col_cuda: invalid configuration argument
    launch arguments: gridDim=(1568, 1, 1), blockDim=(16, 80, 1), shm_size=5760
    ...
    ...
      File "/home/miniconda3/envs/dcnv4/lib/python3.9/site-packages/DCNv4-1.0.0.post2-py3.9-linux-x86_64.egg/DCNv4/functions/dcnv4_func.py", line 125, in backward
        ext.dcnv4_backward(*args)
    RuntimeError: falseINTERNAL ASSERT FAILED at "/home/dbc/AIcode/DL/SS/mmsegmentation-dev1.x/DCNv4_op/src/cuda/dcnv4_col2im_cuda.cuh":470, please report a bug to PyTorch. kernel launch error
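`invalid configuration argument` is the message CUDA returns when a kernel launch configuration is rejected. A minimal sketch that checks the logged launch arguments against the standard CUDA limits for compute capability 2.0 and later (hard-coded here as assumptions; query your device to confirm them) suggests the block shape may be the problem: 16 × 80 = 1280 threads per block, above the 1024 maximum. The function name is illustrative only.

```python
from math import prod

# Documented CUDA limits for compute capability >= 2.0 (assumed; check your GPU).
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)
MAX_GRID_DIM = (2**31 - 1, 65535, 65535)


def launch_config_problems(grid_dim, block_dim):
    """Return the reasons a (gridDim, blockDim) pair would violate the limits above."""
    problems = []
    threads = prod(block_dim)
    if threads > MAX_THREADS_PER_BLOCK:
        problems.append(f"{threads} threads per block > {MAX_THREADS_PER_BLOCK}")
    for axis, (d, limit) in zip("xyz", zip(block_dim, MAX_BLOCK_DIM)):
        if d > limit:
            problems.append(f"blockDim.{axis}={d} > {limit}")
    for axis, (d, limit) in zip("xyz", zip(grid_dim, MAX_GRID_DIM)):
        if d > limit:
            problems.append(f"gridDim.{axis}={d} > {limit}")
    return problems


# Launch arguments copied from the error log above.
print(launch_config_problems(grid_dim=(1568, 1, 1), block_dim=(16, 80, 1)))
```

Only the total thread count is flagged; every individual grid and block dimension is within its own limit.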
zhiqi-li commented 5 months ago

Hi, what is the shape of your input tensor? Since DCNv4 uses shared memory to store tensors, tensors with extremely large shapes will cause errors.
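The shared-memory explanation can be compared with the logged launch arguments. This sketch assumes `shm_size=5760` is in bytes and uses 48 KiB as the per-block dynamic shared memory available without an explicit opt-in (via `cudaFuncSetAttribute`) on NVIDIA GPUs; both are assumptions to confirm for your setup. The helper name is hypothetical.

```python
# Default per-block dynamic shared memory without an explicit opt-in (assumed 48 KiB).
DEFAULT_SHARED_MEM_PER_BLOCK = 48 * 1024


def fits_default_shared_memory(shm_bytes, limit=DEFAULT_SHARED_MEM_PER_BLOCK):
    """True if a launch's shared-memory request is within the default budget."""
    return shm_bytes <= limit


shm_size = 5760  # from the logged launch arguments, assumed to be bytes
print(shm_size, "bytes:", "fits" if fits_default_shared_memory(shm_size) else "too large")
print(f"{shm_size / DEFAULT_SHARED_MEM_PER_BLOCK:.1%} of the default budget")
```

For this particular launch the request is a small fraction of the default budget, so shared memory alone does not appear to be what CUDA rejected, although larger inputs or other kernels could still hit it.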

dou3516 commented 5 months ago

> Hi, what is the shape of your input tensor? Since DCNv4 uses shared memory to store tensors, tensors with extremely large shapes will cause errors.

B x C x H x W = 8 x 3 x 448 x 448
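The reported input shape can be related to `gridDim=(1568, 1, 1)` in the error log above. This sketch assumes InternImage's four stages run at strides 4, 8, 16 and 32, and that the kernel launches one block per output position (batch × h × w); the second point is a hypothesis, not confirmed from the DCNv4 source.

```python
# Input shape reported in the thread: B x C x H x W = 8 x 3 x 448 x 448.
B, H, W = 8, 448, 448
STAGE_STRIDES = [4, 8, 16, 32]  # assumed: stem stride 4, then 2x downsampling per stage


def queries_per_stage(batch, height, width, strides):
    """Number of spatial positions (batch * h * w) each stage's DCN op processes."""
    return [batch * (height // s) * (width // s) for s in strides]


print(queries_per_stage(B, H, W, STAGE_STRIDES))
```

The last entry, 8 × 14 × 14 = 1568, matches the logged grid size, which is consistent with (but does not prove) the failing launch being in the last stage, where the internimage-L config uses 80 groups.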

yan-hao-tian commented 1 week ago

Hello, I have the same problem. Have you solved it?