IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0

on Linux: NameError: name '_C' is not defined #469

Open artificialzjy opened 3 months ago

artificialzjy commented 3 months ago

I followed the instructions in the README, but when I try the first demo I get the error: Failed to load custom C++ ops. Running on CPU mode Only! followed by NameError: name '_C' is not defined.

I installed torch, torchaudio, and torchvision with pip.
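
For what it's worth, the NameError comes from the import near the top of GroundingDINO's ms_deform_attn.py (quoted further down in this thread). A one-liner to confirm whether the compiled extension is importable in your environment:

    python -c "from groundingdino import _C"

If that raises an ImportError, the C++/CUDA extension was never built, or was built against a different environment.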

advait-patel-17 commented 3 months ago

I'm getting a similar error. I believe it has to do with the CUDA version, but I'm not sure how to solve it. Running CUDA 12.2.
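
A quick way to test that theory is to compare the CUDA version your PyTorch wheel was compiled against with the toolkit installed on the machine (diagnostic only; output varies by setup):

    # CUDA version torch was built with (None for CPU-only wheels),
    # plus whether a GPU is visible at runtime:
    python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
    # CUDA toolkit version used to compile the extension:
    nvcc --version

If the two disagree on the major version, the extension build is a likely failure point.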

oymzysmwe224 commented 3 months ago

I upgraded torch to 2.2.1, which resolved the issue. I hope this solution is beneficial to you all.
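
In case it saves someone a search, the upgrade is a one-liner; the exact wheel/index depends on your CUDA setup, so treat this as illustrative:

    pip install --upgrade torch==2.2.1 torchvision torchaudio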

Saadalh commented 1 month ago

I am getting the same error. I installed using Docker within an Ubuntu 20.04 WSL2 distribution.

nvcc --version returns release 11.6 (CUDA version)

Saadalh commented 1 month ago

This error results from a problem loading the custom C++ operations required by the GroundingDINO model. The warning message "Failed to load custom C++ ops. Running on CPU mode Only!" indicates that the necessary compiled C++ operations were not found or could not be loaded.

There are some other requirements and modules needed by GroundingDINO. These can be installed using the requirements.txt and setup.py files inside the GroundingDINO directory. Navigate to it and run:

    pip install -r requirements.txt
    python setup.py install

All necessary prerequisites should now be installed. Navigate back to the parent directory and try running the demo again.
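
One more thing worth checking before building (hedged, since this depends on how setup.py resolves the toolkit): the custom _C ops are only compiled with CUDA support when the build can find the CUDA toolkit, so an install can appear to succeed while silently producing a CPU-only build. Exporting CUDA_HOME before building is a common remedy; the path below is illustrative:

    # Point the build at the CUDA toolkit (adjust the path to match your
    # nvcc, e.g. the release 11.6 toolkit mentioned above), then rebuild.
    export CUDA_HOME=/usr/local/cuda-11.6
    cd GroundingDINO
    pip install -r requirements.txt
    python setup.py install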

nourihilscher commented 1 month ago

As a quick fix (or an alternative solution in a different direction), which I came up with because I needed this to work in an environment where I could not install the CUDA tools: you can change the GroundingDINO ms_deform_attn.py code to always use multi_scale_deformable_attn_pytorch.

The Grounding DINO code should work fine after that, but I have to admit that I did not test it thoroughly, and I am not sure whether the PyTorch implementation has any runtime disadvantages on a GPU compared with their CUDA implementation. To be more precise:

  1. In Grounded-Segment-Anything/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py, delete the code on lines 28 to 30, i.e.,

    try:
        from groundingdino import _C
    except:
        warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
  2. Delete the MultiScaleDeformableAttnFunction implementation on lines 41 to 90, i.e.,

    class MultiScaleDeformableAttnFunction(Function):
        @staticmethod
        def forward(
            ctx,
            value,
            value_spatial_shapes,
            value_level_start_index,
            sampling_locations,
            attention_weights,
            im2col_step,
        ):
            ctx.im2col_step = im2col_step
            output = _C.ms_deform_attn_forward(
                value,
                value_spatial_shapes,
                value_level_start_index,
                sampling_locations,
                attention_weights,
                ctx.im2col_step,
            )
            ctx.save_for_backward(
                value,
                value_spatial_shapes,
                value_level_start_index,
                sampling_locations,
                attention_weights,
            )
            return output

        @staticmethod
        @once_differentiable
        def backward(ctx, grad_output):
            (
                value,
                value_spatial_shapes,
                value_level_start_index,
                sampling_locations,
                attention_weights,
            ) = ctx.saved_tensors
            grad_value, grad_sampling_loc, grad_attn_weight = _C.ms_deform_attn_backward(
                value,
                value_spatial_shapes,
                value_level_start_index,
                sampling_locations,
                attention_weights,
                grad_output,
                ctx.im2col_step,
            )
            return grad_value, None, None, grad_sampling_loc, grad_attn_weight, None
  3. Change the forward function of the MultiScaleDeformableAttention module to use the PyTorch implementation, i.e. change lines 329 to 352 from

    if torch.cuda.is_available() and value.is_cuda:
        halffloat = False
        if value.dtype == torch.float16:
            halffloat = True
            value = value.float()
            sampling_locations = sampling_locations.float()
            attention_weights = attention_weights.float()

        output = MultiScaleDeformableAttnFunction.apply(
            value,
            spatial_shapes,
            level_start_index,
            sampling_locations,
            attention_weights,
            self.im2col_step,
        )

        if halffloat:
            output = output.half()
    else:
        output = multi_scale_deformable_attn_pytorch(
            value, spatial_shapes, sampling_locations, attention_weights
        )

    to

    output = multi_scale_deformable_attn_pytorch(
        value, spatial_shapes, sampling_locations, attention_weights
    )

    Note that you will probably need to pip uninstall and reinstall the library after that. For a quick sanity check of the PyTorch fallback, see the sketch below.
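
To confirm the fallback path actually runs, here is a minimal CPU-only sanity check. The import path and the tensor shapes are my assumptions based on the mmcv-style signature that GroundingDINO's ms_deform_attn.py uses; adjust them if your copy differs.

    import torch
    # Assumed import path; adapt it to wherever the function lives in your install.
    from groundingdino.models.GroundingDINO.ms_deform_attn import (
        multi_scale_deformable_attn_pytorch,
    )

    bs, num_heads, head_dim = 1, 8, 32                 # batch size, heads, per-head dim
    spatial_shapes = torch.tensor([[16, 16], [8, 8]])  # two feature levels as (H, W)
    num_keys = int((spatial_shapes[:, 0] * spatial_shapes[:, 1]).sum())  # 256 + 64
    num_queries, num_levels, num_points = 10, 2, 4

    value = torch.rand(bs, num_keys, num_heads, head_dim)
    # Sampling locations are normalized to [0, 1]; weights are normalized like a
    # softmax over levels and points, matching what the module feeds the function.
    sampling_locations = torch.rand(bs, num_queries, num_heads, num_levels, num_points, 2)
    attention_weights = torch.rand(bs, num_queries, num_heads, num_levels, num_points)
    attention_weights = attention_weights / attention_weights.sum(dim=(-2, -1), keepdim=True)

    output = multi_scale_deformable_attn_pytorch(
        value, spatial_shapes, sampling_locations, attention_weights
    )
    print(output.shape)  # expected: (bs, num_queries, num_heads * head_dim) = (1, 10, 256)

If this prints the expected shape on a machine with no CUDA toolkit, the pure-PyTorch path is wired up correctly.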