IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0

RuntimeError: Unknown layout in MultiScaleDeformableAttnFunction #284

Open soniajoseph opened 7 months ago

soniajoseph commented 7 months ago

Hi, thanks for writing and maintaining this repo.

This is the error:

  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 53, in forward
    output = _C.ms_deform_attn_forward(
RuntimeError: Unknown layout

Full error trace

/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/transformers/modeling_utils.py:768: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
Traceback (most recent call last):
  File "/home/mila/s/sonia.joseph/patch_level_labels/Grounded-Segment-Anything/test_groundeddino.py", line 20, in <module>
    boxes, logits, phrases = predict(
  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/util/inference.py", line 68, in predict
    outputs = model(image[None], captions=[caption])
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/models/GroundingDINO/groundingdino.py", line 327, in forward
    hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/models/GroundingDINO/transformer.py", line 258, in forward
    memory, memory_text = self.encoder(
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/models/GroundingDINO/transformer.py", line 576, in forward
    output = checkpoint.checkpoint(
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/models/GroundingDINO/transformer.py", line 785, in forward
    src2 = self.self_attn(
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 338, in forward
    output = MultiScaleDeformableAttnFunction.apply(
  File "/home/mila/s/sonia.joseph/patch_level_labels/env/lib/python3.9/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/mila/s/sonia.joseph/patch_level_labels/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 53, in forward
    output = _C.ms_deform_attn_forward(
RuntimeError: Unknown layout

Full code snippet


from groundingdino.util.inference import load_model, load_image, predict, annotate
import cv2
import os

CUR_FOLDER = os.path.dirname(os.path.abspath(__file__))

ROOT_FOLDER = 'root/path'

GROUNDING_DINO_CONFIG_PATH = os.path.join(CUR_FOLDER,"../GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py")

model = load_model(GROUNDING_DINO_CONFIG_PATH, "/scratch/path/groundingdino_swint_ogc.pth")
IMAGE_PATH = "../test_images/hikers.png"
TEXT_PROMPT = "chair . person . dog ."
BOX_THRESHOLD = 0.35
TEXT_THRESHOLD = 0.25

image_source, image = load_image(IMAGE_PATH)

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
)

annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated_frame)

Thank you!

rentainhe commented 7 months ago

Could you share more details about your environment (OS, Python, PyTorch, and CUDA versions) so we can look into this issue?
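
For anyone hitting the same error, here is a minimal sketch of how the requested details could be gathered. The helper name `collect_env_info` is hypothetical and not part of this repo; it relies only on the standard library plus `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()`, and it still returns the OS and Python version if torch cannot be imported.

```python
import platform
import sys


def collect_env_info():
    """Gather the version details that usually matter for CUDA extension errors."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import torch
    except ImportError:
        # torch is missing; report that instead of failing.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller report that can be pasted into the issue as-is.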

bbikdash commented 7 months ago

I had this issue as well.

OS: Ubuntu 20.04
Python: 3.10.13
CUDA Version: 12.1

The error occurred when I specified an NVIDIA GPU as the inference device. The same inference code ran on the CPU.
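
As a stopgap while the GPU path is broken, inference can be pinned to the CPU. The sketch below assumes that `load_model` and `predict` in `groundingdino/util/inference.py` accept a `device` keyword; please check this against your checkout. `choose_device` and `predict_on` are hypothetical helper names, and the threshold values are the ones from the original report.

```python
def choose_device(force_cpu=False):
    """Return "cpu" when forced or when CUDA is unavailable, else "cuda"."""
    if force_cpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def predict_on(device, config_path, checkpoint_path, image_path, caption):
    """Run GroundingDINO inference on one device (hypothetical wrapper)."""
    # Imported lazily so choose_device() works without GroundingDINO installed.
    from groundingdino.util.inference import load_image, load_model, predict

    # Assumption: both functions take a `device` keyword argument.
    model = load_model(config_path, checkpoint_path, device=device)
    _, image = load_image(image_path)
    return predict(
        model=model,
        image=image,
        caption=caption,
        box_threshold=0.35,
        text_threshold=0.25,
        device=device,
    )
```

Calling `predict_on(choose_device(force_cpu=True), ...)` keeps both the model and the image on the CPU. This is expected to be much slower than the GPU path, so it is a workaround rather than a fix.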

arindam-mazumdar commented 7 months ago

I have the same issue.

OS: Ubuntu 18.04
Python: 3.8.3
CUDA: 11.8

After successfully installing everything, I ran only the test notebook.

wenjie710 commented 6 months ago

This looks similar to another issue. Downgrading torch to 2.1.0 worked for me.
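
A small sketch for checking whether the installed torch is newer than the version reported to work here. Treating 2.1.0 as the cutoff is an assumption based only on this single report, not a confirmed boundary, and the helper names `parse_version` and `is_newer_than_reported_fix` are hypothetical.

```python
import re


def parse_version(version):
    """Turn a string like "2.2.0+cu121" into a tuple such as (2, 2, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_newer_than_reported_fix(version, reported=(2, 1, 0)):
    """True if `version` is newer than the torch release reported to work."""
    return parse_version(version) > reported
```

With torch installed, `is_newer_than_reported_fix(torch.__version__)` tells you whether the downgrade is worth trying. The compiled `_C` extension is built against whichever torch is installed at build time, so it likely needs to be rebuilt after changing the torch version.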