fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

Deformable DETR is now available in HuggingFace Transformers #160

Status: Open. NielsRogge opened this issue 1 year ago.

NielsRogge commented 1 year ago

Hi,

Deformable DETR is now available in 🤗 Transformers: https://huggingface.co/docs/transformers/main/en/model_doc/deformable_detr.

All checkpoints are on the hub: https://huggingface.co/models?other=deformable_detr.

The implementation supports both CPU and GPU (and you can choose to use the custom kernel or not when running on GPU). 🥳

Inference

For inference, please refer to the example code snippet in the docs.
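For readers who want to see roughly what the post-processing step of that snippet does: Deformable DETR scores every (query, class) pair with a sigmoid and keeps the best pairs overall, instead of DETR's softmax with an extra "no-object" class. The sketch below is a framework-free illustration under that assumption; `postprocess_detections` is a hypothetical helper, not the Transformers API (in the library, the processor's post-processing method handles this on tensors).

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def postprocess_detections(logits, boxes, image_size, threshold=0.5, top_k=100):
    """Hypothetical helper: turn raw outputs for ONE image into detections.

    logits: per-query lists of class logits (num_queries x num_labels)
    boxes: per-query (cx, cy, w, h) boxes, normalized to [0, 1]
    image_size: (height, width) of the original image in pixels
    """
    height, width = image_size
    # Score every (query, class) pair independently with a sigmoid:
    # there is no softmax and no extra "no-object" class to drop.
    scored = [
        (sigmoid(logit), query, label)
        for query, row in enumerate(logits)
        for label, logit in enumerate(row)
    ]
    # Keep the top-k pairs across all queries and classes, best first.
    scored.sort(key=lambda item: item[0], reverse=True)
    detections = []
    for score, query, label in scored[:top_k]:
        if score <= threshold:
            break  # sorted list: every remaining score is lower
        cx, cy, w, h = boxes[query]
        # Normalized center format -> absolute (x0, y0, x1, y1) in pixels.
        detections.append({
            "score": score,
            "label": label,
            "box": ((cx - w / 2) * width, (cy - h / 2) * height,
                    (cx + w / 2) * width, (cy + h / 2) * height),
        })
    return detections
```

Because each class gets its own sigmoid, one query can in principle contribute detections for several classes; the top-k cut and the threshold decide what survives.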

Fine-tuning on custom data

For fine-tuning, please refer to this demo notebook, which illustrates how to fine-tune the model. Fine-tuning Deformable DETR is equivalent to fine-tuning DETR: just replace DetrForObjectDetection in the notebook with DeformableDetrForObjectDetection.
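The swap works because both models take training labels in the same shape: one dict per image with `class_labels` and `boxes`, where boxes are normalized `(center_x, center_y, width, height)`. The image processor normally converts COCO-style annotations into this form; the plain-Python sketch below only makes the box conversion explicit. `coco_to_detr_target` is a hypothetical helper, and it returns lists where the model actually expects torch tensors.

```python
def coco_to_detr_target(annotations, image_width, image_height):
    """Hypothetical helper: convert COCO-style annotations for one image
    into the {"class_labels", "boxes"} structure used as DETR-family labels.

    COCO boxes are absolute [x_min, y_min, width, height] in pixels; the
    labels use (center_x, center_y, width, height) normalized to [0, 1].
    """
    class_labels, boxes = [], []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        class_labels.append(ann["category_id"])
        boxes.append((
            (x + w / 2) / image_width,   # center x
            (y + h / 2) / image_height,  # center y
            w / image_width,
            h / image_height,
        ))
    return {"class_labels": class_labels, "boxes": boxes}
```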

NielsRogge commented 1 year ago

Potentially relevant to the following issues:

#13, #33, #42, #48, #56, #75, #92, #88, #96, #97, #107, #112, #113, #117, #126, #128, #129, #130, #131

bo-ke commented 1 year ago

Cool!

jeiva2000 commented 1 year ago

Hi,

Would it be possible to get a detailed script for training from a pre-trained model? Following the tutorial used for DETR causes problems.

NielsRogge commented 1 year ago

as following the tutorial used in DETR causes problems.

Can you clarify which issues you had?

jeiva2000 commented 1 year ago

I changed DetrForObjectDetection to DeformableDetrForObjectDetection in the DETR class, and I also changed DetrFeatureExtractor to AutoFeatureExtractor. Finally, I set the batch size to 1 in the dataloaders. Training runs correctly, but during evaluation I get the following error:

IndexError: max(): Expected reduction dim 2 to have non-zero size.

Looking at the inference script, I see that AutoFeatureExtractor is used to prepare the inputs that are passed to the model, so this part isn't clear to me. Prueba_train_deformable_detr.zip

I've attached the notebook I'm using so that you can see it more easily.
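One possible explanation, not confirmed in this thread: DETR's classifier has `num_labels + 1` outputs (the last one is "no-object"), and DETR-style post-processing drops that last column before taking the max over classes. Deformable DETR has exactly `num_labels` outputs scored with a sigmoid. If the evaluation loop still applies DETR-style post-processing to a single-class Deformable DETR, the slice leaves an empty class dimension, which would match the `max()` error above. A plain-Python illustration (both functions are hypothetical; Python raises `ValueError` where PyTorch raises the `IndexError`):

```python
import math


def detr_style_best_score(logits_row):
    """DETR-style: softmax over classes plus a trailing "no-object" logit,
    then drop that last column before taking the max."""
    exps = [math.exp(v) for v in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    real_classes = probs[:-1]  # assumes the LAST entry is "no-object"
    return max(real_classes)   # fails if nothing is left after slicing


def sigmoid_style_best_score(logits_row):
    """Deformable-DETR-style: one sigmoid per real class, nothing dropped."""
    return max(1.0 / (1.0 + math.exp(-v)) for v in logits_row)
```

With a one-class head, `sigmoid_style_best_score([0.0])` returns 0.5, while `detr_style_best_score([0.0])` fails on the empty slice.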

GivanTsai commented 1 year ago

Is there a script to draw the attention maps of Deformable DETR in the powerful HuggingFace library?

NielsRogge commented 1 year ago

Is there a script to draw the attention maps of Deformable DETR in the powerful HuggingFace library?

You can just follow this notebook, where I show how to visualize the attention maps of the decoder. Make sure to replace DetrForObjectDetection with DeformableDetrForObjectDetection.

GivanTsai commented 1 year ago

I think drawing attention maps for Deformable DETR is quite different from DETR, since it uses reference points and sampling offsets.
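In the original Deformable DETR implementation, each query's sampling locations are its reference point plus learned offsets that are divided by the feature level's width and height, and each head's attention weights are softmax-normalized jointly over all levels and sampling points. A deformable "attention map" is therefore a small set of weighted points rather than a dense map. A minimal plain-Python sketch for one query, one head and one level; the helper names are hypothetical, and extracting the offsets and weights from a Transformers model is not covered here.

```python
import math


def sampling_locations(reference_point, offsets, level_shape):
    """Hypothetical helper: where one attention head samples on one level.

    reference_point: (x, y) normalized to [0, 1]
    offsets: predicted (dx, dy) offsets in feature-map pixels
    level_shape: (height, width) of that feature level
    Returns the locations normalized to [0, 1] and in feature-map pixels.
    """
    ref_x, ref_y = reference_point
    height, width = level_shape
    normalized, pixels = [], []
    for dx, dy in offsets:
        # Offsets are divided by the level's width/height, then added to
        # the normalized reference point.
        x, y = ref_x + dx / width, ref_y + dy / height
        normalized.append((x, y))
        pixels.append((x * width, y * height))
    return normalized, pixels


def attention_weights(raw_scores):
    """Softmax over ALL (level, point) scores of one head and one query."""
    peak = max(raw_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]
```

To overlay the points on the input image, multiply the normalized locations by the image's width and height and draw each point with a size or opacity proportional to its weight.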

andrearosasco commented 1 year ago

Hi, I'm new to Hugging Face, so I might be missing something obvious, but when I try to import DeformableDetrForObjectDetection from transformers (I've checked and I have the latest version), I get an ImportError.

The feature extractor also doesn't work: feature_extractor_class_from_name('DeformableDetrFeatureExtractor') returns None, so feature_extractor = AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr") fails with AttributeError: 'NoneType' object has no attribute 'from_dict'.

I solved it by installing from source with pip install -q git+https://github.com/huggingface/transformers.git instead of from PyPI.

NielsRogge commented 1 year ago

Hi,

Thanks for reporting. We did indeed fix Deformable DETR's feature extractor in #19140. The fix will be included in the next PyPI release.

andrearosasco commented 1 year ago

Alright, good to hear! By the way, kudos for the great work. One last question: do you know if anyone has tried to convert the model to TensorRT? I was able to export an ONNX model using opset=16, but TensorRT doesn't have an implementation of GridSample, so I could not proceed further.
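For context on the GridSample question: the pure-PyTorch fallback of multi-scale deformable attention in the original repository samples each value map with bilinear grid sampling, zero padding and `align_corners=False`. Working around a missing GridSample would mean a TensorRT plugin or re-expressing that sampling with more basic ops; neither is covered in this thread. To pin down the semantics such a rewrite must reproduce, here is a plain-Python sketch for a single point on a single 2-D map (assumption: zero padding and `align_corners=False`, as in that fallback; `grid_sample_bilinear` is a hypothetical name).

```python
import math


def grid_sample_bilinear(image, x_norm, y_norm):
    """Bilinear sampling of ONE point on ONE 2-D map, zero padding,
    align_corners=False convention.

    image: list of rows (height x width) of floats
    x_norm, y_norm: coordinates in [-1, 1]; -1 and 1 are the outer edges
    of the border pixels, not their centers.
    """
    height, width = len(image), len(image[0])
    # Map [-1, 1] to continuous pixel coordinates (pixel centers at integers).
    x = ((x_norm + 1) * width - 1) / 2
    y = ((y_norm + 1) * height - 1) / 2
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
            # Zero padding: out-of-bounds neighbours contribute nothing.
            if 0 <= yi < height and 0 <= xi < width:
                value += weight * image[yi][xi]
    return value
```

For example, on the map `[[1, 2], [3, 4]]` the center point `(0, 0)` averages all four pixels, while the corner `(-1, -1)` only picks up a quarter of the top-left value because its other neighbours fall outside the map.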

sepidehkhakzad commented 1 year ago

@NielsRogge Hi, and thanks for the great work. While implementing the code for fine-tuning Deformable DETR on my dataset, I realized that len(train_dataset) and len(val_dataset) are smaller than the actual number of training and validation files. The dataset is fine, since I have successfully fine-tuned DETR on it, so I'm guessing the issue arises when I use DeformableDetrFeatureExtractor. Would you by any chance know why this happens?

NielsRogge commented 1 year ago

Hi,

Hmm, normally the feature extractor shouldn't change that. Did you create a regular PyTorch dataset?

sepidehkhakzad commented 1 year ago

Yes, it's basically just a set of images and their annotations. I have checked the annotations file with multiple COCO viewers, and the same thing did not happen when fine-tuning DETR on the same dataset.
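A COCO-style dataset such as torchvision's `CocoDetection` (which the DETR fine-tuning notebook builds on) takes its length from the image entries in the annotation JSON, not from the files in the image folder. Comparing the two is a quick way to see where a mismatch comes from. A plain-Python sketch; `compare_coco_annotations` and its report keys are hypothetical, and it assumes each `file_name` is a bare filename relative to the image folder.

```python
import json
import os


def compare_coco_annotations(annotation_file, image_dir):
    """Hypothetical sanity check: compare the images listed in a COCO JSON
    file with the files actually present in an image directory."""
    with open(annotation_file) as f:
        coco = json.load(f)
    listed = {img["file_name"] for img in coco["images"]}
    on_disk = set(os.listdir(image_dir))
    annotated_ids = {ann["image_id"] for ann in coco.get("annotations", [])}
    return {
        # What a CocoDetection-style dataset would report as its length.
        "listed_in_json": len(coco["images"]),
        "files_on_disk": len(on_disk),
        "missing_from_json": sorted(on_disk - listed),
        "missing_on_disk": sorted(listed - on_disk),
        "images_without_annotations": sum(
            1 for img in coco["images"] if img["id"] not in annotated_ids
        ),
    }
```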

ashim-mahara commented 1 year ago

onnx using opset=16

Were you using the transformers library? I'm trying to export to ONNX, but it results in an error. Could you please share your script?

Zalways commented 1 year ago

Have you tried exporting the model to TorchScript for inference in a C++ environment? I tried, but the exported model doesn't work, so I'd like to ask for your help. I had some problems converting and exporting the model, but I finally got a model using the trace method. However, this model fails with: RuntimeError: The size of tensor a (32) must match the size of tensor b (237) at non-singleton dimension 1. I need your help!

andrearosasco commented 1 year ago

Could it be that you traced the model with a fixed batch size of 237 and you are trying to run inference with a batch of size 32? @Zalways

Zalways commented 1 year ago

Could it be that you traced the model with a fixed batch size of 237 and you are trying to run inference with a batch of size 32? @Zalways

Thank you for your reply! That doesn't seem to be the cause: I tried feeding in a tensor of a specific shape, and it still errors. I'm wondering whether the model was exported correctly (but I don't know how to check that the exported model is right). My current exported model can only run inference on the image I used for exporting; for any other image or tensor, it shows an error like: The size of tensor a (xxx) must match the size of tensor b (xxx) at non-singleton dimension 0. I've been confused by this for a long time.

andrearosasco commented 1 year ago

My current exported model can only run inference on the image I used for exporting

Are the other images the same shape as the one you used for tracing?

Zalways commented 1 year ago

@andrearosasco Yes. Other images are preprocessed in the same way, and I also tried feeding a tensor of the same shape into the model, but it doesn't work! It's very strange, and now I have no idea what's going on. I hope someone can help me with this issue.

Zalways commented 1 year ago

When I try to export the Deformable DETR model to TorchScript, it shows this error message:

Could not export Python function call 'MSDeformAttnFunction'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/root/autodl-tmp/project/deepsolo/adet/layers/ms_deform_attn.py(165): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/deepsolo/adet/layers/deformable_transformer.py(286): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/deepsolo/adet/layers/deformable_transformer.py(413): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/deepsolo/adet/layers/deformable_transformer.py(173): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/deepsolo/adet/modeling/model/detection_transformer.py(200): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/deepsolo/adet/modeling/text_spotter_v1.py(222): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(259):
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(952): trace_module
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(735): trace
deploy/export_model00.py(126): export_tracing
deploy/export_model00.py(226):

Does anybody know how to solve it? I tried a workaround and the model exported successfully, but the exported model doesn't work! When I use the exported model for inference, it shows this error message:

/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return forward_call(*input, **kwargs)
Traceback (most recent call last):
  File "/root/autodl-tmp/project/deploy/export_model.py", line 264, in
    out1 = m(data)
  File "/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/detectron2/export/flatten.py", line 9, in forward
    def forward(self: __torch__.detectron2.export.flatten.TracingAdapter, argument_1: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
    _0, _1, _2, _3, _4, _5, _6, = (self.model).forward(argument_1, )
    return (_0, _1, _2, _3, _4, _5, _6)
  File "code/__torch__/adet/modeling/text_spotter.py", line 23, in forward
    batched_imgs = torch.unsqueeze_(_7, 0)
    x0 = torch.contiguous(batched_imgs)
    _8, _9, _10, _11, = (_0).forward(x0, image_size, )
                         ~~~~~~~~~~~ <--- HERE
    _12 = torch.softmax(_9, -1)
    prob = torch.sigmoid(torch.mean(_8, [-2]))
  File "code/__torch__/adet/modeling/model/detection_transformer.py", line 50, in forward
    _29 = getattr(self.input_proj, "1")
    _30 = getattr(self.input_proj, "0")
    _31 = (self.backbone).forward(x, image_size, )
           ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _32, _33, _34, _35, _36, _37, _38, _39, _40, _41, _42, _43, _44, _45, _46, _47, _48, _49, _50, _51, _52, _53, _54, _55, _56, _57, = _31
    _58 = (_30).forward(_32, )
  File "code/__torch__/adet/modeling/text_spotter.py", line 104, in forward
    image_size: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
    _61 = getattr(self, "1")
    _62 = (getattr(self, "0")).forward(x, image_size, )
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _63, _64, _65, _66, _67, _68, _69, = _62
    pos_embed = torch.to((_61).forward(_63, ), 6)
  File "code/__torch__/adet/modeling/text_spotter.py", line 143, in forward
    _92 = torch.slice(torch.slice(_91, 0, 0, 125), 1, 0, 138)
    _93 = torch.view(CONSTANTS.c2, annotate(List[int], []))
    _94 = torch.copy_(_92, torch.expand(_93, [125, 138]))
          ~~~~~~~~~~~ <--- HERE
    masks_per_feature_level0 = torch.ones([_85, _86, _87], dtype=11, layout=None, device=torch.device("cpu"), pin_memory=False)
    _95 = torch.select(masks_per_feature_level0, 0, 0)

Traceback of TorchScript, original code (most recent call last):
/root/autodl-tmp/project/adet/modeling/text_spotter.py(60): mask_out_padding
/root/autodl-tmp/project/adet/modeling/text_spotter.py(43): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(21): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/model/detection_transformer.py(168): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(220): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(259): <lambda>
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(952): trace_module
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(735): trace
/root/autodl-tmp/project/deploy/export_model.py(125): export_tracing
/root/autodl-tmp/project/deploy/export_model.py(224): <module>
/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py(18): execfile
/root/.pycharm_helpers/pydev/pydevd.py(1496): _exec
/root/.pycharm_helpers/pydev/pydevd.py(1489): run
/root/.pycharm_helpers/pydev/pydevd.py(2177): main
/root/.pycharm_helpers/pydev/pydevd.py(2195): <module>
RuntimeError: The size of tensor a (50) must match the size of tensor b (125) at non-singleton dimension 0

So I think the problem may lie in the export step: Could not export Python function call 'MSDeformAttnFunction'.

Looking forward to your reply!

JannikZgraggenTR commented 3 months ago

@NielsRogge

I've been training the HuggingFace DetrForObjectDetection and DeformableDetrForObjectDetection models with PyTorch Lightning. I noticed that DETR trains about 5 times faster (in terms of batch processing) than Deformable DETR.

Is this expected behaviour?

NielsRogge commented 3 months ago

Hmm, pinging @qubvel here. He just added official example scripts for object detection: https://github.com/huggingface/transformers/tree/main/examples/pytorch/object-detection. They work with DETR and Deformable DETR, among other models.

qubvel commented 3 months ago

Hi @JannikZgraggenTR, I also observed similar behavior: I'd say Deformable DETR takes about 3x more time to process a batch during training. However, it converges faster and to a better optimum. Both models were trained for 100 epochs on the cppe-5 dataset; the x-axis here is time.

[Screenshot (2024-05-17): training curves of both models on cppe-5, plotted against time]

You can replicate the results with the HF examples, but they use Trainer and Accelerate, not Lightning.

NielsRogge commented 3 months ago

@qubvel were you leveraging the custom CUDA kernel for the deformable attention operator?

qubvel commented 3 months ago

I'm not sure I did, unless it is used by default. Is there any reference on how to enable it?

NielsRogge commented 3 months ago

It looks like the kernels are already enabled by default: https://github.com/huggingface/transformers/blob/3802e786ef64b13bef5e8669dcb96e291d2c5317/src/transformers/models/deformable_detr/configuration_deformable_detr.py#L195

qubvel commented 3 months ago

I just checked the trained model's config: disable_custom_kernels is false.

qubvel commented 3 months ago

In terms of FLOPs, the Deformable DETR paper reports ~2x compared to DETR, but less total training time due to faster convergence.

[Screenshot (2024-05-17): FLOPs comparison from the Deformable DETR paper]

JannikZgraggenTR commented 3 months ago

@qubvel thanks for the great insight, I now observe around 2x as well: version_162 is plain DETR (8.8 seconds), version_166 is Deformable DETR (18.4 seconds), and version_165 is Deformable DETR with two-stage and box refinement (58.8 seconds).
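The trade-off described above (slower per batch, but faster convergence) can be checked with a bit of arithmetic on these timings. A plain-Python sketch; it assumes the three figures are measured the same way (for example, time per epoch), which the comment doesn't state, and the helper names are hypothetical.

```python
def relative_cost(timings, baseline):
    """Express each timing as a multiple of the baseline run (2 decimals)."""
    base = timings[baseline]
    return {name: round(t / base, 2) for name, t in timings.items()}


def break_even_epochs(baseline_epochs, baseline_time, candidate_time):
    """Epoch budget at which a slower-per-epoch model costs the same total
    time as the baseline trained for `baseline_epochs` epochs."""
    return baseline_epochs * baseline_time / candidate_time


# Timings reported above (assumed to be in the same unit for all runs).
timings = {
    "version_162 (DETR)": 8.8,
    "version_166 (Deformable DETR)": 18.4,
    "version_165 (two-stage + box refinement)": 58.8,
}
ratios = relative_cost(timings, "version_162 (DETR)")
```

Here `ratios` comes out to about 1.0, 2.09 and 6.68, and `break_even_epochs(100, 8.8, 18.4)` is roughly 47.8: at these per-epoch costs, Deformable DETR only wins end to end if it reaches the target quality in fewer than about 48 of DETR's 100 epochs. The 100-epoch budget is purely illustrative.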

However, according to Table 1 in the Conditional DETR paper, iterative box refinement shouldn't add computational cost? (@qubvel)
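For context on that question: in the original Deformable DETR code, iterative box refinement has each decoder layer predict a delta that is added to the current reference box in inverse-sigmoid (logit) space, and the refined box becomes the next layer's reference. The update itself is a small MLP head plus element-wise ops. Note that version_165 also enables two-stage, so its timing doesn't isolate the cost of refinement. A plain-Python sketch of one update step; the helper names are hypothetical, and the clamping mirrors the original `inverse_sigmoid` as an assumption.

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    """Logit with clamping, so values at exactly 0 or 1 stay finite."""
    x = min(max(x, 0.0), 1.0)
    return math.log(max(x, eps) / max(1.0 - x, eps))


def refine_box(reference_box, delta):
    """One refinement step: add the layer's predicted delta in
    inverse-sigmoid space, then map back to normalized [0, 1] coordinates.

    reference_box: normalized (cx, cy, w, h) from the previous layer
    delta: the current layer's predicted (dcx, dcy, dw, dh)
    """
    return tuple(
        1.0 / (1.0 + math.exp(-(d + inverse_sigmoid(r))))
        for r, d in zip(reference_box, delta)
    )
```

A zero delta leaves the box unchanged, and because the result passes through a sigmoid, refined coordinates always stay inside [0, 1].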

JannikZgraggenTR commented 3 months ago

@NielsRogge thanks a lot for your response. I have disable_custom_kernels = False for the Deformable DETR models. I originally became interested in Deformable DETR because I saw that TableTransformerForObjectDetection (DETR) struggles with tables that have many small rows, most likely due to how its attention works. What was the motivation for training DETR on PubTables-1M rather than Deformable DETR? (Wouldn't Deformable DETR be strictly superior?)