fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0
3.15k stars 513 forks

Problems with DeformableDetrTransformerDecoder #158

Open wyz-gitt opened 2 years ago

wyz-gitt commented 2 years ago

Why do the reference_points in the encoder and decoder of DeformableDetrTransformer have different numbers of dimensions? In the encoder, reference_points is four-dimensional; in the decoder it is three-dimensional. Both the encoder and the decoder call the multi-scale attention module, which requires reference_points to be four-dimensional, so the decoder's attention call raises an error. Has anyone else encountered this?

The error is as follows:

sampling_locations = reference_points[:, :, None, :, None, :]
IndexError: too many indices for tensor of dimension 3

In other words, the reference_points in the decoder are three-dimensional, so this line cannot run; it only works when reference_points is four-dimensional.
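
The mismatch described above can be reproduced in isolation. The sketch below is a minimal PyTorch illustration, not the repository's code: the tensor sizes and the names ref_3d, ref_4d, and valid_ratios are assumptions made for the example. It shows that the six-index expression fails on a three-dimensional tensor and succeeds once a per-level axis has been added. As far as I can tell, the reference decoder adds that axis by multiplying the reference points by the per-level valid ratios before calling the attention module.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch, num_queries, num_levels = 2, 5, 4

# Decoder-style reference points: one (x, y) pair per query, so 3-D.
ref_3d = torch.rand(batch, num_queries, 2)

# Reproduce the reported failure: four slices (plus two None axes)
# applied to a tensor that has only three dimensions.
try:
    ref_3d[:, :, None, :, None, :]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# Assumed per-image, per-level valid ratios (all ones, i.e. no padding).
valid_ratios = torch.ones(batch, num_levels, 2)

# Broadcasting adds the level axis: (B, Q, 1, 2) * (B, 1, L, 2) -> (B, Q, L, 2).
ref_4d = ref_3d[:, :, None] * valid_ratios[:, None]

# The same indexing expression now works on the 4-D tensor.
expanded = ref_4d[:, :, None, :, None, :]

print(error_message)
print(ref_4d.shape, expanded.shape)
```

If the decoder in your setup passes reference_points straight to the attention module, checking whether this expansion step is missing or skipped would be a reasonable first thing to look at.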