Problems with DeformableDetrTransformerDecoder

Why do the reference_points in the encoder and decoder of DeformableDetrTransformer have different numbers of dimensions? In the encoder, reference_points is four-dimensional; in the decoder, it is three-dimensional. Both the encoder and the decoder call Multi Scale Attention, which requires reference_points to be four-dimensional, so the Multi Scale Attention call in the decoder raises an error. Has anyone else run into this?

The error is as follows:

    sampling_locations = reference_points[:, :, None, :, None, :]
    IndexError: too many indices for tensor of dimension 3

In other words, the reference_points in the decoder are three-dimensional, so the indexing above fails (it requires reference_points to be four-dimensional).
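A minimal sketch that reproduces the shape mismatch and shows one possible way around it. It assumes the decoder is expected to broadcast its (batch, num_queries, 2) reference points across feature levels before calling the attention module, which is what the decoder in the reference implementation appears to do using per-level valid ratios. The sizes, the `valid_ratios` tensor, and the helper `expand_to_levels` are hypothetical names chosen for illustration, not part of the repository's API.

```python
import torch

# Illustrative sizes only (assumptions, not taken from the issue).
bs, num_queries, n_levels = 2, 5, 4

# Decoder-style reference points: one (x, y) per query -> (bs, num_queries, 2).
reference_points = torch.rand(bs, num_queries, 2)

# Hypothetical per-level valid ratios -> (bs, n_levels, 2).
# All ones corresponds to images without padding.
valid_ratios = torch.ones(bs, n_levels, 2)


def expand_to_levels(reference_points, valid_ratios):
    """Broadcast (bs, nq, 2) reference points to (bs, nq, n_levels, 2)."""
    # (bs, nq, 1, 2) * (bs, 1, n_levels, 2) -> (bs, nq, n_levels, 2)
    return reference_points[:, :, None] * valid_ratios[:, None]


# Indexing the 3-D tensor the way the attention module does raises IndexError.
try:
    reference_points[:, :, None, :, None, :]
    failed = False
except IndexError:
    failed = True

# After expansion the tensor is 4-D and the same indexing succeeds.
reference_points_input = expand_to_levels(reference_points, valid_ratios)
indexed = reference_points_input[:, :, None, :, None, :]

print(failed)
print(tuple(reference_points_input.shape))
print(tuple(indexed.shape))
```

If the decoder in your copy of the code passes reference_points to the attention module without a step like this, that would be the first place to check.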

fundamentalvision / Deformable-DETR

Problems with DeformableDetrTransformerDecoder #158