Why do the dimensions of reference_points differ between the encoder and the decoder in DeformableDetrTransformer? The encoder's reference_points are four-dimensional, while the decoder's are three-dimensional, yet both the encoder and the decoder call Multi Scale Attention, which requires reference_points to be four-dimensional. As a result, the decoder's Multi Scale Attention raises an error. Have you encountered this situation?
The error is as follows:
sampling_locations = reference_points[:, :, None, :, None, :]
IndexError: too many indices for tensor of dimension 3
That is to say, because the decoder's reference_points are three-dimensional, the indexing above fails (reference_points would need to be four-dimensional).
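The shape mismatch can be reproduced in isolation. The sketch below uses NumPy in place of torch tensors (the indexing semantics are the same), and assumes the encoder-style shape is (bs, num_query, num_levels, 2) and the decoder-style shape is (bs, num_query, 2); the exact shapes in your run may differ:

```python
import numpy as np

bs, num_query, num_levels = 2, 100, 4

# Encoder-style reference_points: 4-D, (bs, num_query, num_levels, 2).
# The attention's indexing inserts two new axes and works fine.
ref_4d = np.zeros((bs, num_query, num_levels, 2))
sampling_locations = ref_4d[:, :, None, :, None, :]
print(sampling_locations.shape)  # (2, 100, 1, 4, 1, 2)

# Decoder-style reference_points: 3-D, (bs, num_query, 2).
# The same indexing uses four real indices on a 3-D tensor -> IndexError,
# matching the traceback in the question.
ref_3d = np.zeros((bs, num_query, 2))
try:
    ref_3d[:, :, None, :, None, :]
except IndexError as e:
    print("IndexError:", e)
```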