SJTU-LuHe / TransVOD

This repository contains the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers".
Apache License 2.0
203 stars, 28 forks

Is there a bug when num_feature_levels = 4? #30

Open tobymu opened 1 year ago

tobymu commented 1 year ago

I can run the code when num_feature_levels = 1.

With num_feature_levels = 4 (and ref_frame_num = 10), I get this error:

```
File "deformable_transformer_multi.py", line 231, in forward
    ref_spatial_shapes = spatial_shapes.expand(BS,self.num_ref_frames, 2).contiguous()
RuntimeError: The expanded size of the tensor (10) must match the existing size (4) at non-singleton dimension 1.  Target sizes: [1, 10, 2].  Tensor sizes: [4, 2]
```
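The error follows from PyTorch's `expand` rules. In Deformable DETR-style code, `spatial_shapes` stores one (H, W) row per feature level, so its shape is [num_feature_levels, 2], which matches the [4, 2] in the message. `expand` aligns dimensions from the right and may only stretch a dimension of size 1: with one level, [1, 2] becomes [BS, num_ref_frames, 2], but with four levels the size-4 dimension cannot become 10. The sketch below reproduces that shape rule in plain Python so no PyTorch install is needed; `expand_shape` is a hypothetical helper, not part of TransVOD or PyTorch, and it does not model `-1` sizes.

```python
from typing import Sequence, Tuple


def expand_shape(src: Sequence[int], target: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: return the shape Tensor.expand(*target) would give
    for a tensor of shape `src`, raising ValueError where PyTorch would raise.

    Rules modelled: dims are aligned from the right, extra leading dims are
    added as-is, and an existing dim may only change if its size is 1.
    (-1 "keep this size" arguments are not supported here.)
    """
    if len(target) < len(src):
        raise ValueError("target must have at least as many dims as the tensor")
    offset = len(target) - len(src)
    result = list(target[:offset])  # new leading dims are taken from target
    for i, size in enumerate(src):
        want = target[offset + i]
        if want == size:
            result.append(size)      # already the right size
        elif size == 1:
            result.append(want)      # singleton dims can be broadcast
        else:
            # Same situation as the traceback: a non-singleton dim cannot grow.
            raise ValueError(
                f"expanded size {want} must match existing size {size} "
                f"at non-singleton dimension {offset + i}"
            )
    return tuple(result)


BS, num_ref_frames = 1, 10
print(expand_shape((1, 2), (BS, num_ref_frames, 2)))  # num_feature_levels = 1
try:
    expand_shape((4, 2), (BS, num_ref_frames, 2))      # num_feature_levels = 4
except ValueError as err:
    print("error:", err)
```

Under this reading, line 231 appears to assume num_feature_levels = 1, which would explain why only that setting runs.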

tobymu commented 1 year ago

@SJTU-LuHe

WEIZHIHONG720 commented 11 months ago

@SJTU-LuHe

WEIZHIHONG720 commented 11 months ago

@tobymu Did you solve the problem?

Cuviews commented 7 months ago

@tobymu Did you solve the problem? I really want to figure it out...
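No fix is posted in this thread. One possible direction, offered only as an unverified sketch: if line 231 is meant to give every reference frame its own copy of the spatial shapes, then with several feature levels the shapes would need to be tiled rather than expanded, e.g. `spatial_shapes.repeat(self.num_ref_frames, 1)`, which yields [num_ref_frames * num_feature_levels, 2], together with a matching level start index. This assumes the reference-frame features are flattened frame-major (all levels of frame 0, then all levels of frame 1, and so on) and that the attention module consuming them is built with `n_levels = num_ref_frames * num_feature_levels`; neither assumption is confirmed here. The helper `tile_ref_levels` and the example feature-map sizes are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

Shape = Tuple[int, int]  # (H, W) of one feature map


def tile_ref_levels(
    spatial_shapes: List[Shape], num_ref_frames: int
) -> Tuple[List[Shape], List[int]]:
    """Hypothetical layout: treat every (reference frame, feature level) pair
    as its own attention "level", ordered frame-major.

    Returns the tiled (H, W) list and the offset at which each level starts
    in the flattened token sequence.
    """
    # List repetition tiles the whole block, like Tensor.repeat(num_ref_frames, 1).
    ref_shapes = spatial_shapes * num_ref_frames
    sizes = [h * w for h, w in ref_shapes]           # tokens per level
    level_start_index = [0] + list(accumulate(sizes))[:-1]
    return ref_shapes, level_start_index


levels = [(64, 64), (32, 32), (16, 16), (8, 8)]      # illustrative sizes only
shapes, starts = tile_ref_levels(levels, num_ref_frames=2)
print(len(shapes), starts)
```

For four levels of 64×64, 32×32, 16×16 and 8×8 with two reference frames, the tiled list has 8 entries and the start indices are [0, 4096, 5120, 5376, 5440, 9536, 10560, 10816]. The sampling-offset and attention-weight projections would also grow with the larger level count, so this is a shape-level sketch rather than a drop-in patch.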