SJTU-LuHe / TransVOD

This repository contains the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers".
Apache License 2.0

Here is a bug? #26

Open August-en opened 1 year ago

August-en commented 1 year ago

https://github.com/SJTU-LuHe/TransVOD/blob/ef864f81036562799ad9c29440200d9b70165a90/models/deformable_transformer_multi.py#L226-L229

August-en commented 1 year ago

Would this be correct?

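```python
import torch

# Split along dim 0 into one chunk per frame (current + reference frames),
# assuming dim 0 is frame-major (each chunk holds the whole batch for one frame).
ref_pos_embed_list = torch.chunk(lvl_pos_embed_flatten, self.num_ref_frames + 1, dim=0)
cur_pos_embed = ref_pos_embed_list[0]  # current frame; keeps the batch dimension
ref_pos_embed = torch.cat(ref_pos_embed_list[1:], 1)  # reference frames joined along the token dim
ref_memory = ref_memory + ref_pos_embed
```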
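Whether the chunk-based split is correct depends on how the frames are stacked along dim 0, which the thread does not state. Below is a minimal sketch, assuming a frame-major layout (all samples of frame 0, then all samples of frame 1, and so on, with frame 0 as the current frame). `split_cur_ref` is a hypothetical helper, not code from the repo. It shows that the chunked version keeps the batch dimension, while plain `[0]` / `[1:]` indexing only lines up with frames when the batch size is 1:

```python
import torch


def split_cur_ref(x: torch.Tensor, num_ref_frames: int):
    """Hypothetical helper: split a frame-major stack into current and reference parts.

    Assumes x has shape ((num_ref_frames + 1) * B, N, C), where rows
    [f * B, (f + 1) * B) hold frame f for every sample in the batch and
    frame 0 is the current frame.
    """
    num_frames = num_ref_frames + 1
    assert x.shape[0] % num_frames == 0, "dim 0 must be divisible by the number of frames"
    chunks = torch.chunk(x, num_frames, dim=0)  # num_frames tensors of shape (B, N, C)
    cur = chunks[0]                             # (B, N, C)
    ref = torch.cat(chunks[1:], dim=1)          # (B, num_ref_frames * N, C)
    return cur, ref


B, num_ref_frames, N, C = 2, 3, 5, 4
x = torch.randn((num_ref_frames + 1) * B, N, C)
cur, ref = split_cur_ref(x, num_ref_frames)
print(tuple(cur.shape), tuple(ref.shape))  # (2, 5, 4) (2, 15, 4)

# Plain indexing only lines up with frames when B == 1:
print(tuple(x[0].shape), tuple(x[1:].shape))  # (5, 4) (7, 5, 4)
```

With `B = 2`, `x[1:]` still contains row 1, which is the current frame of the second sample, so it gets mixed into the "reference" tensor. If the frames are stacked sample-major instead, `torch.chunk` would split by sample rather than by frame, and the tensor would need reordering first.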
akanuasiegbu commented 1 year ago

I get this error as well when the batch size is greater than one; with a batch size of one it does not appear. The code in the repo currently does not seem to work for arbitrary batch sizes.

Zagreus98 commented 1 year ago

Try PR #13 for batch size > 1. I tested it and it works, though I would not recommend it unless you have GPUs with more than 32 GB of memory.

August-en commented 1 year ago


Thank you so much. I will try it as soon as possible.

Have you used the TDTE module in your experiments? I found that the default setting for TDTD in this repo is False (https://github.com/SJTU-LuHe/TransVOD/issues/27#issue-1451857301). Does it make a big difference whether it is used or not?

Thank you again if you can share your experience :)

Zagreus98 commented 1 year ago

Yes, I ran experiments with and without TDTE on my own dataset, and the performance was almost the same. What increased performance in my case was adding illumination variation augmentation and class weights in the loss. I also reproduced the ImageNet VID results reported on this page with TDTD set to False, so I don't know whether it is worth adding; it may depend on your dataset.
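The comment does not say how the class weights or the illumination augmentation were implemented, so the following is only a hedged illustration. Deformable DETR, which TransVOD builds on, uses a sigmoid focal loss for classification, and one possible way to add class weights is to scale each class column of that loss. Both functions below (`weighted_sigmoid_focal_loss`, `random_illumination`) are hypothetical, and the brightness/contrast ranges are arbitrary placeholders rather than values from the thread:

```python
import torch
import torch.nn.functional as F


def weighted_sigmoid_focal_loss(logits, targets, class_weights, alpha=0.25, gamma=2.0):
    """Hypothetical sigmoid focal loss with a per-class weight on each logit.

    Assumes logits and one-hot float targets of shape (num_queries, num_classes)
    and class_weights of shape (num_classes,). The per-class weighting is an
    illustrative assumption, not the commenter's exact code.
    """
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)
    loss = ce * (1 - p_t) ** gamma                    # down-weight easy examples
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    loss = alpha_t * loss
    return (loss * class_weights).sum()               # class_weights broadcasts over columns


def random_illumination(img, max_brightness=0.2, max_contrast=0.2, generator=None):
    """Hypothetical illumination jitter for a float image or clip with values in [0, 1].

    Applies one random brightness offset and contrast factor to the whole tensor,
    so passing a (T, C, H, W) clip changes all frames the same way.
    """
    b = (torch.rand(1, generator=generator) * 2 - 1) * max_brightness
    c = 1 + (torch.rand(1, generator=generator) * 2 - 1) * max_contrast
    mean = img.mean()
    out = (img - mean) * c + mean + b
    return out.clamp(0.0, 1.0)
```

Rare classes would typically get weights above 1 (for example, inversely proportional to their frequency), but how to choose them is a dataset-specific decision.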