MasterBin-IIAU / Unicorn

[ECCV'22 Oral] Towards Grand Unification of Object Tracking
MIT License
953 stars, 87 forks

Why not try RepLKNet as the backbone? #2

Closed (LYMDLUT closed 2 years ago)

LYMDLUT commented 2 years ago

ConvNeXt does not seem to perform especially well on downstream tasks, yet you achieve excellent results with it. Do you have any tips to share? Second, have you tried RepLKNet? What considerations went into the choice of backbone? If you have tried it, it would be great if you could share the experimental results, if convenient. Finally, I noticed that you trained with 16 A100 GPUs. Would 8 GPUs with only 11 GB of memory each be enough?

Thanks again for your excellent work on grand unification! Sorry to bother you again, and thank you!

MasterBin-IIAU commented 2 years ago

@LYMDLUT Hi, in our experience, ConvNeXt performs slightly better than Swin Transformer on various object tracking tasks. I think an important hyperparameter for ConvNeXt is the drop-path rate. You can refer to convnext-object-detection for more insight into this hyperparameter.
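For readers unfamiliar with the drop-path rate mentioned above, here is a minimal sketch of what the hyperparameter controls, written in plain Python and not taken from the Unicorn code. It assumes the common convention of ramping the rate linearly from 0 to a maximum across all blocks; the function names and the `max_rate=0.2` value are illustrative only, while `(3, 3, 9, 3)` are the stage depths of ConvNeXt-T.

```python
import random


def drop_path_schedule(depths, max_rate):
    """Linearly ramp the drop-path rate from 0 to max_rate over all blocks."""
    total = sum(depths)
    if total == 1:
        return [max_rate]
    return [max_rate * i / (total - 1) for i in range(total)]


def residual_with_drop_path(x, branch, rate, training=True, rng=random):
    """Return x + drop_path(branch(x)) for one sample given as a list of floats."""
    out = branch(x)
    if not training or rate == 0.0:
        # Inference (or rate 0): the residual branch is always kept.
        return [a + b for a, b in zip(x, out)]
    if rng.random() < rate:
        # Branch dropped for this sample: only the identity path remains.
        return list(x)
    keep = 1.0 - rate
    # Branch kept: rescale so the expected output matches inference.
    return [a + b / keep for a, b in zip(x, out)]


# Illustrative values only; max_rate is NOT a recommended setting.
rates = drop_path_schedule(depths=(3, 3, 9, 3), max_rate=0.2)
print(len(rates), rates[0], round(rates[-1], 3))
```

A higher maximum rate regularizes more strongly, which is why the value usually needs retuning when the backbone size or the downstream task changes.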

We haven't tried RepLKNet because it was released after the ECCV submission deadline.

I think you can first try the Unicorn-RT setting, which adopts a convnext-tiny backbone and a lower input resolution of 640x1024. This should be the most memory-efficient variant of Unicorn. If you still run out of memory, you can consider reducing the batch size from 2 to 1.
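As a rough back-of-the-envelope check on the batch-size suggestion, the sketch below counts input pixels per training step. It rests on the assumption that activation memory in a convolutional backbone grows roughly linearly with that count; model weights and optimizer state do not shrink with batch size, so the real saving is smaller. The helper name `pixels_per_step` is hypothetical.

```python
def pixels_per_step(height, width, batch_size):
    """Input pixels processed in one forward/backward pass."""
    return height * width * batch_size


# Unicorn-RT resolution from the comment above, at batch size 2 and 1.
rt_bs2 = pixels_per_step(640, 1024, 2)
rt_bs1 = pixels_per_step(640, 1024, 1)

print(rt_bs2, rt_bs1, rt_bs1 / rt_bs2)  # 1310720 655360 0.5
```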

LYMDLUT commented 2 years ago

Will you try RepLKNet in the future? I would like to know how the two compare on tracking. Thanks.

MasterBin-IIAU commented 2 years ago

@LYMDLUT Hi, for now we have no plans to try RepLKNet. According to the results reported in the original papers, RepLKNet performs worse than ConvNeXt on object detection, so I expect the same to hold for object tracking.

[Two screenshots, taken 2022-07-16 at 22:01:10 and 22:06:52]

LYMDLUT commented 2 years ago

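Thank you very much for your patient and detailed reply! The choice of backbone is in fact unrelated to the excellent contributions of this paper. I asked because I wanted to check whether using very large convolution kernels in the tracker's backbone, and the larger receptive field they bring, could improve performance further. I had expected this year's work to continue from pix2seq on the detection side, but the result is an even bigger surprise. Congratulations, and I sincerely hope the whole CV field can be unified in the future. Finally, do you have a personal blog or Zhihu articles to share? I would like to learn more about work in this area. Sorry to bother you, and thank you very much.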

MasterBin-IIAU commented 2 years ago

@LYMDLUT Hi, in fact, all three works (Swin, ConvNeXt, RepLKNet) have a large receptive field, which is beneficial to downstream tasks like object detection, semantic segmentation, and object tracking.
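To make the receptive-field comparison concrete for the two convolutional backbones, here is a minimal sketch of the standard theoretical receptive-field recurrence for a stack of convolutions. The layer lists are simplified illustrations, not the full architectures, and the theoretical value is only an upper bound on the effective receptive field observed in practice.

```python
def receptive_field(layers):
    """Theoretical receptive field of stacked convs given (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # striding spreads later layers over more input pixels
    return rf


print(receptive_field([(3, 1)] * 3))      # three 3x3 convs -> 7
print(receptive_field([(7, 1)]))          # one 7x7 conv, as in a ConvNeXt block -> 7
print(receptive_field([(31, 1)]))         # one 31x31 conv, the largest RepLKNet kernel -> 31
print(receptive_field([(4, 4), (7, 1)]))  # 4x4 stride-4 stem, then one 7x7 conv -> 28
```

Because downsampling multiplies the contribution of every later layer, even 7x7 kernels reach a large theoretical field after a few stages.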

Thanks :) We also think that unification is the future trend of computer vision. There have also been many related works recently.

Here is a Zhihu article written by our team.