FoundationVision / GLEE

[CVPR2024 Highlight]GLEE: General Object Foundation Model for Images and Videos at Scale
https://glee-vision.github.io/
MIT License
1.02k stars 82 forks

About the Plus version of the video-task model #8

a773783082 closed 1 month ago

a773783082 commented 7 months ago

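The GitHub page only provides two model versions for image tasks, R50 and SwinL. In the files of the Hugging Face demo I then found an R50 version for video tasks (visual prompt, GLEE_vos_r50.pth). Could the authors also open-source the SwinL version for video tasks? Was the SwinL version left out because the GPU used on Hugging Face cannot run it? Also, regarding my experience with the model: I found that it performs poorly on text prompts it has not learned. For example, with custom-list it does not recognize "head"; only when I enter "human head" might it return a result, and a fairly poor one.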

wjf5203 commented 5 months ago

Thank you for your interest! For the VOS version, we have only trained the R50 weights. We will release the SwinL weights later, or you can train them yourself with the script that will be updated later. Empirically, a larger backbone does not bring much improvement on VOS (probably within two points). In addition, GLEE saw relatively little part-level data during training, which may explain the weak results on part-level prompts such as "head". We have released three versions of the model (pretrain, joint-training, and scaleup), which may have different generalization properties.