eladb3 / ORViT

"Object-Region Video Transformers", Herzig et al., CVPR 2022
Apache License 2.0
42 stars, 12 forks

Problem about the bad baseline result #6

Closed (huangp0310 closed 2 years ago)

huangp0310 commented 2 years ago

Hello, we ran the baseline model on the Something-Else dataset under the ORViT framework, but the best accuracy we got is only 49.6, which is far from the result (60.2) reported in the paper. We made no changes to the model's config file (Smthelse_ORViT-MF_224_16x4.yaml), except for adjusting batch_size to 16 (4 GPUs).

eladb3 commented 2 years ago

Hi, this result is produced simply by training MF on the compositional split; the Smthelse_ORViT-MF_224_16x4.yaml config is for training MF~ORViT.

yeyingdege commented 1 year ago

> Hi, this result is produced simply by training MF on the compositional split; the Smthelse_ORViT-MF_224_16x4.yaml config is for training MF~ORViT.

Hello, I really appreciate your work. Could you please explain more about this issue? @eladb3