RuntimeError: The size of tensor a (200) must match the size of tensor b (512) at non-singleton dimension 1

NilBiescas commented 3 weeks ago

I get this error when running visualizing_annotations.ipynb.

I checked, and defaults.py sets the embedding dim to 200: _C.MODEL.ROI_RELATION_HEAD.EMBED_DIM = 200

but the CLIP model returns 512-dimensional embeddings.

Is this error happening because I'm not using the configs and the repo correctly?

Is it expected that I need to manually change _C.MODEL.ROI_RELATION_HEAD.EMBED_DIM to 512?
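The error itself is PyTorch's broadcasting check failing: presumably an elementwise operation receives one tensor whose feature width is EMBED_DIM (200) and a CLIP text embedding of width 512 along the same axis. The stdlib-only sketch below re-implements the broadcasting rule for illustration only (it is not PyTorch's source). Shapes are aligned from the trailing dimension, and two sizes are compatible only if they are equal or one of them is 1. The object count of 10 is an arbitrary example value.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise with a PyTorch-style message.

    Illustrative re-implementation of the broadcasting rule, not PyTorch code.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both shapes have ndim entries.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim, (size_a, size_b) in enumerate(zip(a, b)):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            # Neither size is 1 and they differ: this is the reported error.
            raise RuntimeError(
                f"The size of tensor a ({size_a}) must match the size of "
                f"tensor b ({size_b}) at non-singleton dimension {dim}"
            )
    return tuple(result)


# Compatible: a (1, 512) tensor broadcasts across 10 rows.
print(broadcast_shape((10, 512), (1, 512)))  # (10, 512)

# Incompatible: a GloVe-sized feature (200) meets a CLIP-sized one (512).
try:
    broadcast_shape((10, 200), (10, 512))
except RuntimeError as err:
    print(err)
```

Dimension 1 is the feature axis here, so the two widths simply have to agree, which is what changing EMBED_DIM to match the text encoder achieves.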

Maelic commented 3 weeks ago

Hello again,

You don't need to edit defaults.py every time; instead, you can change the .yaml config file you are using. If you want to use CLIP embeddings for the text features, you can specify the following:

MODEL.TEXT_EMBEDDING: clip 
MODEL.ROI_RELATION_HEAD.EMBED_DIM: 512

If you want to use the default GloVe embeddings, either leave these keys unset or specify:

MODEL.TEXT_EMBEDDING: glove.6B 
MODEL.ROI_RELATION_HEAD.EMBED_DIM: 200
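The two keys have to change together, since the text encoder's output width must equal EMBED_DIM. Below is a minimal stdlib sketch of keeping them in sync. The EMBED_DIMS table and both helper functions are hypothetical (they are not part of SGG-Benchmark); 512 is the CLIP width reported in this thread and 200 is the glove.6B default. Assuming the repo's config system is yacs, as the _C.MODEL.ROI_RELATION_HEAD.EMBED_DIM naming in defaults.py suggests, the dotted names are shorthand: yacs's merge_from_list accepts them in dotted form, but inside a .yaml file they are written as nested mappings, which is what to_nested_yaml prints.

```python
# Hypothetical lookup: output width of each text-embedding option.
# 512 is the CLIP width reported in this thread; 200 is the glove.6B default.
EMBED_DIMS = {"clip": 512, "glove.6B": 200}


def text_embedding_overrides(name):
    """Return dotted config overrides with EMBED_DIM matched to the encoder."""
    if name not in EMBED_DIMS:
        raise ValueError(f"unknown text embedding: {name!r}")
    return {
        "MODEL.TEXT_EMBEDDING": name,
        "MODEL.ROI_RELATION_HEAD.EMBED_DIM": EMBED_DIMS[name],
    }


def to_nested_yaml(overrides):
    """Render dotted keys as the nested mappings a yacs-style .yaml file uses."""
    tree = {}
    for dotted, value in overrides.items():
        node = tree
        *parents, leaf = dotted.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value

    lines = []

    def emit(node, depth):
        # Two spaces of indentation per nesting level.
        for key, value in node.items():
            if isinstance(value, dict):
                lines.append("  " * depth + f"{key}:")
                emit(value, depth + 1)
            else:
                lines.append("  " * depth + f"{key}: {value}")

    emit(tree, 0)
    return "\n".join(lines)


print(to_nested_yaml(text_embedding_overrides("clip")))
# MODEL:
#   TEXT_EMBEDDING: clip
#   ROI_RELATION_HEAD:
#     EMBED_DIM: 512
```

Deriving EMBED_DIM from the embedding name, rather than setting it by hand, is one way to make the 200 vs 512 mismatch impossible to reintroduce.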

In my early experiments, CLIP features gave a boost of a few percent in model performance, at the cost of slightly slower training and inference.

Hope this helps.

NilBiescas commented 3 weeks ago

Thank you!!!! Maybe this could be added to the repo's documentation.

Maelic commented 3 weeks ago

Yeah, I will add it in a future update, once I have more results on the performance of the CLIP encoder for text embeddings.

Maelic / SGG-Benchmark, issue #8