MILVLG / openvqa

A lightweight, scalable, and general framework for visual question answering research
Apache License 2.0

Bbox problem #79

Open dreichCSL opened 2 years ago

dreichCSL commented 2 years ago

Thanks for this repo!

I'm trying to train BUTD on GQA, but I keep running into issues (fixing one exposes the next). There seems to be a problem with the bounding box feature size. Could you explain why the expected size of a bbox feature is 5 values and not 4? This is defined in openvqa/core/base_cfgs.py. Running the code as is, I get the following error:

    File "openvqa/openvqa/core/base_dataset.py", line 87, in forward
      return self.gqa_forward(feat_dict)
    File "openvqa/openvqa/models/butd/adapter.py", line 55, in gqa_forward
      bbox_feat = self.bbox_linear(bbox_feat)
    File "lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
      return forward_call(*input, **kwargs)
    File "lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
      return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (51200x4 and 5x1024)

(Note: batch size is 512)

The bbox layer expects 5 input values, but the data preprocessing code produces only 4 (which makes sense to me; the hard-coded 5 is what seems odd). If I change the 5 to 4, training crashes with a gradient/backprop error instead.
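For reference, the shape mismatch can be reproduced outside the repo. The sketch below is not openvqa code: the layer name `bbox_linear` is borrowed from the traceback, and 100 boxes per image is an assumption inferred from 51200 / 512. It only illustrates that a linear layer built for 5 input values rejects 4-value bbox features, while a layer built for 4 accepts them.

```python
import torch
import torch.nn as nn

# Assumed sizes: 100 boxes per image is inferred from 51200 / 512, not read from the repo.
batch_size, num_boxes, hidden_size = 512, 100, 1024

# Stand-in for the adapter's bbox layer, built for 5 values per box.
bbox_linear = nn.Linear(5, hidden_size)

# Stand-in for the loader output: only 4 values per box.
bbox_feat = torch.zeros(batch_size, num_boxes, 4)

try:
    bbox_linear(bbox_feat)
    error_message = None
except RuntimeError as err:
    # The last input dimension (4) does not match in_features (5).
    error_message = str(err)

print(error_message)

# A layer whose in_features matches the data runs without error.
# Only two samples are used here to keep the output small.
matching_linear = nn.Linear(4, hidden_size)
out = matching_linear(bbox_feat[:2])
print(tuple(out.shape))
```

Either side can be changed to make the sizes agree; which one is right depends on what the fifth bbox value is meant to hold, which this sketch does not address.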

Any ideas here? Thanks!

dreichCSL commented 2 years ago

Update:

I modified the following files to be able to train BUTD without errors: (Using torch 1.11.0, spacy 2.3.7, en_vectors_web_lg 2.3.0)

1) Reverted changes from commit #75 (increases bbox feature dim to 5 again, like it used to be before it was "fixed" in openvqa/datasets/gqa/gqa_loader.py)
2) Added USE_AUX_FEAT = False in openvqa/configs/gqa/butd.yml
3) Modified Line 50 in openvqa/openvqa/models/butd/net.py from inplace=True -> inplace=False

Point 3 fixes the gradient calculation errors that in-place operations cause during training.
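To illustrate why an in-place operation can break backprop, here is a minimal sketch. It is not the BUTD code from net.py: the helper `backward_succeeds` and the tanh-then-ReLU layout are hypothetical, chosen only because tanh keeps its output for the backward pass, so overwriting that output in place invalidates it.

```python
import torch
import torch.nn as nn


def backward_succeeds(inplace: bool) -> bool:
    """Return True if backward() runs, False if autograd raises an error."""
    x = torch.randn(4, 8, requires_grad=True)
    hidden = torch.tanh(x)  # tanh saves its output for use in backward
    # With inplace=True, ReLU overwrites `hidden`, the tensor tanh saved.
    out = nn.ReLU(inplace=inplace)(hidden)
    try:
        out.sum().backward()
        return True
    except RuntimeError:
        # Autograd detects that a saved tensor was modified in place.
        return False


print(backward_succeeds(inplace=False))
print(backward_succeeds(inplace=True))
```

Whether `inplace=True` is safe depends on which operation precedes it, so the same flag can work in one model and fail in another.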

Disclaimer: So far I've only tried training a BUTD model on GQA from scratch, which is what required these changes. I can't say whether any of them would cause issues for already existing models or for other model types (change 1 could affect non-BUTD models that use GQA).