Open Lizw14 opened 4 years ago
❓ Questions and Help

Hi Kaihua, I am training on the GQA dataset with a larger vocabulary: 1685 object labels, 619 attributes, and 312 relations. However, the model size grows significantly in this case, and the model no longer fits into one GPU's memory (11GB) even with a batch size of 1 per GPU. Intuitively, the model size should not explode like this, because vocab_size only increases the size of the final layer in the ROI/attribute/relation heads. Do you have any thoughts on why this happens?

During iterative message passing, the intermediate predictions are embedded to refine the final prediction, so the vocabulary size also affects some intermediate layers. You could try TransformerPredictor; as far as I know, it is the most memory-efficient model.
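The point about intermediate predictions being embedded during message passing can be made concrete with a rough back-of-envelope sketch. All dimensions below (hidden size, embedding size, number of message-passing steps) are illustrative assumptions, not the repository's actual configuration; the VG-sized vocabulary of 151 object labels (150 classes plus background) is used only for comparison against the GQA-sized 1685.

```python
def vocab_dependent_params(vocab_size, hidden_dim=4096, embed_dim=200, n_steps=3):
    """Estimate parameters that scale with vocab_size in a head that
    embeds its intermediate predictions (hypothetical dimensions)."""
    # Final classifier: one hidden_dim x vocab_size weight matrix.
    final_classifier = hidden_dim * vocab_size
    # Each message-passing step embeds the intermediate prediction
    # distribution back into the feature space: a vocab_size x embed_dim
    # matrix per step, so these layers also grow with the vocabulary.
    intermediate_embeddings = n_steps * vocab_size * embed_dim
    return final_classifier, intermediate_embeddings

# Compare a VG-sized object vocabulary (151) with a GQA-sized one (1685).
for vocab in (151, 1685):
    final, inter = vocab_dependent_params(vocab)
    print(f"vocab={vocab}: final layer {final:,} params, "
          f"intermediate embeddings {inter:,} params")
```

Under these assumptions the vocabulary term multiplies every one of those matrices, so growing the vocab roughly 10x grows all vocab-dependent layers 10x, not just the last one.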