facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

UniT problem: the DETR processor results in only 1 sample per batch; a larger batch size is needed #1216

Closed Li-Qingyun closed 2 years ago

Li-Qingyun commented 2 years ago

Command

python mmf_cli\run.py config=projects/unit/configs/coco/single_task.yaml datasets=detection_coco model=unit run_type=train training.batch_size=2

Error info

RuntimeError: The expanded size of the tensor (428) must match the existing size (480) at non-singleton dimension 1. Target sizes: [3, 428, 640]. Tensor sizes: [3, 480, 640]

Description

I tried batch_size=1 and the error did not occur, so I guessed that the processor caused the RuntimeError.
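The shape conflict can be reproduced outside mmf (a minimal sketch, not the exact mmf code path): batching samples into a single tensor requires all images to have identical shapes, so two images of different heights cannot be combined directly.

```python
import torch

# Two images with different heights, matching the sizes in the error message
a = torch.rand(3, 428, 640)
b = torch.rand(3, 480, 640)

# Stacking samples into one [B, C, H, W] tensor requires identical
# shapes; unequal sizes raise a RuntimeError, so batch_size > 1 fails
# unless the images are first padded or resized to a common size.
try:
    batch = torch.stack([a, b])
except RuntimeError as e:
    print("batching failed:", e)
```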

ronghanghu commented 2 years ago

This UniT codebase is expected to be run with batch size 1 (one image per GPU), so that images of different sizes can be used in the forward and backward passes.

Li-Qingyun commented 2 years ago

> This UniT codebase is expected to be run with batch size 1 (one image per GPU), so that images of different sizes can be used in the forward and backward passes.

Thanks for your reply! I have read through the UniT codebase and the DETR codebase. I think the original DETR aligns the image sizes within a batch, together with masks, by composing a NestedTensor object. In UniT, the sample list does not seem to contain a NestedTensor; instead, the NestedTensor is constructed in the forward function of UniTBaseModel, and its masks are not actually used. Hence the mini-batch size has to be 1. This does not matter for the UniT implementation itself, but it does affect fine-tuning on down-stream tasks (with fewer samples).

Meanwhile, I tried to fine-tune a COCO-initialized DETR on my own detection dataset, once in the UniT codebase and once in the DETR codebase. With the same hyper-parameter settings and parameter initialization, the run in the UniT codebase converges more slowly, and I haven't found the reason yet. Is there something wrong with my settings, or is there an optimization difference between the mmf framework and the DETR codebase?

My question turned out lengthy. Thanks for your reply!
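For reference, the batching scheme described above (what DETR's `nested_tensor_from_tensor_list` does, roughly) can be sketched as follows; this is a simplified illustration, not mmf's or DETR's actual code. Every image is zero-padded to the per-batch maximum height and width, and a boolean mask marks the padded pixels (True at padding, as in DETR):

```python
import torch

def pad_to_max(images):
    """Pad a list of [C, H, W] tensors to a common size with a padding mask."""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    b, c = len(images), images[0].shape[0]
    batch = torch.zeros(b, c, max_h, max_w)
    # mask is True where a pixel is padding, False on real image content
    mask = torch.ones(b, max_h, max_w, dtype=torch.bool)
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img
        mask[i, :h, :w] = False
    return batch, mask

imgs = [torch.rand(3, 428, 640), torch.rand(3, 480, 640)]
batch, mask = pad_to_max(imgs)
print(batch.shape)  # torch.Size([2, 3, 480, 640])
```

With such a collate step in place (and the mask threaded through attention so padded pixels are ignored), images of different sizes can share one mini-batch.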

ronghanghu commented 2 years ago

I think the UniT codebase in MMF should deliver similar detection results when used for detection pretraining alone -- we verified this earlier. Does the batch size and other details match between MMF and the DETR codebase? (Note that the COCO single-task config in https://github.com/facebookresearch/mmf/blob/main/projects/unit/configs/coco/single_task.yaml, documented at https://www.internalfb.com/intern/staticdocs/mmf/docs/projects/unit, uses a shorter schedule and gets 40.6 bbox AP on COCO, while the DETR setting uses a 500-epoch schedule and gets around 43.3 bbox AP on COCO.)