Closed: Li-Qingyun closed this issue 2 years ago
This UniT codebase is expected to be run with batch size 1 (one image per GPU), so that images of different sizes can be used in the forward and backward passes.
Thanks for your reply! I have read through both the UniT and the DETR codebases. As I understand it, the original DETR aligns the image sizes within a batch along with their masks, composing a NestedTensor object. In UniT, the sample_list does not seem to contain a NestedTensor; the NestedTensor is instead constructed in the forward function of UniTBaseModel, and its masks are not actually used. Hence the mini-batch size has to be 1. This does not matter for the UniT implementation itself, but it does affect fine-tuning on downstream tasks (with smaller datasets). Meanwhile, I tried to fine-tune a COCO-initialized DETR on my own detection dataset, both in the UniT codebase and in the DETR codebase. With the same hyper-parameter settings and parameter initialization, the run in the UniT codebase converges more slowly, and I have not found the reason yet. Is there something wrong with my settings, or is there an optimization difference between the MMF framework and the DETR codebase?
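For reference, the DETR-style batching described above can be sketched roughly as follows (modeled on `util.misc.nested_tensor_from_tensor_list` in the DETR codebase; the function name `pad_to_nested` here is my own, not an API of either repo). Every image in the batch is padded to the maximum height/width, and a boolean mask records which pixels are padding, so differently sized images can share one batch tensor:

```python
import torch

def pad_to_nested(tensors):
    """Pad a list of (C, H, W) images to a common size and build a
    padding mask (True = padded pixel), DETR NestedTensor style.
    Illustrative sketch only, not the actual UniT/DETR code."""
    max_h = max(t.shape[1] for t in tensors)
    max_w = max(t.shape[2] for t in tensors)
    b, c = len(tensors), tensors[0].shape[0]
    batch = torch.zeros(b, c, max_h, max_w, dtype=tensors[0].dtype)
    mask = torch.ones(b, max_h, max_w, dtype=torch.bool)
    for img, pad_img, m in zip(tensors, batch, mask):
        h, w = img.shape[1], img.shape[2]
        pad_img[:, :h, :w].copy_(img)  # copy image into the padded canvas
        m[:h, :w] = False              # mark real pixels as not-padding
    return batch, mask

# Two images with the sizes from the error message below:
imgs = [torch.rand(3, 480, 640), torch.rand(3, 428, 640)]
batch, mask = pad_to_nested(imgs)
print(batch.shape)  # torch.Size([2, 3, 480, 640])
```

If the masks are built but never consumed by the model, batching images of different sizes this way is not possible, which matches the batch-size-1 restriction discussed above.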
My question is a bit lengthy. Thanks for your reply!
I think the UniT codebase in MMF should deliver similar detection results when used for detection pretraining alone -- we verified this in earlier pretraining on COCO. Do the batch size and other details match between MMF and the DETR codebase? (Note that the COCO single-task config in https://github.com/facebookresearch/mmf/blob/main/projects/unit/configs/coco/single_task.yaml uses a shorter schedule and gets 40.6 bbox AP on COCO, while the DETR setting uses a 500-epoch schedule and gets around 43.3 bbox AP on COCO.)
Command
python mmf_cli/run.py config=projects/unit/configs/coco/single_task.yaml datasets=detection_coco model=unit run_type=train training.batch_size=2
Error info
RuntimeError: The expanded size of the tensor (428) must match the existing size (480) at non-singleton dimension 1. Target sizes: [3, 428, 640]. Tensor sizes: [3, 480, 640]
Description
I tried batch_size=1, and the error did not occur. So I guessed that the processor caused the RuntimeError.
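The error class above can be reproduced in isolation: copying a (3, 480, 640) image into a (3, 428, 640) target fails because PyTorch cannot broadcast 480 into 428 at a non-singleton dimension. This is my own minimal sketch of the failure mode, consistent with a collation step that sizes the batch tensor from one sample while another sample in the batch is larger:

```python
import torch

# Sizes taken from the error message: target [3, 428, 640], source [3, 480, 640].
target = torch.zeros(3, 428, 640)
src = torch.rand(3, 480, 640)
try:
    target.copy_(src)  # raises: expanded size (428) must match existing size (480)
except RuntimeError as e:
    print(e)
```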