Open junchen14 opened 3 years ago
Hi, when you compute the FLOPs in Table 6 for baseline models such as ViLBERT, do you also include the FLOPs of the feature extraction models?
Hi @junchen14,
Yes. For object detection-based vision-and-language models, we calculated FLOPs by summing those of the object detection backbone, the object detection RCNN, NMS, and the modality interaction transformer.
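For anyone wanting to reproduce this kind of accounting, here is a minimal sketch of the summation described above. The class name `FlopsBreakdown`, its field names, and the numbers are all hypothetical placeholders for illustration; they are not the values from Table 6 and not the authors' actual measurement code.

```python
from dataclasses import dataclass


@dataclass
class FlopsBreakdown:
    """Per-component FLOPs for an object detection-based VLP pipeline.

    All fields share one unit (e.g. GFLOPs). Names are hypothetical.
    """
    backbone: float     # object detection backbone
    rcnn: float         # object detection RCNN head
    nms: float          # non-maximum suppression
    transformer: float  # modality interaction transformer

    def total(self) -> float:
        # End-to-end cost is the plain sum of the four stages.
        return self.backbone + self.rcnn + self.nms + self.transformer


# Placeholder numbers only, NOT measurements from the paper.
example = FlopsBreakdown(backbone=1.0, rcnn=2.0, nms=0.5, transformer=3.0)
print(example.total())
```

Keeping the components separate makes it easy to see how much of the total comes from feature extraction versus the transformer itself.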