Closed StanLei52 closed 2 years ago
@ronghanghu
Hi, I think this is because we later switched the lmdbs from Detectron (Caffe2) features to features extracted with maskrcnn-benchmark. We found that this change slightly boosts the TextVQA and TextCaps scores, but it may have introduced the bounding-box discrepancy you mentioned.
If you would like to use the exact features from Caffe2 (which are used in the LoRRA and M4C papers), they can be downloaded by adding `textvqa.caffe2` to `zoo_requirements` and using `textvqa/caffe2/features/open_images/detectron.lmdb` as the feature path, like in https://github.com/facebookresearch/mmf/blob/582c7195cbf1eb948436b66c1e9e4bb2e5652a27/projects/m4c_captioner/configs/m4c_captioner/textcaps/with_caffe2_feat.yaml#L6-L16. One can edit the corresponding lines in the M4C config https://github.com/facebookresearch/mmf/blob/582c7195cbf1eb948436b66c1e9e4bb2e5652a27/projects/m4c/configs/textvqa/defaults.yaml#L8-L17 to switch to the Caffe2 feature lmdbs.
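For reference, a sketch of what such a config override could look like, modeled on the linked `with_caffe2_feat.yaml`. The exact key layout may differ across MMF versions, so treat this as an assumption and verify it against the linked files:

```yaml
dataset_config:
  textvqa:
    zoo_requirements:
    - textvqa.defaults
    - textvqa.caffe2
    features:
      train:
      - textvqa/caffe2/features/open_images/detectron.lmdb
      val:
      - textvqa/caffe2/features/open_images/detectron.lmdb
      test:
      - textvqa/caffe2/features/open_images/detectron.lmdb
```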
Thank you for your reply @ronghanghu.
So `obj_normalized_boxes` in `imdb_train_ocr_en.npy` is from Detectron (Caffe2) and `bbox` in `detectron.lmdb` is from maskrcnn-benchmark, is that correct? Since the features and `bbox` in the same `detectron.lmdb` are consistent, can we calculate `obj_normalized_boxes` from `bbox` and the image width and height like this:
```python
orig_boxes = sample.image_info_0.bbox
w, h = sample.image_info_0.image_width, sample.image_info_0.image_height
normalized_boxes = orig_boxes / np.array([w, h, w, h])
sample.obj_bbox_coordinates = self.copy_processor(
    {"blob": normalized_boxes}
)["blob"]
```
instead of using the normalized bbox from the annotation:

```python
# 2. Load object bounding box information
## fetched by mmf sample info
# if "obj_normalized_boxes" in sample_info and hasattr(self, "copy_processor"):
#     # use copy_processor to convert to torch tensor
#     sample.obj_bbox_coordinates = self.copy_processor(
#         {"blob": sample_info["obj_normalized_boxes"]}
#     )["blob"]
```
Also, you mentioned a slight boost from using the new feature extractor. I do not understand why it can boost the score, since the features and `obj_normalized_boxes` then do not match (I assume the features and `bbox` in the same feature file always match, if I understand correctly).
> can we calculate the obj_normalized_boxes using bbox and its image width and height by
Yes, you can do this and directly compute the bounding boxes from the lmdb features.
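For illustration, here is a minimal, self-contained sketch of that normalization. The `sample.image_info_0` fields from the snippet above are replaced with plain values, and the box coordinates are made up:

```python
import numpy as np

def normalize_boxes(boxes, image_width, image_height):
    """Scale absolute [x1, y1, x2, y2] boxes into the [0, 1] range
    by dividing each coordinate by the corresponding image dimension."""
    boxes = np.asarray(boxes, dtype=np.float32)
    scale = np.array([image_width, image_height, image_width, image_height],
                     dtype=np.float32)
    return boxes / scale

# Hypothetical absolute boxes, as they would come from the feature lmdb.
orig_boxes = np.array([[ 50.0, 100.0, 250.0, 300.0],
                       [  0.0,   0.0, 500.0, 400.0]])
w, h = 500, 400
normalized = normalize_boxes(orig_boxes, w, h)
# First box becomes [0.1, 0.25, 0.5, 0.75]; second becomes [0, 0, 1, 1].
print(normalized)
```

This is the same per-coordinate division as `orig_boxes / np.array([w, h, w, h])` in the snippet above; numpy broadcasting applies the scale row-wise to every box.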
> Also you mentioned the slight boosts by using the new feature extractor. I do not understand why it can boost the score since the feature and the obj_normalized_boxes do not match (i assume the feature and bbox in the same feature file always match if i understand correctly).
There was only a minor boost in the scores. It is probably that the features extracted with maskrcnn-benchmark were slightly better and gave a small improvement despite the discrepancy in the bounding boxes. You can use the Caffe2 lmdbs to get the exact setting in the M4C paper.
Good to know, thank you Ronghang!
❓ Questions and Help
Thank you for the wonderful MMF! I have a question related to the TextVQA annotations and extracted features used in M4C. I noticed that the M4C dataset uses features in `textvqa/defaults/features/open_images/detectron.lmdb` and bbox info (normalized boxes) in `textvqa/defaults/annotations/imdb_train_ocr_en.npy`. However, the bbox info in `detectron.lmdb` seems to differ from that in `imdb_train_ocr_en.npy`. To reproduce:

and the corresponding output was:
I think `obj_normalized_boxes` should be derived from `bbox` in the feature file, but from the above result it seems that they do not have the same order. I wonder if something is wrong? If we use the features in `detectron.lmdb` together with the bbox info in `imdb_train_ocr_en.npy`, the bbox info should be consistent between these two files. Looking forward to your reply.