Key error when using newly generated OCR features for TextVQA

❓ Questions and Help

Machine: 2 GPU

Hi, I generated new OCR features for TextVQA (reduced validation set) using images downloaded from the TextVQA website: https://textvqa.org/dataset/. I then replaced the OCR features LMDB file location in defaults.yaml:

features:
      train:
      - textvqa/defaults/features/open_images/detectron.lmdb,textvqa/ocr_en/features/new_ocr_features.lmdb

Error

I then hit a KeyError:

  File "/home/cybertron/mmf/mmf/datasets/databases/readers/feature_readers.py", line 266, in _load
    img_id_idx = self.image_id_indices[image_id]
KeyError: b'train/e1ad82ad7b00d0dc'

Troubleshooting 1

I added two logging calls to the feature reader in feature_readers.py to check whether the failing key is present:

        with self.env.begin(write=False, buffers=True) as txn:
            self.image_ids = pickle.loads(txn.get(b"keys"))
            logging.warning("Exists or not")
            logging.warning(b'train/e1ad82ad7b00d0dc' in self.image_ids)
            self.image_id_indices = {
                self.image_ids[i]: i for i in range(0, len(self.image_ids))
            }

Output when using the original OCR feature LMDB:

WARNING 2022-05-10T22:33:18 | root: Exists or not
WARNING 2022-05-10T22:33:18 | root: True
WARNING 2022-05-10T22:33:18 | root: b'train/e1ad82ad7b00d0dc'
WARNING 2022-05-10T22:33:18 | root: 21352
WARNING 2022-05-10T22:33:18 | root: Exists or not
WARNING 2022-05-10T22:33:18 | root: True
WARNING 2022-05-10T22:33:18 | root: b'train/e1ad82ad7b00d0dc'
WARNING 2022-05-10T22:33:18 | root: 5565

Output when using the newly generated OCR feature LMDB:

root: Exists or not
WARNING 2022-05-10T22:31:20 | root: True
WARNING 2022-05-10T22:31:20 | root: b'train/e1ad82ad7b00d0dc'
WARNING 2022-05-10T22:31:20 | root: 21352
WARNING 2022-05-10T22:31:20 | root: Exists or not
WARNING 2022-05-10T22:31:20 | root: False
WARNING 2022-05-10T22:31:20 | root: b'train/e1ad82ad7b00d0dc'

Troubleshooting 2

I queried detectron.lmdb directly with the following code:

import lmdb
import os
import pickle

# Open the LMDB read-only with the same options as in the snippet above.
env = lmdb.open(
    'detectron.lmdb',
    subdir=os.path.isdir('detectron.lmdb'),
    readonly=True,
    lock=False,
    readahead=False,
    meminit=False,
)
# Look up the key from the KeyError directly.
with env.begin(write=False, buffers=True) as txn:
    image_info = pickle.loads(txn.get(b'train/e1ad82ad7b00d0dc'))

Output: the query succeeds, so detectron.lmdb contains the key b'train/e1ad82ad7b00d0dc'.

Troubleshooting 3

It seems that two self.image_id_indices dictionaries are generated. The second dictionary has fewer keys than the first. When using the original visual features, both dictionaries have the same number of keys.
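
One way to narrow this down might be to compare the key lists of the two OCR LMDB files directly. The sketch below is only an assumption about how to do that: it reads the b"keys" entry the same way the reader snippet in Troubleshooting 1 does, then reports which keys appear in one list but not the other. The helper names load_keys and compare_keys are hypothetical, and the path to the original OCR LMDB is a placeholder because it is not given above.

```python
import os
import pickle


def load_keys(lmdb_path):
    """Return the list stored under the b"keys" entry of an LMDB file."""
    # Imported here so compare_keys can be used without lmdb installed.
    import lmdb

    env = lmdb.open(
        lmdb_path,
        subdir=os.path.isdir(lmdb_path),
        readonly=True,
        lock=False,
        readahead=False,
        meminit=False,
    )
    try:
        with env.begin(write=False, buffers=True) as txn:
            raw = txn.get(b"keys")
            if raw is None:
                raise KeyError('no b"keys" entry in ' + lmdb_path)
            # Unpickle inside the transaction, while the buffer is still valid.
            return pickle.loads(raw)
    finally:
        env.close()


def compare_keys(reference_keys, candidate_keys):
    """Return (missing, extra): keys only in reference, keys only in candidate."""
    reference, candidate = set(reference_keys), set(candidate_keys)
    return sorted(reference - candidate), sorted(candidate - reference)


# Example usage (the first path is a placeholder, not taken from the issue):
# original = load_keys("path/to/original_ocr_features.lmdb")
# new = load_keys("textvqa/ocr_en/features/new_ocr_features.lmdb")
# missing, extra = compare_keys(original, new)
# print(len(original), len(new), len(missing), missing[:10])
```

If b'train/e1ad82ad7b00d0dc' shows up in the missing list, the new OCR LMDB probably does not cover every image the dataset asks for, which would be consistent with the False logged in Troubleshooting 1.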
