facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Generated TextVQA OCR features differ from the originally released ones #1247

Closed: soonchangAI closed this issue 1 year ago

soonchangAI commented 2 years ago

❓ Questions and Help

Hi, I am trying to generate OCR features for TextVQA. The newly generated OCR features (the `features` key) differ from the originally released ones. Is this expected behavior? How can I generate exactly the same OCR features? I hope @ronghanghu can help me.

Environment: PyTorch 1.7, Python 3.7

I get the following warning:

/home/cybertron/test_rcnn/vqa-maskrcnn-benchmark-m4c/maskrcnn_benchmark/structures/boxlist_ops.py:45: UserWarning: This overload of nonzero is deprecated:
    nonzero()
Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)

Commands used to generate the OCR features:

# Extract Faster R-CNN features for the OCR boxes listed in the imdb file
python mmf/projects/m4c/scripts/extract_ocr_frcn_feature.py \
    --detection_cfg=detectron_model.yaml \
    --detection_model=detectron_model.pth \
    --imdb_file=imdb_train_ocr_en.npy \
    --image_dir=$image_dir \
    --save_dir=$save_dir

# Convert the extracted feature files in $save_dir into an LMDB
python mmf/tools/scripts/features/lmdb_conversion.py \
    --mode=convert \
    --lmdb_path=features.lmdb \
    --features_folder=$save_dir

Example of a generated OCR feature entry:

{'feature_path': 'dae7b07540e932a1',
 'features': array([[0.        , 6.218356  , 0.        , ..., 3.5473397 , 0.        ,
         0.        ],
        [0.        , 4.286308  , 0.        , ..., 2.0838366 , 0.70099556,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.504341  , 0.        , ..., 0.03236384, 0.        ,
         0.        ],
        [0.        , 0.        , 0.63837045, ..., 1.797171  , 0.        ,
         0.        ],
        [0.        , 6.2734184 , 0.        , ..., 2.454718  , 0.        ,
         0.        ]], dtype=float32),
 'image_height': None,
 'image_width': None,
 'num_boxes': None,
 'objects': None,
 'cls_prob': None,
 'bbox': None}
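One way to make "different" concrete is to compare a regenerated `features` array with the matching array from the released features. The sketch below is a hypothetical helper, not part of MMF or its scripts. It assumes both arrays are 2-D `(num_boxes, dim)` and were extracted for the same image over the same OCR boxes in the same order. It reports the max absolute difference and per-box cosine similarity separately. That split can help tell small numerical noise apart from a substantive mismatch.

```python
import numpy as np


def compare_ocr_features(generated, original, atol=1e-4):
    """Summarize how two (num_boxes, dim) OCR feature arrays differ.

    Hypothetical helper (not an MMF API). Assumes both arrays come from
    the same image, with the same OCR boxes in the same order.
    """
    generated = np.asarray(generated, dtype=np.float32)
    original = np.asarray(original, dtype=np.float32)
    if generated.shape != original.shape:
        # Different box counts or feature sizes: element-wise comparison is meaningless.
        return {"same_shape": False, "shapes": (generated.shape, original.shape)}

    diff = np.abs(generated - original)

    # Per-box cosine similarity; two all-zero rows are treated as identical (1.0).
    g_norm = np.linalg.norm(generated, axis=1)
    o_norm = np.linalg.norm(original, axis=1)
    dot = np.sum(generated * original, axis=1)
    both_zero = (g_norm == 0) & (o_norm == 0)
    cos = np.where(both_zero, 1.0, dot / np.maximum(g_norm * o_norm, 1e-12))

    return {
        "same_shape": True,
        "allclose": bool(np.allclose(generated, original, atol=atol)),
        "max_abs_diff": float(diff.max()) if diff.size else 0.0,
        "mean_cosine": float(cos.mean()) if cos.size else 1.0,
        "min_cosine": float(cos.min()) if cos.size else 1.0,
    }


# Usage with synthetic data (stand-ins for a regenerated vs. released entry).
rng = np.random.default_rng(0)
original = rng.random((5, 2048), dtype=np.float32)
generated = original + rng.normal(0, 1e-3, original.shape).astype(np.float32)
print(compare_ocr_features(generated, original))
```

In practice, `generated` and `original` would be the `features` arrays of the same `feature_path` entry taken from the two feature sets.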