airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Corrupt VQA dataset from the source provided #119

Open omm-prakash opened 6 months ago

omm-prakash commented 6 months ago

I followed the steps in the README file to fine-tune the model on the VQA dataset; the data sources are the ones listed in that same file.

The tiny run, bash run/vqa_finetune.bash 0 vqa_lxr955_tiny --tiny, finished perfectly. But when I started fine-tuning on the entire dataset, the error below came out.

/content/drive/MyDrive/lxmert2/lxmert# bash run/vqa_finetune.bash 0 vqa_lxr955            
Load 632117 data from split(s) train,nominival.
Start to load Faster-RCNN detected objects from data/mscoco_imgfeat/train2014_obj36.tsv
Loaded 82783 images in file data/mscoco_imgfeat/train2014_obj36.tsv in 479 seconds.
Start to load Faster-RCNN detected objects from data/mscoco_imgfeat/val2014_obj36.tsv
Traceback (most recent call last):
  File "/content/drive/MyDrive/lxmert2/lxmert/src/tasks/vqa.py", line 178, in <module>
    vqa = VQA()
  File "/content/drive/MyDrive/lxmert2/lxmert/src/tasks/vqa.py", line 36, in __init__
    self.train_tuple = get_data_tuple(
  File "/content/drive/MyDrive/lxmert2/lxmert/src/tasks/vqa.py", line 22, in get_data_tuple
    tset = VQATorchDataset(dset)
  File "/content/drive/MyDrive/lxmert2/lxmert/src/tasks/vqa_data.py", line 100, in __init__
    img_data.extend(load_obj_tsv(
  File "/content/drive/MyDrive/lxmert2/lxmert/src/utils.py", line 45, in load_obj_tsv
    item[key] = np.frombuffer(base64.b64decode(item[key]), dtype=dtype)
  File "/usr/lib/python3.10/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

I debugged this and found that the issue comes from the load_obj_tsv function in lxmert/src/utils.py, which decodes each base64 field without checking its length or shape first (the commented-out lines in the snippet below are the original code).
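Since binascii.Error: Incorrect padding almost always means a base64 string was cut short (for example by an interrupted download), it may help to first locate the damaged rows. Below is a minimal sketch (not part of the repo) that scans a feature TSV and flags rows whose base64 fields cannot be valid; the field names mirror the ones load_obj_tsv reads, and find_bad_rows is just a hypothetical helper name.

import csv
import sys

csv.field_size_limit(sys.maxsize)  # the 'features' column is extremely long

FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]
B64_FIELDS = ["objects_id", "objects_conf", "attrs_id", "attrs_conf", "boxes", "features"]

def find_bad_rows(fname):
    # A base64 string whose length is not a multiple of 4 is necessarily truncated.
    with open(fname) as f:
        for i, item in enumerate(csv.DictReader(f, FIELDNAMES, delimiter="\t")):
            for key in B64_FIELDS:
                value = item.get(key)
                if value is None or len(value) % 4 != 0:
                    print(f"row {i} (img_id={item.get('img_id')}): field '{key}' looks truncated")
                    break

find_bad_rows("data/mscoco_imgfeat/val2014_obj36.tsv")

If this reports only a handful of rows (often the very last one), the file was probably truncated while downloading.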

I replaced the loop with the following:

decode_config = [
    ('objects_id', (boxes, ), np.int64),
    ('objects_conf', (boxes, ), np.float32),
    ('attrs_id', (boxes, ), np.int64),
    ('attrs_conf', (boxes, ), np.float32),
    ('boxes', (boxes, 4), np.float32),
    ('features', (boxes, -1), np.float32),
]
for key, shape, dtype in decode_config:
    # Original code:
    # item[key] = np.frombuffer(base64.b64decode(item[key]), dtype=dtype)
    # item[key] = item[key].reshape(shape)
    # item[key].setflags(write=False)
    decoded_data = b''
    try:
        item[key] = item[key] + '=' * (-len(item[key]) % 4)  # add base64 padding only if it is missing
        decoded_data = base64.b64decode(item[key])
        item[key] = np.frombuffer(decoded_data, dtype=dtype)
        item[key] = item[key].reshape(shape)
        item[key].setflags(write=False)
    except Exception as e:
        print(f">> Error decoding {key}: {e}, skipping the field!!")
        print(f"Decoded data length: {len(decoded_data)}")
        print(f"Expected shape: {shape}, dtype: {dtype}")
        continue
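A note on the padding line: computing the pad as '=' * (-len(item[key]) % 4) appends nothing when the string length is already a multiple of 4, whereas 4 - len(item[key]) % 4 would append four unnecessary '=' characters even to fields that are already valid.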

This worked around the decoding error, but it raised new errors downstream, such as a shape mismatch between the model's matrices and the pre-trained weights.

Please guide me on how to proceed further.