PaddlePaddle / ERNIE

Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

Error when fine-tuning ernie-vil on the VCR dataset #596

Closed MindaWu closed 3 years ago

MindaWu commented 3 years ago

When fine-tuning ERNIE-ViL base on the VCR dataset, I get the following error:

Traceback (most recent call last):
  File "finetune.py", line 464, in <module>
    main(args)
  File "finetune.py", line 371, in main
    epoch=args.epoch,)
  File "/home/wmd/workspace/ERNIE-repro/ernie-vil/reader/vcr_finetuning.py", line 424, in __init__
    ImageFeaturesH5Reader(task_conf['feature_lmdb_path'])
  File "/home/wmd/workspace/ERNIE-repro/ernie-vil/reader/_image_features_reader.py", line 27, in __init__
    lock=False, readahead=False, meminit=False)
lmdb.Error: ./data/vcr/VCR_resnet101_faster_rcnn_genome_pickle2.lmdb: No such file or directory

The documentation says the data is the same as ViLBERT's, but the download links provided by vilbert do not include VCR_resnet101_faster_rcnn_genome_pickle2.lmdb, only VCR_gt_resnet101_faster_rcnn_genome.lmdb. How can I obtain VCR_resnet101_faster_rcnn_genome_pickle2.lmdb?

zhuzhf commented 3 years ago

There is a download link in the table at the very bottom of the vilbert page; it is the one in the VCR row: https://www.dropbox.com/sh/9pgxc3njd3iq03o/AADXgnT1HmEdrds7aujTncBGa?dl=0

MindaWu commented 3 years ago

After renaming the vilbert file to VCR_resnet101_faster_rcnn_genome_pickle2.lmdb, the error changed to:

Traceback (most recent call last):
  File "finetune.py", line 464, in <module>
    main(args)
  File "finetune.py", line 371, in main
    epoch=args.epoch,)
  File "/home/wmd/workspace/ERNIE-repro/ernie-vil/reader/vcr_finetuning.py", line 424, in __init__
    ImageFeaturesH5Reader(task_conf['feature_lmdb_path'])
  File "/home/wmd/workspace/ERNIE-repro/ernie-vil/reader/_image_features_reader.py", line 30, in __init__
    self._image_ids = pickle.loads(txn.get('keys'.encode()))
  File "/home/wmd/anaconda3/envs/paddle_env/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/home/wmd/anaconda3/envs/paddle_env/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/home/wmd/anaconda3/envs/paddle_env/lib/python2.7/pickle.py", line 892, in load_proto
    raise ValueError, "unsupported pickle protocol: %d" % proto
ValueError: unsupported pickle protocol: 3

Does the data need some additional processing?

TangDonnie commented 3 years ago

Python 2 and Python 3 read pickles differently. Either change how the pickle is read, or convert the data beforehand.

MindaWu commented 3 years ago

After replacing every use of pickle in _image_features_reader.py with cPickle, it still fails with the following error:

Traceback (most recent call last):
  File "finetune.py", line 464, in <module>
    main(args)
  File "finetune.py", line 371, in main
    epoch=args.epoch,)
  File "/home/wmd/workspace/ERNIE-repro/ernie-vil/reader/vcr_finetuning.py", line 424, in __init__
    ImageFeaturesH5Reader(task_conf['feature_lmdb_path'])
  File "/home/wmd/workspace/ERNIE-repro/ernie-vil/reader/_image_features_reader.py", line 33, in __init__
    self._image_ids = cPickle.loads(txn.get('keys'.encode()))
ValueError: unsupported pickle protocol: 3

TangDonnie commented 3 years ago

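The fix is to open this lmdb file with pickle under Python 3 and then re-save its contents using pickle protocol 2 (and change the cPickle you substituted back to the original pickle). The core code is pickle.dumps(pickle.loads(value), protocol=2). For details, see https://stackoverflow.com/questions/25843698/valueerror-unsupported-pickle-protocol-3-python2-pickle-can-not-load-the-file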
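A minimal sketch of that conversion, meant to be run once under Python 3, might look like the following. The function names, the `map_size` value, and the assumption that every value stored in the LMDB is a pickle are illustrative guesses rather than anything confirmed by the ERNIE-ViL code; only the `pickle.dumps(pickle.loads(value), protocol=2)` step comes from the comment above.

```python
import pickle


def repickle_value(value, protocol=2):
    """Decode a pickled byte string and re-encode it with an older protocol."""
    return pickle.dumps(pickle.loads(value), protocol=protocol)


def convert_lmdb(src_path, dst_path, map_size=1 << 40):
    """Copy every record from src_path into a new LMDB at dst_path.

    Assumption: every value in the source LMDB is a pickle.
    Assumption: map_size (1 TiB here) is large enough for the output.
    """
    # Third-party package, imported lazily so repickle_value works without it.
    import lmdb

    src = lmdb.open(src_path, readonly=True, lock=False,
                    readahead=False, meminit=False)
    dst = lmdb.open(dst_path, map_size=map_size)
    with src.begin(write=False) as src_txn, dst.begin(write=True) as dst_txn:
        # Iterating a cursor yields (key, value) pairs as bytes.
        for key, value in src_txn.cursor():
            dst_txn.put(key, repickle_value(value))
    src.close()
    dst.close()
```

For example, `convert_lmdb("./data/vcr/VCR_gt_resnet101_faster_rcnn_genome.lmdb", "./data/vcr/VCR_resnet101_faster_rcnn_genome_pickle2.lmdb")` would write the converted copy under the name the reader expects, without touching the original download. Writing to a separate output path needs roughly as much free disk space again as the source.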

Tclz commented 3 years ago

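Does anyone have a downloaded copy of the VCR visual feature file from vil-bert, the 58 GB .mdb one? My VPN is too unstable and the download has failed many times. I would be very grateful!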

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reopen it. Thank you for your contributions.