aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020
BSD 3-Clause "New" or "Revised" License

Reproduce results with test.py #35

Closed · XuMengyaAmy closed this issue 3 years ago

XuMengyaAmy commented 3 years ago

Q1: In test.py:

data = torch.load('meshed_memory_transformer.pth')

data = torch.load('saved_models/m2_transformer_best.pth')

model.load_state_dict(data['state_dict'])
print("Epoch %d" % data['epoch'])
print(data['best_cider'])

Error: KeyError: 'epoch', KeyError: 'best_cider'

Was the provided 'meshed_memory_transformer.pth' not saved by train.py? When I load a model saved from my own training run in test.py, there is no error. Where does the provided 'meshed_memory_transformer.pth' come from?

Also, for my own dataset, when I load the saved model in test.py, why does the performance drop compared with the evaluation metrics recorded during training in train.py?

Q2: In test.py:

dict_dataset_val = val_dataset.image_dictionary({'image': image_field, 'text': RawField()})

What is the purpose of image_dictionary, and what is the difference between dict_dataset_val and val_dataset? I printed both out and observed that their captions are different, and that len(dict_dataset_val) differs from len(val_dataset). Why is that?

Thanks for your help!

marcellacornia commented 3 years ago

Hi @XuMengyaAmy, thanks for your interest in our work!

In the released weight file, we only stored the state_dict of the best model. In our experiments, the best CIDEr was obtained at epoch 28.
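As a minimal sketch of how test.py could tolerate both checkpoint formats (assuming the `model` object and file names from the snippet above; the exact keys saved by your train.py may differ), you can guard the optional bookkeeping fields:

```python
import torch

# Works with both the released weights (which store only 'state_dict')
# and a checkpoint saved during training (which may also store
# 'epoch', 'best_cider', and other bookkeeping fields).
data = torch.load('meshed_memory_transformer.pth', map_location='cpu')
model.load_state_dict(data['state_dict'])

# Print the extra fields only when the checkpoint actually contains them.
if 'epoch' in data:
    print("Epoch %d" % data['epoch'])
if 'best_cider' in data:
    print("Best CIDEr: %s" % data['best_cider'])
```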

Regarding the evaluation, we created the image_dictionary to group together all ground-truth captions of the same image. In this way, the captioning evaluation metrics are computed by comparing the caption generated for a given image with all five ground-truth captions of that image. For this reason, the length of val_dataset is five times that of dict_dataset_val.
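To illustrate the idea (a hedged sketch only, not the actual image_dictionary implementation; the image ids and captions below are invented), grouping turns five per-caption samples into one per-image entry with five references, which is the layout the captioning metrics expect:

```python
from collections import defaultdict

# Flat validation view: one sample per ground-truth caption,
# so each image appears five times.
flat_samples = [
    ('img_1', 'a man riding a horse'),
    ('img_1', 'a person on a brown horse'),
    ('img_1', 'a rider and his horse in a field'),
    ('img_1', 'a man sits on a horse outdoors'),
    ('img_1', 'someone riding a horse on grass'),
    # ... five captions for img_2, img_3, and so on
]

# Dictionary view: one entry per image, with all its references grouped.
grouped = defaultdict(list)
for image_id, caption in flat_samples:
    grouped[image_id].append(caption)

# Metrics such as CIDEr then compare the single generated caption for an
# image against all five of its references, e.g.
#   gts = {'img_1': grouped['img_1']}, res = {'img_1': [generated_caption]}
print(len(flat_samples), len(grouped))  # the flat view is ~5x longer
```

This also explains why the printed captions differ: val_dataset yields one caption per sample, while dict_dataset_val yields all references for each image.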

I hope this helps!