UARK-AICV / VLCAP

[ICIP 2022] VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
https://ieeexplore.ieee.org/document/9897766

Cannot get the same scores as reported in the paper #8

Closed: yueyue0401 closed this issue 2 years ago

yueyue0401 commented 2 years ago

In the paper (https://arxiv.org/pdf/2206.12972v2.pdf), the model reports the scores in the "Paper" column below. However, I trained the model on a V100 GPU without changing any configuration and only got the scores in the "My run" column:

| Metric  | Paper | My run |
| ------- | ----- | ------ |
| B@4     | 14.00 | 12.11  |
| METEOR  | 17.78 | 16.51  |
| CIDEr   | 32.58 | 31.29  |
| ROUGE-L | 36.37 | 32.52  |
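
To quantify the gap, here is a quick comparison snippet (it uses only the numbers from the table above, nothing from the codebase):

```python
# Relative gap between the paper's scores and my reproduction,
# using the numbers from the table above.
paper = {"B@4": 14.00, "METEOR": 17.78, "CIDEr": 32.58, "ROUGE-L": 36.37}
mine = {"B@4": 12.11, "METEOR": 16.51, "CIDEr": 31.29, "ROUGE-L": 32.52}

for metric, ref in paper.items():
    drop = 100 * (ref - mine[metric]) / ref
    print(f"{metric}: {mine[metric]:.2f} vs {ref:.2f} ({drop:.1f}% lower)")
```

The largest relative gaps are on B@4 (about 13%) and ROUGE-L (about 11%).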

Do I need to change any configuration to reproduce the reported scores?

Thank you for providing this great code!

Kashu7100 commented 2 years ago

Thank you for your interest. Could you tell me which config you used?

yueyue0401 commented 2 years ago

Below is the config I used.

{
    "attention_probs_dropout_prob": 0.1,
    "batch_size": 4,
    "beam_size": 2,
    "cuda": true,
    "data_dir": "/home/perl/video-description/VLCAP/densevid_eval/anet_data",
    "debug": false,
    "dset_name": "anet",
    "ema_decay": 0.9999,
    "eval_tool_dir": "./densevid_eval",
    "exp_id": "init",
    "freeze_voc": false,
    "grad_clip": 1,
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.02,
    "intermediate_size": 768,
    "label_smoothing": 0.1,
    "layer_norm_eps": 1e-12,
    "log": "results/anet_re_init_2022_10_23_19_59_12/model",
    "lr": 0.0001,
    "lr_warmup_proportion": 0.1,
    "max_es_cnt": 10,
    "max_n_sen": 6,
    "max_t_len": 22,
    "max_v_len": 100,
    "memory_dropout_prob": 0.1,
    "mtrans": false,
    "n_best": 1,
    "n_epoch": 50,
    "n_memory_cells": 1,
    "no_cuda": false,
    "no_pin_memory": false,
    "num_attention_heads": 12,
    "num_hidden_layers": 2,
    "num_workers": 8,
    "pin_memory": true,
    "recurrent": true,
    "res_dir": "results/anet_re_init_2022_10_23_19_59_12",
    "res_root_dir": "results",
    "resume_path": null,
    "save_mode": "best",
    "save_model": "results/anet_re_init_2022_10_23_19_59_12/model",
    "seed": 2019,
    "share_wd_cls_weight": false,
    "type_vocab_size": 2,
    "untied": false,
    "use_beam": false,
    "use_env": true,
    "use_lang": true,
    "v_duration_file": "/home/perl/video-description/VLCAP/video_feature/anet_duration_frame.csv",
    "val_batch_size": 12,
    "video_feature_dir": "/home/perl/video-description/VLCAP/video_feature/anet_trainval",
    "video_feature_size": 2048,
    "voc_path": "/home/perl/video-description/VLCAP/cache/anet_vocab_clip.pt",
    "vocab_size": 10655,
    "word2idx_path": "/home/perl/video-description/VLCAP/cache/anet_word2idx.json",
    "word_vec_size": 512,
    "xl": false,
    "xl_grad": false
}
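
In case it helps with debugging, this is how I sanity-check the options that most influence the final scores (a minimal sketch of my own; the opts.json filename is just an assumption about where the run dumps its config, so adjust the path for your setup):

```python
import json

# Minimal sketch: load the dumped run config and print the options that
# most affect caption scores. The filename is an assumption; point it at
# wherever your run directory stores the options shown above.
with open("results/anet_re_init_2022_10_23_19_59_12/opts.json") as f:
    cfg = json.load(f)

for key in ("use_beam", "beam_size", "n_epoch", "batch_size", "lr", "seed"):
    print(f"{key} = {cfg[key]}")
```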

Thank you!

Kashu7100 commented 2 years ago

I am sorry for the late reply. I have cleaned up the code, so please remove your previous copy and clone the latest version from this repo to give it a try. Hopefully, you can get a similar score.

ltp1995 commented 1 year ago

> I am sorry for the late reply. I have cleaned up the code, so please remove your previous copy and clone the latest version from this repo to give it a try. Hopefully, you can get a similar score.

Hi, could you directly share your generated paragraph captioning results (e.g., the JSON files) on the YC2 and ActivityNet datasets? I want to compare my results directly with your VLCap (ICIP 2022) and VLTinT (AAAI 2023) results using the automatic evaluation metrics. My email address is ltpfor1225@gmail.com. Thanks!

Kashu7100 commented 1 year ago

@ltp1995 Thank you for the inquiry, and I am sorry for the late response; I didn't notice your comment on this closed issue. Please check your email.