UARK-AICV / VLCAP

[ICIP 2022] VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
https://ieeexplore.ieee.org/document/9897766

bash scripts/train.sh anet true generates #13

Closed IQraQasim10 closed 1 year ago

IQraQasim10 commented 1 year ago

Hi, after running `bash scripts/train.sh anet true`, it generates a long list of output:

---------------------------------------------------------
>>>>>>>> Running training on anet dataset
>>>>>>>> using lang feature: true
---------------------------------------------------------
2022-11-28 11:21:20,576: Mode train
2022-11-28 11:21:20,576: Loading data from /home/iqa000/Workspace/VLCAP-main/densevid_eval/anet_data/train.json
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10009/10009 [00:00<00:00, 1445768.80it/s]
2022-11-28 11:21:20,615: Loading complete! 10009 examples
  0%|                                                                                                                                                                                                        | 0/10009 [00:00<?, ?it/s]missing
missing
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10009/10009 [00:00<00:00, 148278.61it/s]
Missing 20020 features (clips/sentences) from 10009 videos
Missing {'tSk1GWyofaU', 'zu960Glpzo4', 'Vx6vP1oxiAg', 'FMlWHXByLL0', 'evj6y2xZCnM', 'zBmVL3I3nFU', '2Is_nJdG2to', 'cSfs5ht9sro', 'HKkzII7ap7E', 'ULBhK8jXNws', 'c1eUdyyT4zg', '7AkyOhKkT6g', 'XP5Oqr1giQ4', 'VcthLhKIntA', 'uOmCwWVJnLQ', 'HeMpg3SAUUs', 'hYBctolxeqQ', 'XSNenkxgryQ', '2x-Xqt98Ek4', 'ot7hBY4lQ2c', 'SSLcbqaBiRM', 'VdeYnCIbRJ4', 'j1oB2NAlYsQ', 'w2fsq9BOoZo', 'n6k21NjvqXE', '9mF5s6_dTlk', 'weB3srg6o4c', 'WmabLngcvas', 'Kp7pUEKrb8Q', 'OdLcbH2H_zI', 'BERvPz1e_AU', 'a25vC5zsf6A', '1dM62Xpm9Ns', 'KfP205pf7PU', 'pu-2w-UxdYg', 'KKbfCtmIE0o', 'Lan3mtnCmlw', '8SCg3toperM', 'J6ScF5n_Cug', 'sCzauf2u4dc', '0AjYz-s4Rek', 'L1XpfS1RCzE', 'qkN9uA8izVE', 'r40TuTkt9y4', '2j-DRUk2yCs', 'u7dfBgc_SqU', '5HCYb6qfkdk', 'nVk5nIE-6bM', 'eGLD-0b1LV0', 'iUe1t0sN4Jo', '3pBldeB3uaE', 'n1KeC6NXPUA', 'Z1POv1Qeno0', '_NwkwvaC7Bg', '5zPTTiJiXUY', 'w5J3Gt5WLwU', '9XyrLUWZl40', 'ru7UAr2488M', '7MWDmMh3zyA', 'HDHS_7pOiDk', '_9h6NBOPTy8', '-jl_v7zi17A', '33SI8z8PovA', 'UB2GzjNzo3M', 'ofrX4WyAM-0', 'JOYduGqZSRc', 'w6Avae5on_0', 'KYtV2vpwuVw', '-NndIs9BaS4', 'qNxA4UTadGo', 'w_wIOJrztdU', 'mgmwdQixDXY', '9ku5v_hSVMw', 'RgzbNJPchqc', 'uUzmPV8Vgqg', 'auxBRPzLiIo', 'VwclmKWo_-M', 'D9eo9NfFhkg', 'G5ueYVLGtm8', 'pwPid8YHHpU', 'F6cNWYlfUs8', '_cZD6JN-SYg', 'VDX1IQnUMgo', '_vbwjI1QA7g', 'P9jIpcRGeOk', '6hsOVkC7hxA', 'yeUuZ9vk5gE', 'VEihQG2UWKE', 'O8vPTn6Ho7w', 'TTDruR5Vin4', 'PbzmcZ_IORE', '7lNAmkaMyyg', 'ABB755sPZfY', 'ui_CNb4FUtQ', 'fLvPz8W00l4', 'Epl3pExUuNs', 'zVMDHCnT-d4', 'zD_wAe6Eoxc', '2PAVJbmj2lQ', '8AP2he781Cw', 'hFtmkU7wdx4', '3zDw5mwGIW0', 'Y-1QkIGm81w', 'ucEqZtmQS-0', '7EPzlmJ25dA', 'qi_6u0mMJQM', 'W_iKlOPSDos', 'M2ntILX6VP0', 'QkqsI11OtC8', 'nwznKOuZM7w', 'b82y7f7TFbw', 'tRatWgaZ-a0', '_--nxrRXdPg', 'OhXBMlKOHMI', 'kt_sGN-1prU', 'nnWJGghixr0', 'tydn-vo3DaY', 'sQo4gMcgfT4', 'O4P07fipvIA', '__mIAEE03bE', 'B5Zi054Fa5k', '9g-5J05BIiQ', 'FkSf3pxra3M', 'keFBEoBy0zY', 'zuqNxHmtBD8', 'bJ1vEQKX-hE', 'm8SFyH4vhik', 'diBZlwUO8rc', '_4CLYKFzmoY', 'KFIxTdJtXAE', 'iDz8nKDpumY', '3X2CY79a0X8', 'gdr6iVHHYcU', 'PlAVnu-ueM4', '06eyqLosXjU', 'qumU7AgV3Mk', 'vu65aIIJHtU', 'Fdjw9ld-hbA', 'dyLGepr7VR0', 'zvAlL20-K4w', '6SHSstpZN1I', '0ZzKrBk1ac8', 'ThYidZUtnuo', 'mpyN1mrMl3U', '0jBwj0bfZ3Y', 'fJ7gcHxxJMM', 'm2DOej6tPNs', 'xIU6DO35R_c', 'YuCMWTdK_DY', 'QPKJDlQSO6c', 'Y1j_e1DXW6I', 'L9MTwigRhmk', 'ufK2mbJI0to', 'PNdG3SUdJzc', 'mG8h5rX3OnU', 'FXl3qRRs9jw', 'vw64k9rIi_g', '4mRdgV8t4KY', 'hQXWnoipdFE', 'aq41GgfAlDo', 'Tm1ebIrDyz0', '6lYTHj9vImo', '57cM1GcKktw', '09Kr5TQ9DHQ', 'xE43h7Kd9Oc', 'CNH37tJNzFE', 'Bhz-WgJH8R0', '8dXbbJWFEJo', 'yJgC3-t_ciw', 'IjKWgD0y4rc', 
't5Br7yOUe4g', 'q-ID2mgEIow', 'SfiAcQAPpQ8', 'M6yAoJJQvGY', 'Gc1Mk5UyECQ', 'HEfOp_pz_j4', 'aEyTdUOp-qs', '5TV-V6Cxero', 'PXBcPu2_KOo', 'oOnKQgQZOZ0', 'OIA7lPraPSM', 'my4UPLGI6w4', 'deuSw3RnNLU', 'Igm1Mx4Ng1k', 'IDr50VT8BK8', 'uu4_cV49pMI', 'BnkUgUQBED0', 'jsu65VwKf74', '96krk6Ka9Vc', 'b7fs8OAJzQk', 'Y76wuHBZgdU', 'uzXbaoWOm5o', 'mX3gbTBdbKY', 'bj4nkWPdqIY', 'INmaUkmVK24', 'GySHt3Z6Lt4', 'UGCn1zgYboQ', 'x7lP6GKepco', 'IKTYMYu8FFs', '9rW35YTKYq8', 'ykcLgz3DlYg', '9n_cwQLpo_c', '_B3Q8bTJWG4', 'DlE6Rtuo__o', 'J_SD_hhGET8', '0CTnYEE7rdo',

But then it displays the following error.

File "/home/iqa000/Workspace/VLCAP-main/src/train.py", line 684, in main() File "/home/iqa000/Workspace/VLCAP-main/src/train.py", line 630, in main train_loader = DataLoader(train_dataset, collate_fn=collate_fn, File "/home/iqa000/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 344, in init sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type] File "/home/iqa000/.local/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 107, in init raise ValueError("num_samples should be a positive integer " ValueError: num_samples should be a positive integer value, but got num_samples=0****

I am following the README to the letter, and the data is also present in the data folder (env, lang, sent). After running the evaluation command `bash scripts/translate_greedy.sh anet_re_init_2022_11_22_17_1143 test`, I also get a similar type of error. Is there some issue with the dataloader?

Also, I noticed that after running `bash scripts/translate_greedy.sh anet_re_init_2022_11_22_17_1143 test`, the results/anet_re*/greedy_pred_val.json file is not created.
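For context, this ValueError is what PyTorch's DataLoader raises when the dataset handed to it ends up with zero examples, e.g. after every clip is dropped for missing features. A minimal sketch (not from this repo) that reproduces it:

```python
# An empty dataset makes RandomSampler fail at DataLoader construction
# time with exactly the num_samples=0 error shown above.
from torch.utils.data import DataLoader, Dataset

class EmptyDataset(Dataset):
    def __len__(self):
        return 0  # as if every example was filtered out

    def __getitem__(self, idx):
        raise IndexError(idx)

loader = DataLoader(EmptyDataset(), batch_size=4, shuffle=True)
# ValueError: num_samples should be a positive integer value, but got num_samples=0
```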

Kashu7100 commented 1 year ago

It seems some of the input features are missing. Could you double-check the data and path used in your VLCAP/src/rtransformer/recursive_caption_dataset.py?
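A quick way to see which feature files are actually on disk is a short script like the one below (a hypothetical sketch: `feature_dir`, the filename pattern, and the annotation path are assumptions, so adjust them to your layout):

```python
import glob
import json
import os

feature_dir = "video_feature/anet_trainval"      # assumption: your feature folder
ann_path = "densevid_eval/anet_data/train.json"  # path from the log above

with open(ann_path) as f:
    video_ids = list(json.load(f).keys())

# Count videos with no file matching their ID in the feature folder.
missing = [vid for vid in video_ids
           if not glob.glob(os.path.join(feature_dir, f"*{vid}*"))]
print(f"{len(missing)} of {len(video_ids)} videos have no feature files")
```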

IQraQasim10 commented 1 year ago

Oh, that solved the issue of the missing features, and it now retrieves the env, lang, and sent features correctly. Could you please let me know where I would have to make the same changes for the evaluation part? Also, I would like to mention that I added

import nltk
nltk.download('punkt')

in src/train.py, and the training worked perfectly.
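If you want the download to run only when it is actually needed, a small guard works (a minimal sketch; nltk.data.find raises LookupError when the tokenizer data is absent):

```python
import nltk

# Download the punkt tokenizer data only if it is not already installed.
try:
    nltk.data.find("tokenizers/punkt")
except LookupError:
    nltk.download("punkt")
```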

Would you recommend working with wandb?

Kashu7100 commented 1 year ago

Can you change the data path in VLCAP/src/rtransformer/recursive_caption_dataset.py if you put your data somewhere else?

I think wandb is a useful tool to monitor the training.
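A minimal sketch of how it could hook into a training loop (the project and metric names below are placeholders, not ones this repo defines):

```python
import wandb

run = wandb.init(project="vlcap", name="anet-train")  # placeholder project/run names
for epoch in range(3):
    loss = 1.0 / (epoch + 1)  # stand-in for the real training loss
    wandb.log({"epoch": epoch, "train/loss": loss})
run.finish()
```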

IQraQasim10 commented 1 year ago

My training results are quite close to what you report in Table 5 of the VLCAP paper, having trained on the anet dataset with lang feature = true. I am now trying to evaluate with the `bash scripts/translate_greedy.sh anet_re_init_2022_11_30_16_26_43 val` command, but I ran into the following problem:

(pytorch) iqa000@vs-c1:~/Workspace/VLCAP-main$ bash scripts/translate_greedy.sh anet_re_init_2022_11_30_16_26_43 val
train_opt Namespace(attention_probs_dropout_prob=0.1, batch_size=4, beam_size=2, cuda=True, data_dir='/home/iqa000/Workspace/VLCAP-main/densevid_eval/anet_data', debug=False, dset_name='anet', ema_decay=0.9999, eval_tool_dir='./densevid_eval', exp_id='init', freeze_voc=False, grad_clip=1, hidden_dropout_prob=0.1, hidden_size=768, initializer_range=0.02, intermediate_size=768, label_smoothing=0.1, layer_norm_eps=1e-12, log='results/anet_re_init_2022_11_30_16_26_43/model', lr=0.0001, lr_warmup_proportion=0.1, max_es_cnt=10, max_n_sen=6, max_t_len=22, max_v_len=100, memory_dropout_prob=0.1, mtrans=False, n_best=1, n_epoch=50, n_memory_cells=1, no_cuda=False, no_pin_memory=False, num_attention_heads=12, num_hidden_layers=2, num_workers=8, pin_memory=True, recurrent=True, res_dir='results/anet_re_init_2022_11_30_16_26_43', res_root_dir='results', resume_path=None, run_id='3t09b6h8', save_mode='best', save_model='results/anet_re_init_2022_11_30_16_26_43/model', seed=2019, share_wd_cls_weight=False, type_vocab_size=2, untied=False, use_beam=False, use_env=True, use_lang=True, v_duration_file='/home/iqa000/Workspace/VLCAP-main/video_feature/anet_duration_frame.csv', val_batch_size=12, video_feature_dir='/home/iqa000/Workspace/VLCAP-main/video_feature/anet_trainval', video_feature_size=2048, voc_path='/home/iqa000/Workspace/VLCAP-main/cache/anet_vocab_clip.pt', vocab_size=10655, word2idx_path='/home/iqa000/Workspace/VLCAP-main/cache/anet_word2idx.json', word_vec_size=512, xl=False, xl_grad=False)
Start evaluating val
Traceback (most recent call last):
  File "src/translate.py", line 236, in <module>
    main()
  File "src/translate.py", line 186, in main
    eval_data_loader = get_data_loader(opt, eval_mode=eval_mode)
  File "src/translate.py", line 116, in get_data_loader
    recurrent=opt.recurrent, untied=opt.untied or opt.mtrans)
  File "/home/iqa000/Workspace/VLCAP-main/src/rtransformer/recursive_caption_dataset.py", line 87, in __init__
    self._load_duration()
  File "/home/iqa000/Workspace/VLCAP-main/src/rtransformer/recursive_caption_dataset.py", line 165, in _load_duration
    with open(self.duration_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/iqa000/Workspace/VLCAP-main/video_feature/anet_duration_frame.csv'

When I navigate to anet_duration_frame.csv, I see that the file is there but empty. Does it have anything to do with shutting down the system after training? Could you please help me figure this out?

Kashu7100 commented 1 year ago

> When I navigate to anet_duration_frame.csv, I see that the file is there but empty.

The duration file should contain the duration info. Could you download the corresponding file again from the main branch of this repo? Also, please double-check your path (/home/iqa000/Workspace/VLCAP-main/video_feature/anet_duration_frame.csv).
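To confirm the file is really present and non-empty before re-running, here is a small check (a sketch using only the path from the error message):

```python
import csv
import os

path = "/home/iqa000/Workspace/VLCAP-main/video_feature/anet_duration_frame.csv"
if not os.path.exists(path):
    print("file not found:", path)
else:
    print("size:", os.path.getsize(path), "bytes")  # 0 bytes means the file is empty
    with open(path, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            print(row)  # peek at the first few rows
            if i >= 2:
                break
```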

IQraQasim10 commented 1 year ago

Yes! Re-checking against setup.sh and fixing the path resolved the issue. The evaluation is on its way:

2022-12-02 12:51:35,615: Mode val
2022-12-02 12:51:35,616: Loading data from /home/iqa000/Workspace/VLCAP-main/densevid_eval/anet_data/anet_entities_val_1.json
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2460/2460 [00:00<00:00, 1284929.99it/s]
2022-12-02 12:51:35,669: Loading complete! 2460 examples
  0%|                                                                                                                                                                                                                                | 0/2460 [00:00<?, ?it/s]missing
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2460/2460 [00:00<00:00, 53009.26it/s]
Missing 1 features (clips/sentences) from 1 videos
Missing {'j73Wh1olDsA'}
[Info] Trained model state loaded.
  - (Translate):  20%|██████████████████████████████████████▊          
IQraQasim10 commented 1 year ago

What is your recommendation for the yc2 dataset, for both training and evaluation?

IQraQasim10 commented 1 year ago

I would like to share the results.

anet-test:

{
    "Bleu_1": 0.5353381506518596,
    "Bleu_2": 0.33391063758245343,
    "Bleu_3": 0.21240864474730795,
    "Bleu_4": 0.13808044790562626,
    "METEOR": 0.1728796180211062,
    "ROUGE_L": 0.36290273965969605,
    "CIDEr": 0.30131769467803965,
    "gt_stat": {
        "avg_sen_len": 14.806818181818182,
        "num_sen": 8712,
        "vocab_size": 5510
    },
    "submission": {
        "avg_sen_len": 11.630165289256198,
        "num_sen": 8712,
        "vocab_size": 1453
    },
    "num_evaluated": 2456,
    "num_gt": 2457,
    "num_pred": 2456,
    "re1": 0.39192528249105507,
    "re2": 0.17046172781410823,
    "re3": 0.10296342311520865,
    "re4": 0.06931700901990746
}

anet-val:

{
    "Bleu_1": 0.534599739771031,
    "Bleu_2": 0.33526446720433706,
    "Bleu_3": 0.21478895575947762,
    "Bleu_4": 0.14042879116199797,
    "METEOR": 0.17454953193732303,
    "ROUGE_L": 0.3650001728174786,
    "CIDEr": 0.3180997324647702,
    "gt_stat": {
        "avg_sen_len": 14.912959160392425,
        "num_sen": 8766,
        "vocab_size": 5547
    },
    "submission": {
        "avg_sen_len": 11.611339265343371,
        "num_sen": 8766,
        "vocab_size": 1431
    },
    "num_evaluated": 2459,
    "num_gt": 2460,
    "num_pred": 2459,
    "re1": 0.3953617894613977,
    "re2": 0.17605731717542453,
    "re3": 0.10709377472488246,
    "re4": 0.07456230463937506
}

Thank you so much for this wonderful code!

Kashu7100 commented 1 year ago

For the yc2 feature extraction, please refer to our recently published code at https://github.com/UARK-AICV/VLTinT.