doc-doc / HQGA

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)

Missing qas_bert test file and training parameter for NExT-QA #5

Closed jayzhu02 closed 2 years ago

jayzhu02 commented 2 years ago

Hi,

I really appreciate your excellent work! I tried to train on NExT-QA, but I first found that there is no qns_bert file for the test split.

One more thing: after switching to NExT-QA, I changed some parameters such as max_qa_length and bbox_num, but I ran into a bug during training:

Traceback (most recent call last):
  File "D:/code/pycharm/Nextqa/HQGA/main_qa.py", line 94, in <module>
    main(args)
  File "D:/code/pycharm/Nextqa/HQGA/main_qa.py", line 59, in main
    vqa.run(f'{model_type}-{model_prefix}-22-39.88.ckpt', pre_trained=False)
  File "D:\code\pycharm\Nextqa\HQGA\videoqa.py", line 97, in run
    train_loss, train_acc = self.train(epoch)
  File "D:\code\pycharm\Nextqa\HQGA\videoqa.py", line 121, in train
    out, prediction, _ = self.model(video_inputs, qas_inputs, qas_lengths, temp_input)
  File "C:\Users\17965\anaconda3\envs\hqga\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\code\pycharm\Nextqa\HQGA\networks\VQAModel\HQGA.py", line 102, in forward
    vid_feats = self.vid_encoder(vid_feats)
  File "C:\Users\17965\anaconda3\envs\hqga\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\code\pycharm\Nextqa\HQGA\networks\Encoder\EncoderVid.py", line 77, in forward
    bbox_features = self.tohid(bbox_features)
  File "C:\Users\17965\anaconda3\envs\hqga\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\17965\anaconda3\envs\hqga\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\17965\anaconda3\envs\hqga\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\17965\anaconda3\envs\hqga\lib\site-packages\torch\nn\modules\linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\17965\anaconda3\envs\hqga\lib\site-packages\torch\nn\functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

I debugged it, and it seems the shape of the video features differs from MSVD. Should I change the parameters in videoqa.py, e.g.:

feat_dim = 2048
bbox_dim = 5
num_clip, num_frame, num_bbox = 8, 8*4, 10  # For msvd
feat_hidden, pos_hidden = 256, 128
word_dim = 300
vocab_size = None if self.use_bert else len(self.vocab)

num_class = 1 if self.multi_choice else 1853 #4001 for msrvtt, 1853 for msvd, 1541 for frameQA in TGIF-QA
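
Before changing anything, it may help to confirm the actual layout of the extracted NExT-QA region features. The sketch below is only an illustration: the file path and dataset keys are hypothetical and should be replaced with wherever your features are actually stored.

import h5py

# Hypothetical path and keys -- adjust to your NExT-QA feature file.
feat_file = 'dataset/nextqa/region_feats.h5'
with h5py.File(feat_file, 'r') as f:
    print(list(f.keys()))   # see which datasets the file contains
    feats = f['feat']       # region features, if stored under a 'feat' key
    print(feats.shape)      # expect something like (num_videos, num_clip*num_frame, num_bbox, feat_dim+bbox_dim)

Whatever shape is printed should line up with num_clip, num_frame, num_bbox, feat_dim and bbox_dim in videoqa.py; the RuntimeError above is the linear layer complaining that the feature dimension it receives does not match its weight matrix.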
doc-doc commented 2 years ago

Hi, thanks for your interest. Please find the test data here. Also, you have to set num_bbox to 20.
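
For reference, assuming the block quoted above in videoqa.py, the NExT-QA settings might look like the following sketch. Only num_bbox = 20 is confirmed in this thread; the other values are carried over from the MSVD defaults as assumptions and should be checked against your extracted features.

feat_dim = 2048
bbox_dim = 5
num_clip, num_frame, num_bbox = 8, 8*4, 20  # num_bbox = 20 for NExT-QA (confirmed above); clip/frame counts assumed from the MSVD defaults
feat_hidden, pos_hidden = 256, 128
word_dim = 300
vocab_size = None if self.use_bert else len(self.vocab)

num_class = 1 if self.multi_choice else 1853  # NExT-QA is multi-choice, so the multi_choice branch keeps num_class = 1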