TXH-mercury / VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
https://arxiv.org/abs/2305.18500
MIT License

Error about finetune_qa_msvd task (Miss key 'desc' or 'caption' in descs_qa_trainval.json) #16

Closed BinzheLi95 closed 8 months ago

BinzheLi95 commented 8 months ago

03/26/2024 19:23:18 - INFO - main - load_from_pretrained: ./output/vast/pretrain_vast/ckpt/model_step_204994.pt
03/26/2024 19:23:18 - INFO - main - Load from pretrained dir ./output/vast/pretrain_vast
03/26/2024 19:23:19 - INFO - main - Unexpected keys ['vision_encoder.text.logit_scale']
03/26/2024 19:23:19 - INFO - main - missing_keys ['vision_encoder.logit_scale']
03/26/2024 19:23:20 - INFO - main - ==================learning_rate_settings==================
03/26/2024 19:23:20 - INFO - main - basic_lr : 1e-05
03/26/2024 19:23:20 - INFO - main - clip_lr_visual : 5e-07
03/26/2024 19:23:20 - INFO - main - clip_lr_visual_len : 245
03/26/2024 19:23:20 - INFO - main - new_lr : 0
03/26/2024 19:23:20 - INFO - main - new_params_name: []
  0%| | 0/5670 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/workspace/Project/VideoLargeModel/VAST/./run.py", line 63, in <module>
    main()
  File "/mnt/workspace/Project/VideoLargeModel/VAST/./run.py", line 46, in main
    train(model, optimizer, train_loader, val_loaders, args.run_cfg, start_step = start_step, verbose_time=False)
  File "/mnt/workspace/Project/VideoLargeModel/VAST/utils/pipeline.py", line 35, in train
    for step, (name, batch) in enumerate(train_loader):
  File "/mnt/workspace/Project/VideoLargeModel/VAST/data/loader.py", line 101, in __iter__
    self.preload(loaderit)
  File "/mnt/workspace/Project/VideoLargeModel/VAST/data/loader.py", line 112, in preload
    self.batch = next(it)
  File "/mnt/workspace/Project/VideoLargeModel/VAST/data/loader.py", line 48, in __iter__
    batch = next(iter)
  File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pai/envs/vast/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/workspace/Project/VideoLargeModel/VAST/data/IndexAnno.py", line 69, in __getitem__
    raw_captions = anno['desc'] if 'desc' in anno else anno['caption']
KeyError: 'caption'

I am trying the VQA task on the MSVD-QA dataset. I ran the following command and hit the error above:

    python3 -m torch.distributed.launch \
    --nnodes 1 \
    --node_rank 0 \
    --nproc_per_node 4 \
    --master_port 9834 \
    ./run.py \
    --learning_rate 1e-5 \
    --checkpointing true \
    --first_eval false \
    --config ./config/vast/finetune_cfg/VQA-msvd.json \
    --pretrain_dir $output_dir \
    --save_best true \
    --output_dir $output_dir/downstream/VQA-msvd

I noticed that AnnoIndexedDataset(Dataset) requires 'desc' or 'caption' in anno, but msvd/descs_cap_train.json does not have these fields. How can I fix this error? Thank you.
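One way to confirm which keys are missing is to scan the annotation file directly. The sketch below assumes the file is a JSON list of dicts, which is what the anno['desc'] lookup in the traceback suggests but is not confirmed here. The helper names (count_missing_caption_keys, check_annotation_file) and the sample fields (video_id, question, answer) are made up for illustration and are not part of VAST.

```python
import json
from collections import Counter


def count_missing_caption_keys(annos):
    """Return (number of entries with neither 'desc' nor 'caption', tally of all keys seen)."""
    missing = 0
    keys_seen = Counter()
    for anno in annos:
        keys_seen.update(anno.keys())
        if 'desc' not in anno and 'caption' not in anno:
            missing += 1
    return missing, keys_seen


def check_annotation_file(path):
    """Load a JSON annotation file (assumed to be a list of dicts) and check it."""
    with open(path, 'r', encoding='utf-8') as f:
        annos = json.load(f)
    return count_missing_caption_keys(annos)


# Toy data with hypothetical field names, standing in for a real annotation file.
sample = [
    {'video_id': 'vid1', 'question': 'what is the man doing?', 'answer': 'running'},
    {'video_id': 'vid2', 'desc': 'a dog runs'},
]
missing, keys_seen = count_missing_caption_keys(sample)
print(f"{missing} of {len(sample)} entries lack 'desc' and 'caption'")
print("keys seen:", dict(keys_seen))
```

If every entry in the QA annotation file turns out to lack both keys, the KeyError above is expected for any sample the loader fetches.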

BinzheLi95 commented 8 months ago

I solved the problem by modifying line 70 in IndexAnno.py from

    raw_captions = anno['desc'] if 'desc' in anno else anno['caption']

to

    if 'desc' in anno or 'caption' in anno:
        raw_captions = anno['desc'] if 'desc' in anno else anno['caption']
    else:
        raw_captions = " "
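The same fallback can be written as a small standalone helper, which makes it easy to test outside the dataset class. This is only a sketch of the workaround described above, not the repository's code: get_raw_captions and its default parameter are hypothetical names, and the sample dicts use made-up fields.

```python
def get_raw_captions(anno, default=" "):
    """Prefer 'desc', then 'caption'; fall back to a placeholder when both are absent."""
    if 'desc' in anno:
        return anno['desc']
    if 'caption' in anno:
        return anno['caption']
    return default


# An annotation with only QA fields (hypothetical names) no longer raises KeyError.
qa_only = {'question': 'what is the man doing?', 'answer': 'running'}
print(repr(get_raw_captions(qa_only)))

# Entries that do carry a caption behave exactly as before.
print(get_raw_captions({'desc': 'a dog runs'}))
print(get_raw_captions({'caption': 'a cat sleeps'}))
```

The check order mirrors the original expression, so 'desc' still wins when both keys are present.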

wonzin commented 5 months ago

Hi, does your solution return the right result? I am curious whether leaving raw_captions empty is a viable solution or not.

BinzheLi95 commented 5 months ago

Yes, I obtained the right QA result. Setting raw_captions = " " for the QA task is OK because raw_captions is not needed for the QA task.

wonzin commented 5 months ago

Thank you for your reply :) It really helps me!