TXH-mercury / VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
https://arxiv.org/abs/2305.18500
MIT License

Problem running finetuning on TGIF #24

Open poffertje opened 6 months ago

poffertje commented 6 months ago

I get the following error when trying to finetune on TGIF:

/github/workspace/src/video/video_reader.cc:270: [/scratch-shared/scur1914/gifs/tumblr_nqjzxszVxD1uz6id5o1_500.gif] Failed to measure duration/frame-count due to broken metadata.
[23:11:27] /github/workspace/src/video/video_reader.cc:270: [/scratch-shared/scur1914/gifs/tumblr_nqjzxszVxD1uz6id5o1_500.gif] Failed to measure duration/frame-count due to broken metadata.

Should I convert the GIFs to frames? The TGIF config file sets the vision format to video_rawvideo. I added the following to vision_mapper.py at line 138:

if not os.path.exists(video_path):
    video_path = video_path.replace('.mkv', '.gif')
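For reference, a slightly more defensive version of that fallback might look like the sketch below. It only swaps the file extension (str.replace would also rewrite a '.mkv' that appears elsewhere in the path) and only returns the .gif path if that file actually exists. The helper name resolve_video_path and the fallback_exts parameter are hypothetical and not part of the VAST code; this only addresses path lookup and does not by itself explain the broken-metadata message.

```python
import os


def resolve_video_path(video_path, fallback_exts=(".gif",)):
    """Return video_path if it exists, otherwise the first existing
    sibling file that has one of the fallback extensions.

    Hypothetical helper for illustration; not part of vision_mapper.py.
    """
    if os.path.exists(video_path):
        return video_path

    # Split off only the final extension, so a '.mkv' elsewhere in the
    # path (for example in a directory name) is left untouched.
    stem, _ = os.path.splitext(video_path)
    for ext in fallback_exts:
        candidate = stem + ext
        if os.path.exists(candidate):
            return candidate

    # Nothing found: return the original path so the caller's usual
    # "file not found" handling still applies.
    return video_path
```

In vision_mapper.py this would be called as video_path = resolve_video_path(video_path) at the point where the one-line check was added.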