Closed — lujuncong2000 closed this issue 4 months ago.
Hi, how is the demo running?
As for the KeyError, it looks like a mismatch in the loaded checkpoint. Could you try loading the ckpt directly instead of ckpt['model']?
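The suggestion above can be sketched as a small helper. This is an illustrative snippet, not the repository's actual loading code: `extract_state_dict` and the paths in the comment are hypothetical names, assuming the KeyError comes from indexing `ckpt['model']` on a file that already stores a plain state_dict.

```python
def extract_state_dict(ckpt):
    """Return the weights whether the checkpoint nests them under a
    'model' key or stores the state_dict at the top level."""
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt

# Typical use (names and paths are illustrative):
#   import torch
#   ckpt = torch.load("./pretrain/checkpoint.pth", map_location="cpu")
#   model.load_state_dict(extract_state_dict(ckpt), strict=False)
```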
OK, I will try soon.
Hi, I will close this issue due to long inactivity. Please feel free to reopen it if you still have questions.
Thanks for your excellent work! I have already tried your chat demo, and it worked wonderfully. Now I want to use this model to summarize my own videos. Following the README, I downloaded transnetv2-pytorch-weights.pth and modified some key-value pairs. Can you tell me what to do next? Here is my config:

```yaml
model:
  fix_total: False
  prompt_order: vt  # tv random
  arch: video_minigpt4
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: False
  max_txt_len: 160
  end_sym: "###"
  num_frms: 32
  low_resource: True
  prompt_path: "prompts/alignment_av.txt"
  prompt_template: '###Human: {} ###Assistant: '
  num_query_token: 32
  ckpt: "./pretrain/transnetv2-pytorch-weights.pth"

  # Vicuna
  llama_model: "YOUR_VICUNA_7B_DIR"

  visual_target: True
  audio_target: False
  asr_audio: True
  av_target: False
  whole_video: True
  multishot: True
  mix_multishot: False
  system_prompt: ""  # "Given a video, you will be able to see the frames once I provide it to you. Please answer my questions."
  answer_prompt: ""  # "In the audio, " "The video shows"
  question_prompt: "The audio transcripts are: {asr}. "
  multishot_prompt: "This is a video with {num_shot} shots. "
  multishot_secondary_prompt: "The {shot_idx_text} shot is "

datasets:
  bdmsvdc_multishot_minigpt_caption:
    flexible_sampling: True
    vis_processor:
      train:
        name: "blip_video_train"
        n_frms: 4
        image_size: 224
      eval:
        name: "blip_video_eval"
        n_frms: 4
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"
        max_words: 600
      eval:
        name: "blip_caption"
        max_words: 600

run:
  task: video_text_pretrain
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 8e-5
  min_lr: 8e-6
  warmup_lr: 8e-6
  accum_grad_iters: 2
  weight_decay: 0.05
  max_epoch: 40
  batch_size_train: 10
  batch_size_eval: 10
  num_workers: 10
  warmup_steps: 30
  seed: 42
  output_dir: "output/video_minigpt4"
  amp: True
  resume_ckpt_path: null
  evaluate: False
  re_evaluate: False

  # train_splits: ["train"]
  # valid_splits: ["val"]
  # test_splits: ["msrvtt_test_train_fake_multishot"]
  # test_splits: ["anet_test_fake_multishot_multi_trunk"]
  # test_splits: ["anet_test_fake_multishot_v2"]
  # test_splits: ["anet_test_fake_multishot_v3"]
  train_splits: ["20k_train_multishot"]
  valid_splits: ["20k_val_multishot"]
  test_splits: ["20k_test_multishot"]
  device: "cuda"
  world_size: 4
  dist_url: "env://"
  distributed: True
```
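For what it's worth, the prompt-related fields in the config above (question_prompt, multishot_prompt, multishot_secondary_prompt) look like format strings. A minimal sketch of how they might compose for a multi-shot video follows; this is an assumed composition for illustration only, not the repository's actual prompt-assembly code, and the shot count and ASR text are made-up values.

```python
# Format strings copied from the config above.
multishot_prompt = "This is a video with {num_shot} shots. "
multishot_secondary_prompt = "The {shot_idx_text} shot is "
question_prompt = "The audio transcripts are: {asr}. "

# Hypothetical assembly for a 3-shot video with a dummy transcript.
prompt = (
    multishot_prompt.format(num_shot=3)
    + question_prompt.format(asr="hello world")
    + multishot_secondary_prompt.format(shot_idx_text="first")
)
print(prompt)
```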