Closed — lujuncong2000 closed this issue 4 months ago.
Hi, how is the demo running?
As for the KeyError, it looks like a mismatch in the loaded checkpoint. Could you try loading the ckpt directly instead of ckpt['model']?
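The suggestion above can be sketched as a small helper. This is an illustrative snippet, not the repository's actual loading code: `extract_state_dict` and the paths in the comment are hypothetical names, assuming the KeyError comes from indexing `ckpt['model']` on a file that already stores a plain state_dict.

```python
def extract_state_dict(ckpt):
    """Return the weights whether the checkpoint nests them under a
    'model' key or stores the state_dict at the top level."""
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt

# Typical use (names and paths are illustrative):
#   import torch
#   ckpt = torch.load("./pretrain/checkpoint.pth", map_location="cpu")
#   model.load_state_dict(extract_state_dict(ckpt), strict=False)
```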
OK, I will try soon.
Hi, I will close this issue due to long inactivity. Please feel free to reopen it if you still have questions.
Thanks for your excellent work! I have already tried your chat demo, and it worked wonderfully. Now I want to use this model to summarize my own videos. Following the README, I downloaded transnetv2-pytorch-weights.pth and modified some key-value pairs. Can you tell me what to do next? Here is my config:

```yaml
model:
  fix_total: False
  prompt_order: vt  # tv random
  arch: video_minigpt4
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: False
  max_txt_len: 160
  end_sym: "###"
  num_frms: 32
  low_resource: True
  prompt_path: "prompts/alignment_av.txt"
  prompt_template: '###Human: {} ###Assistant: '
  num_query_token: 32
  ckpt: "./pretrain/transnetv2-pytorch-weights.pth"

  # Vicuna
  llama_model: "YOUR_VICUNA_7B_DIR"

  visual_target: True
  audio_target: False
  asr_audio: True
  av_target: False
  whole_video: True
  multishot: True
  mix_multishot: False
  system_prompt: ""  # "Given a video, you will be able to see the frames once I provide it to you. Please answer my questions."
  answer_prompt: ""  # "In the audio, " "The video shows"
  question_prompt: "The audio transcripts are: {asr}. "
  multishot_prompt: "This is a video with {num_shot} shots. "
  multishot_secondary_prompt: "The {shot_idx_text} shot is "

datasets:
  bdmsvdc_multishot_minigpt_caption:
    flexible_sampling: True
    vis_processor:
      train:
        name: "blip_video_train"
        n_frms: 4
        image_size: 224
      eval:
        name: "blip_video_eval"
        n_frms: 4
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"
        max_words: 600
      eval:
        name: "blip_caption"
        max_words: 600

run:
  task: video_text_pretrain
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 8e-5
  min_lr: 8e-6
  warmup_lr: 8e-6
  accum_grad_iters: 2
  weight_decay: 0.05
  max_epoch: 40
  batch_size_train: 10
  batch_size_eval: 10
  num_workers: 10
  warmup_steps: 30
  seed: 42
  output_dir: "output/video_minigpt4"
  amp: True
  resume_ckpt_path: null
  evaluate: False
  re_evaluate: False

  # train_splits: ["train"]
  # valid_splits: ["val"]
  # test_splits: ["msrvtt_test_train_fake_multishot"]
  # test_splits: ["anet_test_fake_multishot_multi_trunk"]
  # test_splits: ["anet_test_fake_multishot_v2"]
  # test_splits: ["anet_test_fake_multishot_v3"]
  train_splits: ["20k_train_multishot"]
  valid_splits: ["20k_val_multishot"]
  test_splits: ["20k_test_multishot"]
  device: "cuda"
  world_size: 4
  dist_url: "env://"
  distributed: True
```
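For what it's worth, the prompt-related fields in the config above (question_prompt, multishot_prompt, multishot_secondary_prompt) look like format strings. A minimal sketch of how they might compose for a multi-shot video follows; this is an assumed composition for illustration only, not the repository's actual prompt-assembly code, and the shot count and ASR text are made-up values.

```python
# Format strings copied from the config above.
multishot_prompt = "This is a video with {num_shot} shots. "
multishot_secondary_prompt = "The {shot_idx_text} shot is "
question_prompt = "The audio transcripts are: {asr}. "

# Hypothetical assembly for a 3-shot video with a dummy transcript.
prompt = (
    multishot_prompt.format(num_shot=3)
    + question_prompt.format(asr="hello world")
    + multishot_secondary_prompt.format(shot_idx_text="first")
)
print(prompt)
```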