Questions about VideoChat2_HD

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

https://vchat.opengvlab.com/

MIT License


Questions about VideoChat2_HD #194

Open LiJiaqi96 opened 2 weeks ago

LiJiaqi96 commented 2 weeks ago

Hi, thanks for the VideoChat2_HD update! While trying the newly released code, I ran into some questions:

The MetaLoader_rs class in "train_it_ds.py" seems to be missing.

So I still used "train_it.py", but got the following error. I'm not sure whether it could be solved by using MetaLoader_rs.

RuntimeError: stack expects each tensor to be equal size, but got [8, 3, 224, 448] at entry 0 and [8, 3, 448, 672] at entry 1

Then I changed batch_size to 1, which resolved the previous error. However, the load_and_transform_media_data_image function does not seem to accept the dynamic_config setting that "it_dataset_mistral.py" passes to it. I created a pull request to modify this part.
Is there anywhere I can find the newly added datasets for VideoChat2_HD? I suppose they are important for improving model performance.
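
The stack error above is what default batching produces when samples in one batch have different resolutions: tensors can only be stacked when their shapes match, which is why batch_size=1 avoids it. A minimal sketch of another workaround, grouping samples by shape before batching, is shown below. It is not taken from the repository, and whether MetaLoader_rs does anything similar is unknown; bucket_by_shape is a hypothetical helper, and plain shape tuples stand in for the real tensors.

```python
from collections import defaultdict


def bucket_by_shape(samples, batch_size):
    """Group (shape, payload) samples so that every batch holds one shape.

    `samples` is a list of (shape, payload) pairs. The shape tuple stands in
    for a tensor's size; this is a hypothetical helper for illustration only.
    """
    buckets = defaultdict(list)
    for shape, payload in samples:
        buckets[shape].append((shape, payload))

    batches = []
    for items in buckets.values():
        # Split each same-shape bucket into chunks of at most batch_size.
        for start in range(0, len(items), batch_size):
            batches.append(items[start:start + batch_size])
    return batches


if __name__ == "__main__":
    # The two shapes from the error message, plus a second low-res sample.
    samples = [
        ((8, 3, 224, 448), "clip_a"),
        ((8, 3, 448, 672), "clip_b"),
        ((8, 3, 224, 448), "clip_c"),
    ]
    for batch in bucket_by_shape(samples, batch_size=2):
        print([payload for _, payload in batch])
```

Every resulting batch contains a single shape, so stacking inside a batch would no longer fail; the trade-off is that batches can be smaller than batch_size when a resolution is rare.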

Andy1621 commented 2 weeks ago

Thanks for trying it! I will fix it later~

Andy1621 commented 2 weeks ago

@LiJiaqi96 Please give it a try; I have updated the code. train_it_ds adds deepspeed support and needs some changes.

LiJiaqi96 commented 2 weeks ago

Thanks! I tried "train_it_ds.py" without deepspeed, but it doesn't work. Is it possible to train without deepspeed? For now I would prefer not to use it.

Andy1621 commented 2 weeks ago

Yes! You can run it without deepspeed. BTW, show me your log so that I can fix the bug ~

LiJiaqi96 commented 2 weeks ago

Sorry for the late reply. The log is here: train_log.txt
In "config_7b_hd_stage4.py", I set enable=False in the deepspeed settings and ran the code with:

torchrun    --nnodes=${NNODE} --nproc_per_node=${NUM_GPUS} \
    --rdzv_endpoint=${MASTER_NODE}:10068 \
    --rdzv_backend=c10d \
    tasks/train_it_ds.py \
    $(dirname $0)/config_7b_hd_stage4.py \
    output_dir ${OUTPUT_DIR}

Andy1621 commented 2 weeks ago

I'm not sure whether it is caused by the deepspeed or pytorch versions. Here are my package versions:

torch                     1.13.1+cu117
torchaudio                0.13.1+cu117
torchnet                  0.0.4
torchvision               0.14.1+cu117
deepspeed                 0.14.2
transformers              4.40.1

BTW, sometimes you can fix the bug by changing find_unused_parameters to True or False.
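
To compare a local environment against the versions listed above, a small standard-library check can help. This is only a sketch: compare_versions is a hypothetical helper, and the reference table simply copies the versions quoted in this comment, which are one known-working setup rather than a stated requirement.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the thread; a reference point, not a hard requirement.
REFERENCE = {
    "torch": "1.13.1+cu117",
    "torchaudio": "0.13.1+cu117",
    "torchnet": "0.0.4",
    "torchvision": "0.14.1+cu117",
    "deepspeed": "0.14.2",
    "transformers": "4.40.1",
}


def compare_versions(reference, lookup=version):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in reference.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in compare_versions(REFERENCE).items():
        status = "ok" if ok else "differs"
        print(f"{name:<14} expected {expected:<14} installed {installed} ({status})")
```

A mismatch does not necessarily explain a failure; it only narrows down which packages to look at first.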

LiJiaqi96 commented 2 weeks ago

Thanks, I will create an environment with exactly the same packages and give it a try.

yuanrr commented 2 weeks ago

Hi, I found that shared_utils_ds.py has a bug on line 58:

optimizer_params = create_optimizer(config.optimizer, model, return_group=True)

optimizer.py may need to be updated.

Andy1621 commented 2 weeks ago

Thanks for your feedback. I have updated the code.

LiJiaqi96 commented 1 week ago

I used the new environment except for flash-attn: I am on CUDA 12.1 and can only use flash-attn==2.1.0. I ran "scripts/videochat_mistral/run_7b_stage4_hd.sh" with "tasks/train_it.py" and deepspeed enable=False, and got the error in train_log0618.txt. The error seems to be caused by flash-attn.
Is it possible to run videochat2_hd in the same environment as videochat2_mistral, without using deepspeed?

LiJiaqi96 commented 1 week ago

BTW, I tested the code on a single GPU (like python train_it.py) and it iterates normally.

Andy1621 commented 1 week ago

Yes, it's okay to use it without deepspeed. I use deepspeed ZeRO to reduce GPU memory usage~

LiJiaqi96 commented 1 week ago

I see. Does it run for you on multiple GPUs without deepspeed, the same way the model runs in videochat2_mistral?

LiJiaqi96 commented 1 week ago

Update: I solved the previous issue by upgrading flash-attn to 2.5.9. When I use "train_it_ds.py" with deepspeed enable=True, I hit a new issue with the deepspeed config: trainlog_0621.txt
Could you please help me solve it?

Andy1621 commented 6 days ago

Hi! Please try again with the latest commit.

LiJiaqi96 commented 5 days ago

Thanks for the update! The code now runs with deepspeed enabled.
BTW, is there anywhere I can find the newly added datasets for VideoChat2_HD? I suppose they are important for improving model performance.

Andy1621 commented 3 days ago

Almost all the datasets can be directly downloaded from their repos or homepages~

Let me know if you can't find them.

LiJiaqi96 commented 3 days ago

new_IT_videos
In "instruction_data.py", there are some newly added image datasets in M3IT and some newly added video datasets. Is there anywhere I can find those video datasets? Thanks!

Andy1621 commented 3 days ago

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

LiJiaqi96 commented 3 days ago

Thanks for sharing!

LiJiaqi96 commented 1 day ago

Another question: how can I obtain the checkpoint after VideoChat2_HD training? "demo_mistral_hd.ipynb" loads it with:

state_dict = torch.load("your_model_path/videochat2/videochat2_hd_mistral_stage4.pth", "cpu")

I noticed that there are several files in the "ckpt_latest.pth" folder. Should I choose one of them?
Thanks!

LiJiaqi96 commented 1 day ago

Quoting Andy1621: "These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo."

Hi, could you please help me find the instruction json files, such as f"{anno_root_it}/video/caption/sharegptvideo/train_300k.json"? I could not find them in the HF VideoChat2-IT repo.

Andy1621 commented 9 hours ago

Sorry for the late reply. For the checkpoint, you need to use the file named mp_xxx, which stores the weights. For the instruction data, I will upload it today.
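
As a rough illustration of picking the weight file out of the checkpoint folder, the sketch below lists the files whose names start with "mp_". find_weight_files is a hypothetical helper, the exact file name is left open as in the reply above, and the layout of the loaded object is an assumption that should be inspected before it is passed to the demo notebook.

```python
from pathlib import Path


def find_weight_files(ckpt_dir):
    """Return files in ckpt_dir whose names start with 'mp_', sorted by name."""
    return sorted(p for p in Path(ckpt_dir).glob("mp_*") if p.is_file())


if __name__ == "__main__":
    # "ckpt_latest.pth" is the folder name mentioned in the thread.
    for path in find_weight_files("ckpt_latest.pth"):
        print(path)

    # Hypothetical next step (requires torch; verify the key layout yourself):
    # import torch
    # state = torch.load(find_weight_files("ckpt_latest.pth")[0], map_location="cpu")
    # print(state.keys())
```

If the loaded object turns out to be a dictionary that wraps the model weights under another key, that inner dictionary is what the notebook's state_dict would need.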

Andy1621 commented 5 hours ago

@LiJiaqi96 Please check the data in HuggingFace~