2U1 / Phi3-Vision-Finetune

An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft.
Apache License 2.0

Timeout during finetuning #26

Open fangruizhu opened 1 week ago

fangruizhu commented 1 week ago

Hi,

Thanks for sharing the code. I'm using it to fine-tune on videos by freezing the visual encoder and projector, and tuning the LLM. Initially, everything works well, but as training progresses, I notice that GPU memory usage keeps increasing. I'm using 8 H100s, but eventually, the process times out due to running out of memory. Have you encountered this issue before? Any insights you might have would be greatly appreciated. Thank you!

2U1 commented 1 week ago

I haven't tested video training with a large dataset, so I haven't encountered the problem you describe. It doesn't happen with a large image dataset, so it looks like some kind of video preprocessing problem. I'll look into it and let you know when I find something.

Thanks for the issue.

Also, does the memory run out in the middle of training? Does it look like a memory leak?
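One rough way to tell a leak from a normal warm-up plateau is to log GPU memory every few hundred steps and check whether it keeps climbing. The helper below is only a sketch: `looks_like_leak` and its thresholds are hypothetical names and values, not part of this repo, and it assumes the readings are in MiB (for example from `torch.cuda.memory_allocated() / 2**20`).

```python
def looks_like_leak(samples_mb, min_growth_mb=100.0, rising_fraction=0.8):
    """Return True when memory readings keep rising instead of plateauing.

    samples_mb: readings in MiB taken at regular step intervals, e.g.
    torch.cuda.memory_allocated() / 2**20 logged every N training steps.
    The thresholds are illustrative assumptions, not tuned values.
    """
    if len(samples_mb) < 3:
        return False  # too few readings to judge a trend

    # Differences between consecutive readings.
    steps = [b - a for a, b in zip(samples_mb, samples_mb[1:])]
    rising = sum(1 for s in steps if s > 0)
    total_growth = samples_mb[-1] - samples_mb[0]

    # A leak shows both a large net increase and mostly upward steps;
    # a one-off jump followed by a flat line does not qualify.
    return total_growth >= min_growth_mb and rising / len(steps) >= rising_fraction
```

A jump in the first few steps followed by a flat line is usually just allocator warm-up, so it is worth dropping the earliest readings before calling the helper.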

fangruizhu commented 1 week ago

Thank you for the reply! Yes, the memory only runs out in the middle of training; at the beginning it is always fine. I set bs=8 per GPU with grad accum=1 or 2. I use the Valley dataset, which contains 702K videos. When training for one epoch, it times out at around 50%-80% of the training iterations, with GPU memory usage increasing. I use DeepSpeed ZeRO-3.

2U1 commented 1 week ago

Can you check whether the videos have different resolutions? If they are all the same, adding del vr right before the return statement in encode_video in data.py might help. I'm not really sure what the problem is.
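To make the suggestion concrete, here is a minimal sketch of where del vr would go. This is not the actual encode_video from data.py: the `open_reader` argument and `sample_indices` helper are hypothetical, introduced so the example runs without a video library. With decord, `open_reader` could be something like `lambda p: decord.VideoReader(p)`.

```python
import gc


def sample_indices(total_frames, num_frames):
    """Pick up to num_frames evenly spaced frame indices."""
    if total_frames <= num_frames:
        return list(range(total_frames))
    gap = total_frames / num_frames
    # Take the middle of each of the num_frames equal-width segments.
    return [int(i * gap + gap / 2) for i in range(num_frames)]


def encode_video(video_path, num_frames, open_reader):
    # open_reader is a hypothetical factory returning an object that
    # supports len() and integer indexing, as decord.VideoReader does.
    vr = open_reader(video_path)
    indices = sample_indices(len(vr), num_frames)

    # With decord, calling .asnumpy() on each frame here would make the
    # returned frames plain NumPy arrays.
    frames = [vr[i] for i in indices]

    # Drop the reader explicitly right before returning, then ask the
    # garbage collector to reclaim it now rather than at some later point.
    del vr
    gc.collect()
    return frames
```

If memory still grows with this change, the reader is probably not what is holding the memory.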

fangruizhu commented 1 week ago

Let me have a try! I will get back to you later, thanks!

fangruizhu commented 1 week ago

I tried del vr, and I also tried both zero2.json and zero3.json. The training still hangs. I am going to reinstall the environment and try again.

2U1 commented 1 week ago

You could try decreasing num_frames. Also, 4 is the best value for num_crops with multi-image/video inputs.
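As a back-of-the-envelope check on why these two settings matter, the sketch below estimates how many visual tokens a single video sample contributes. It rests on assumptions: each frame is treated as num_crops local crops plus one global view, each crop is taken to cost 144 tokens, and separator tokens are ignored. `estimate_visual_tokens` is a hypothetical helper, not a function in this repo.

```python
def estimate_visual_tokens(num_frames, num_crops, tokens_per_crop=144):
    """Rough upper bound on visual tokens for one video sample.

    Assumptions (not taken from this repo): every frame uses num_crops
    local crops plus one global view, each crop costs tokens_per_crop
    tokens, and separator tokens are ignored.
    """
    crops_per_frame = num_crops + 1
    return num_frames * crops_per_frame * tokens_per_crop


# Halving num_frames halves the estimate; raising num_crops from 4 to 16
# more than triples it under these assumptions.
for frames, crops in [(10, 4), (5, 4), (10, 16)]:
    print(frames, crops, estimate_visual_tokens(frames, crops))
```

With bs=8 per GPU, this per-sample figure is multiplied by eight before any text tokens are added, so lowering num_frames is the cheapest change to try first.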