bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
https://mingfei.info/shot2story

I want to run demo_video.py; how much computing power is needed to run this model? #9

Closed lujuncong2000 closed 4 months ago

lujuncong2000 commented 5 months ago

Thanks for your work! My 3090 Ti can't run this model, so I want to rent an online server. How much computing power is needed to run it?

youthHan commented 5 months ago

Hi, thank you for your interest.

  1. Do you mean that inference fails, or training?
  2. We use an A100-80G for our experiments, but that much memory may not be necessary. Let me run a test to check the actual memory consumption.
lujuncong2000 commented 5 months ago

Thanks for your reply.

  1. Yes, but I think the reason is that my GPU doesn't have enough memory.
  2. Okay, thank you so much.

lujuncong2000 commented 5 months ago

I'm sorry to bother you again. I want to know how much memory you consume. The 80G server is too expensive, and I don't know if a 40G server can meet the requirements.

youthHan commented 5 months ago

Hi, sorry for the late reply. For inference, 40 GB is sufficient.

For training, if you run the original setup for video summarization, which consumes up to 32 frames in a single batch plus ASR text, 40 GB is not sufficient. You may reduce the number of frames per shot or remove the ASR input.

If you run video shot captioning, I think 40 GB is enough, though you may need to lower the batch size.
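Before renting a card, one way to check whether a given memory budget is enough is to measure the peak allocation around a single training or inference step. The helper below is a generic PyTorch sketch of my own (not code from this repo); it falls back gracefully on CPU-only machines:

```python
import torch


def run_with_peak_memory(step_fn, *args, **kwargs):
    """Run one step and report peak GPU memory (hypothetical helper).

    step_fn: any callable that performs a forward (and optionally backward) pass.
    """
    if torch.cuda.is_available():
        # Clear the peak-memory counter so we measure only this step.
        torch.cuda.reset_peak_memory_stats()
    out = step_fn(*args, **kwargs)
    if torch.cuda.is_available():
        peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
        print(f"peak GPU memory: {peak_gib:.2f} GiB")
    return out
```

Wrapping one batch of the summarization or captioning loop in such a helper makes it easy to compare, e.g., 16 frames versus 32 frames per batch before committing to hardware.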


fazliimam commented 5 months ago

For multi-shot training, if we reduce the frame rate, would a single 40 GB GPU be sufficient?

youthHan commented 5 months ago

@fazlicodes @lujuncong2000 Hi, I tested on my A100-80G:

  1. With no tricks, 16 frames sometimes exceeds 40 GB (up to 44 GB), which I think may be because PyTorch's memory management keeps unused memory as cache.
  2. If you call torch.cuda.empty_cache() in the forward function of lavis/models/blip2_models/video_minigpt4.py to avoid memory peaks, it works well, with memory usage of at most 38656 MiB.

So I think an A100-40GB may be fine with either of the two approaches above.
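The second approach above can be sketched as follows. This is a hypothetical stand-in module, not the real forward pass in video_minigpt4.py; it only illustrates where an empty_cache() call can sit between memory-heavy stages to release cached blocks before the next allocation spike:

```python
import torch
import torch.nn as nn


class VideoCaptionerSketch(nn.Module):
    """Toy two-stage model standing in for the real video model (hypothetical)."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.decoder = nn.Linear(dim, dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(frames)  # memory-heavy stage 1
        if torch.cuda.is_available():
            # Return cached-but-unused blocks to the driver so the transient
            # peak from stage 1 does not stack on top of stage 2's allocations.
            torch.cuda.empty_cache()
        return self.decoder(feats)    # memory-heavy stage 2
```

Note that empty_cache() does not reduce memory actually held by live tensors; it only trims PyTorch's caching allocator, which is why it helps with transient peaks rather than steady-state usage.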

lujuncong2000 commented 5 months ago

Thank you very much


youthHan commented 4 months ago

Hi, due to long inactivity I will close this issue. Please feel free to reopen it if you still have questions.