bytedance / tarsier

Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions while also offering good general video understanding.
Apache License 2.0
146 stars, 8 forks

GPU requirements for inference? #11

Open: RaiaN opened this issue 4 weeks ago

RaiaN commented 4 weeks ago

Hey, thanks for open-sourcing the model and weights.

What GPU do I need to perform inference when using Tarsier-34b?

jwwang424 commented 3 weeks ago

One A100 with 80 GB of memory.
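As a rough sanity check of this answer, here is a back-of-the-envelope sketch. It assumes the model has about 34 billion parameters (as the name Tarsier-34b suggests) and that the weights are loaded in 16-bit precision (bf16 or fp16). It counts the weights only; activations, the KV cache, and framework overhead add to this, so the real footprint will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the model weights, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: roughly 34e9 parameters, inferred from the model name "Tarsier-34b".
NUM_PARAMS = 34e9
GPU_MEMORY_GB = 80  # one A100 80 GB, per the answer above

for dtype, nbytes in [("fp32", 4), ("fp16/bf16", 2)]:
    needed = weight_memory_gb(NUM_PARAMS, nbytes)
    fits = "fits" if needed < GPU_MEMORY_GB else "does not fit"
    print(f"{dtype:>9}: {needed:.0f} GB of weights ({fits} in {GPU_MEMORY_GB} GB)")
```

Under these assumptions, 16-bit weights take about 68 GB, which fits on a single 80 GB A100 with roughly 12 GB left for activations and the KV cache, whereas full fp32 weights (about 136 GB) would not fit on one card.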