DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

[QUESTION] vicuna-7b model specification on HuggingFace Space demo #113

Closed comidan closed 1 year ago

comidan commented 1 year ago

Hello, amazing work! I was trying to run it locally and just wanted to know which vicuna-7b model you specifically used for the demo on HuggingFace, or whether you have a fine-tuned version that is not available for public use.

I saw this discussion on HuggingFace, where it seems that vicuna-7b model could be used: https://huggingface.co/spaces/DAMO-NLP-SG/Video-LLaMA/discussions/4

I am asking because it works quite well compared to the other language models proposed here for Video-LLaMA.

Thank you!

lixin4ever commented 1 year ago

We use vicuna-7b-v0 (the earliest version) rather than vicuna-7b-v1.x; the download links are given in the Prerequisites section.
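
For reference, vicuna-7b-v0 is released as delta weights, so you first need the original LLaMA-7B weights and then merge in the delta with FastChat. A minimal sketch of the conversion, assuming local paths on your machine (flag names follow the FastChat README for the delta-weight releases; adjust paths to your setup):

```bash
# Install FastChat, which ships the delta-merging script.
pip install fschat

# Merge the v0 delta (lmsys/vicuna-7b-delta-v0 on HuggingFace) into the
# original LLaMA-7B weights (HF format) to reconstruct vicuna-7b-v0.
# /path/to/... are placeholders for your local directories.
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-7b-hf \
    --target-model-path /path/to/vicuna-7b-v0 \
    --delta-path lmsys/vicuna-7b-delta-v0
```

You can then point the language-model path in the eval config at the merged directory.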