DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
907 stars
60 forks
issues
Inference error for finetuned models.
#126
Danielement321
closed
5 hours ago
2
Can VideoLLaMA2.1-7B-AV perform inference on images?
#125
sjghh
closed
4 hours ago
2
Finetune model inference error
#124
thisurawz1
opened
6 days ago
3
output strangeness
#123
babyta
closed
4 hours ago
3
An error occurred while loading video.json and audio.json
#122
sjghh
closed
1 week ago
7
System Message Not Affecting VideoLLaMA2-7B's Responses
#121
yuripetralia
closed
1 week ago
4
Error When Running Multiple-model Version Demo
#120
shmooel28
opened
3 weeks ago
1
The base model from Hugging Face for audio-visual question answering is not working at all
#119
asmit203
closed
1 week ago
11
vision_tower load error? How to correctly load ckpt?
#118
Cece1031
opened
4 weeks ago
2
No module named 'transformers'
#117
0sATs0
opened
4 weeks ago
2
🚀 [Release Notes] 2024.10
#116
lixin4ever
opened
1 month ago
0
Change the __init__.py file to use the relevant pretrained function
#115
marvlyngkhoi
closed
1 week ago
1
Inference code does not work for videos
#114
marvlyngkhoi
opened
1 month ago
3
How to load a model that was finetuned using QLoRA or LoRA?
#113
marvlyngkhoi
opened
1 month ago
1
Can't hear the audio
#112
sjghh
opened
1 month ago
9
🔧 [Refactor] Update build backend
#111
clownrat6
closed
1 month ago
0
[Feat] Supporting audio and audio-visual stages.
#110
xinyifei99
closed
1 month ago
0
AV ckpt inference error
#109
kk94wang
closed
1 month ago
2
Update README.md
#108
xinyifei99
closed
1 month ago
0
You are using a model of type mistral to instantiate a model of type videollama2_mistral. This is not supported for all configurations of models and can yield errors.
#107
hufflepuff0596
closed
1 month ago
1
QLoRA fine-tunes a custom model with 4 bits, and on running inference on the video, we got:
#106
BUAACY
opened
1 month ago
1
Forward pass of the model - how to pass videos?
#105
esh04
opened
1 month ago
1
What are the GPUs used in the fine-tuning stage?
#104
BUAACY
closed
1 month ago
1
Demo Question
#103
sjghh
closed
1 month ago
4
audio information
#102
sjghh
closed
1 month ago
1
Request for Inference Code on Custom Datasets
#101
dongqi-me
opened
1 month ago
6
Json files of the MVSD-QA dataset
#100
Hou9612
closed
1 month ago
2
When will the audio branch be released?
#99
XuecWu
closed
1 month ago
3
⭐ [Feat] Supporting audio and audio-visual stages.
#98
xinyifei99
closed
1 month ago
1
code for batch inference
#97
zhangjic22
opened
2 months ago
1
videollama2_av
#96
xinyifei99
closed
2 months ago
0
Problem: Segmentation fault (core dumped)
#95
CamellIyquitous
closed
1 month ago
5
Can I run VideoLLaMA 1 in this repo?
#94
jun297
opened
2 months ago
1
Can videollama2 continue finetuning on my own dataset using 32 frames?
#93
zhengrongz
closed
2 months ago
2
VideoLLaMA2 performance gap on video benchmarks
#92
zhuqiangLu
closed
1 month ago
1
Videochatgpt_gen link for Test_Human_Annotated_Captions is not valid
#91
jun297
opened
2 months ago
2
ValueError: The following `model_kwargs` are not used by the model: ['images_or_videos', 'modal_list']
#90
CaffeyChen
opened
2 months ago
2
After fine-tuning, the model outputs repetitive phrases
#89
Jackyzjz
opened
2 months ago
4
🔧 [Refactor] Make codebase reproducible for previous version.
#88
clownrat6
closed
2 months ago
0
Could you please advise when the checkpoint for the audio branch will be made public?
#87
ymxyll
opened
2 months ago
6
Deployment on huggingface endpoints
#86
aliayub40995
opened
2 months ago
2
What is the difference between the 'base' and 'chat' versions of a model type?
#85
Lanbai-eleven
closed
2 months ago
2
Can we do text-only, image-and-text, and video-and-text finetuning with LoRA in one run?
#84
thisurawz1
closed
2 months ago
4
how to do the inference with the finetune weights / model
#83
thisurawz1
closed
1 month ago
12
UnboundLocalError: local variable "video_path" referenced before assignment
#82
acDante
closed
4 weeks ago
4
Cannot reproduce results on vllava datasets
#81
williamium3000
opened
3 months ago
27
Model keeps outputting "there is no sound / I cannot hear anything" when there is actual sound
#80
qixueweigitbub
closed
4 weeks ago
3
train and fine tune for audio-video
#79
trahman8
closed
4 weeks ago
3
Unable to load *ANY BASE MODEL* in 4bit
#78
ApoorvFrontera
opened
3 months ago
2
Error while loading Mixtral based SFT MoE model VideoLLaMA2-8x7B: SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
#77
ApoorvFrontera
opened
3 months ago
5