DAMO-NLP-SG VideoLLaMA2 issues

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Apache License 2.0

907 stars, 60 forks

issues

Inference error for finetuned models.

#126 Danielement321 closed 5 hours ago
2
Can VideoLLaMA2.1-7B-AV perform inference on images?

#125 sjghh closed 4 hours ago
2
Finetune model inference error

#124 thisurawz1 opened 6 days ago
3
output strangeness

#123 babyta closed 4 hours ago
3
An error occurred while loading video.json and audio.json

#122 sjghh closed 1 week ago
7
System Message Not Affecting VideoLLaMA2-7B's Responses

#121 yuripetralia closed 1 week ago
4
Error When Running Multiple-model Version Demo

#120 shmooel28 opened 3 weeks ago
1
The base model from Hugging Face for audio-visual question answering does not work at all

#119 asmit203 closed 1 week ago
11
vision_tower load error? How to correctly load ckpt?

#118 Cece1031 opened 4 weeks ago
2
No module named 'transformers'

#117 0sATs0 opened 4 weeks ago
2
🚀 [Release Notes] 2024.10

#116 lixin4ever opened 1 month ago
0
Change the __init__.py file to use the relevant pretrained function

#115 marvlyngkhoi closed 1 week ago
1
Inference code does not work for videos

#114 marvlyngkhoi opened 1 month ago
3
How to load a model that was finetuned using QLoRA or LoRA?

#113 marvlyngkhoi opened 1 month ago
1
Can't hear the audio

#112 sjghh opened 1 month ago
9
🔧 [Refactor] Update build backend

#111 clownrat6 closed 1 month ago
0
[Feat] Supporting audio and audio-visual stages.

#110 xinyifei99 closed 1 month ago
0
AV ckpt inference error

#109 kk94wang closed 1 month ago
2
Update README.md

#108 xinyifei99 closed 1 month ago
0
You are using a model of type mistral to instantiate a model of type videollama2_mistral. This is not supported for all configurations of models and can yield errors.

#107 hufflepuff0596 closed 1 month ago
1
QLoRA fine-tunes a custom model with 4 bits, then runs inference on the video, and we got:

#106 BUAACY opened 1 month ago
1
Forward pass of the model - how to pass videos?

#105 esh04 opened 1 month ago
1
What are the GPUs used in the fine-tuning stage?

#104 BUAACY closed 1 month ago
1
Demo Question

#103 sjghh closed 1 month ago
4
audio information

#102 sjghh closed 1 month ago
1
Request for Inference Code on Custom Datasets

#101 dongqi-me opened 1 month ago
6
Json files of the MVSD-QA dataset

#100 Hou9612 closed 1 month ago
2
When will the audio branch be released?

#99 XuecWu closed 1 month ago
3
⭐ [Feat] Supporting audio and audio-visual stages.

#98 xinyifei99 closed 1 month ago
1
code for batch inference

#97 zhangjic22 opened 2 months ago
1
videollama2_av

#96 xinyifei99 closed 2 months ago
0
Problem: Segmentation fault (core dumped)

#95 CamellIyquitous closed 1 month ago
5
Can I run VideoLLaMA 1 in this repo?

#94 jun297 opened 2 months ago
1
Can videollama2 continue finetuning on my own dataset using 32 frames?

#93 zhengrongz closed 2 months ago
2
VideoLLaMA2 performance gap on video benchmarks

#92 zhuqiangLu closed 1 month ago
1
Videochatgpt_gen link for Test_Human_Annotated_Captions is not valid

#91 jun297 opened 2 months ago
2
ValueError: The following `model_kwargs` are not used by the model: ['images_or_videos', 'modal_list']

#90 CaffeyChen opened 2 months ago
2
After fine-tuning, the model outputs repetitive phrases

#89 Jackyzjz opened 2 months ago
4
🔧 [Refactor] Make codebase reproducible for previous version.

#88 clownrat6 closed 2 months ago
0
Could you please advise when the checkpoint for the audio branch will be made public?

#87 ymxyll opened 2 months ago
6
Deployment on huggingface endpoints

#86 aliayub40995 opened 2 months ago
2
What is the difference between the 'base' and 'chat' versions of a model type?

#85 Lanbai-eleven closed 2 months ago
2
Can we do text-only, image-text, and video-text finetuning with LoRA in one run?

#84 thisurawz1 closed 2 months ago
4
How to run inference with the finetuned weights / model

#83 thisurawz1 closed 1 month ago
12
UnboundLocalError: local variable "video_path" referenced before assignment

#82 acDante closed 4 weeks ago
4
Cannot reproduce results on vllava datasets

#81 williamium3000 opened 3 months ago
27
Model keeps outputting "there is no sound / I cannot hear anything" when there is actual sound

#80 qixueweigitbub closed 4 weeks ago
3
train and fine tune for audio-video

#79 trahman8 closed 4 weeks ago
3
Unable to load *ANY BASE MODEL* in 4bit

#78 ApoorvFrontera opened 3 months ago
2
Error while loading Mixtral based SFT MoE model VideoLLaMA2-8x7B: SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

#77 ApoorvFrontera opened 3 months ago
5