dvlab-research/LLaMA-VID
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Apache License 2.0
618 stars, 39 forks
issues
#96 unable to get results when evaluating on msvd-qa benchmark (irisgong1020, opened 18 hours ago, 0 comments)
#95 AssertionError: Size mismatch! image_features: 1, prompts: 8 (szbcasia, opened 1 month ago, 0 comments)
#94 Running inference on an L20 (48GB) GPU still runs out of GPU memory (try2020-code, opened 1 month ago, 0 comments)
#93 OOM in stage2 finetuning (Nastu-Ho, opened 1 month ago, 0 comments)
#92 _StoreAction.__init__() got an unexpected keyword argument 'defalut' (try2020-code, opened 1 month ago, 1 comment)
#91 2 tokens in inference (XinyuJiang, closed 1 month ago, 1 comment)
#90 About mm_projector loading issue (rubylan, opened 1 month ago, 1 comment)
#89 [h264 @ 0x871b380] mmco: unref short failure during stage-2 training (Nastu-Ho, opened 1 month ago, 0 comments)
#88 training loss in stage-1 (Nastu-Ho, opened 2 months ago, 0 comments)
#87 code details (Nastu-Ho, closed 2 months ago, 0 comments)
#86 Extract context relevancy (IgnacioSan22, opened 2 months ago, 0 comments)
#85 KeyError: 'LlavaConfig' (skyol99, opened 2 months ago, 0 comments)
#84 How to resume from a checkpoint to continue pretraining? (Einstone-rose, opened 2 months ago, 0 comments)
#83 About the WebVid dataset (szbcasia, opened 2 months ago, 0 comments)
#82 Are all video-based checkpoints trained with 2 tokens? (haodi19, opened 2 months ago, 0 comments)
#81 HF model format: VLM weights not in llama-vid-7b-full-336 (nileshkokane01, opened 2 months ago, 0 comments)
#80 Questions about Text Decoder and Text Query (SeuXiao, opened 3 months ago, 0 comments)
#79 About the json in stage2 and stage3 (liziming5353, opened 3 months ago, 1 comment)
#78 about the context length for long video (zhuqiangLu, opened 3 months ago, 0 comments)
#77 Confusion about preprocessing images for long videos (zhuqiangLu, closed 2 months ago, 0 comments)
#76 RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm) (daocodedao, opened 3 months ago, 2 comments)
#75 About ZERO3 (xxtars, closed 3 months ago, 7 comments)
#74 An error occurs during stage-2 fine-tuning (ShuoZhang2003, opened 3 months ago, 1 comment)
#73 AttributeError: 'NoneType' object has no attribute 'is_loaded' (sykuann, opened 3 months ago, 1 comment)
#72 why not use LoRA for tuning Vicuna? (dragen1860, closed 2 months ago, 1 comment)
#71 Multi-image inference (g-h-chen, opened 3 months ago, 1 comment)
#70 Computation costs for each stage? (Becomebright, closed 3 months ago, 1 comment)
#69 Requirements for running inference with llama-vid-13b-full-224-video-fps-1 (sykuann, opened 3 months ago, 1 comment)
#68 abnormal outputs for llama-vid-7b-full-224-video-fps-1 ckpt (YulongBonjour, opened 3 months ago, 1 comment)
#67 How to change default path for model_zoo (sykuann, opened 3 months ago, 2 comments)
#66 Questions about the subtitles (Yxxxb, opened 4 months ago, 1 comment)
#65 flash-attn (ismailukman, closed 4 months ago, 2 comments)
#64 error: llava key (menahem-borges-rodrigues, closed 4 months ago, 1 comment)
#63 About evaluation on vqav2 dataset (liziming5353, opened 4 months ago, 1 comment)
#62 Long video dataset (only 167 movies available) (KerolosAtef, closed 3 months ago, 2 comments)
#61 Long Video dataset (eslambakr, opened 4 months ago, 1 comment)
#60 Zero-3 offload support (XenonLamb, opened 4 months ago, 5 comments)
#59 Sharing training loss (Deaddawn, opened 5 months ago, 2 comments)
#58 MSVD accuracy decreases after stage3 (Deaddawn, closed 4 months ago, 3 comments)
#57 GPU usage keeps increasing (kunkunsheng, closed 5 months ago, 3 comments)
#56 is eva_vit_g.pth trained by yourself? (Deaddawn, closed 5 months ago, 1 comment)
#55 why do stages 1 and 2 use different `--version plain_guided` and `--version imgsp_v1` parameters? (dragen1860, closed 5 months ago, 1 comment)
#54 Custom long videos fail to run at all (TotoroDHL, closed 4 months ago, 1 comment)
#53 Enquiry on Download Permission (HenryHZY, closed 5 months ago, 2 comments)
#52 Incomplete evaluation on MSVD-QA dataset (XenonLamb, opened 5 months ago, 5 comments)
#51 About text encoder (liziming5353, closed 4 months ago, 3 comments)
#50 Logic error in code: img_in_text and img_token not in sentence["value"] (dragen1860, closed 5 months ago, 3 comments)
#49 are the LLM weights trainable during stages 1, 2, and 3? (dragen1860, closed 5 months ago, 1 comment)
#48 Long Video CLI wrong (QiSu77, closed 5 months ago, 2 comments)
#47 why delay_load in build_vision_tower(config, delay_load=True)? (dragen1860, closed 5 months ago, 1 comment)