PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License
536 stars
43 forks
issues
NameError: name 'get_audio_anno' is not defined
#52
noah003
opened
2 days ago
0
Non-reproducible MSRVTT results - I get R@1 accuracy less than 1%
#51
lennartmoritz
opened
1 week ago
2
Clarification questions about the framework
#50
felmoreno1726
opened
2 weeks ago
1
Question about video-text training
#49
Tunanzzz
closed
2 weeks ago
0
How to load pt model trained according to Training LanguageBind step?
#48
haochange
opened
4 weeks ago
1
GPU resources
#47
letaozhang
opened
1 month ago
1
where is LanguageBind_Image
#46
hd201708010401
opened
1 month ago
2
Inconsistent running results of inference.py
#45
Jade999
closed
1 month ago
5
confusion about VIDAL-10M video-text data
#44
wli333
opened
1 month ago
0
Create depth_ddp_glpn.py
#43
BinZhu-ece
opened
1 month ago
0
Fine-tuning LLM + LanguageBind?
#42
Crystalxd
opened
2 months ago
0
Inquiry on Unimodal Fine-Tuning with Locked Image in LanguageBind
#41
hexinyi2101
opened
2 months ago
0
The length of text that the text encoder can handle
#40
song-wensong
opened
2 months ago
1
VIT-H model on other modality [Audio/Depth/Thermal]
#39
tikboaHIT
opened
2 months ago
1
Combination of multiple modalities
#38
anthony-mendil
opened
2 months ago
3
Use of undefined functions during fine_tune with custom audio data
#37
okaybody10
closed
2 months ago
1
Audio-Language Alignment data for reproduction
#36
memoiry
opened
3 months ago
1
finetuning on a classification task
#35
Sravanthgithub
opened
3 months ago
0
Vision encoder version
#34
JosephPai
closed
3 months ago
1
Congrats on Acceptance !!!
#33
SenmiaoORZ
opened
3 months ago
1
What are the training configurations for full tuning?
#32
StanLei52
closed
3 months ago
4
batch inference
#31
doyikim1
opened
3 months ago
0
how to load LanguageBind/LanguageBind_Video_Huge_V1.5_FT model
#30
valencebond
closed
3 months ago
1
Can you share the NYU-D dataset you used for evaluation, e.g. how to split the dataset?
#29
bf-yang
closed
4 months ago
2
Why not share the backbone parameters between Image and Video?
#28
SCZwangxiao
closed
3 months ago
1
Does video feature extraction support a dynamic number of frames? Would performance drop or degrade compared with 8 frames?
#27
1093842024
closed
3 months ago
1
What's the difference between LanguageBind and LLaVA-1.5
#26
OPilgrim
closed
3 months ago
2
How to Initialize the multi-modal encoders & training from scratch
#25
chen-yy20
closed
4 months ago
1
where is the LanguageBind_Audio_FT in huggingface?
#24
kou35
closed
4 months ago
1
about LanguageBind_Video_merge
#23
kou35
closed
5 months ago
1
VIT-H model release
#22
tikboaHIT
closed
5 months ago
2
Hashtags and prompts?
#21
Kamino666
closed
5 months ago
4
For feature extraction and alignment, which output parameter should be used?
#20
huainanchen
closed
5 months ago
1
Add flash attention 2
#19
pphuc25
closed
5 months ago
7
Can I change embeddings['image'].shape from 768 to 1024?
#18
dongfeicui
closed
5 months ago
1
About download weights
#17
dongfeicui
closed
5 months ago
1
Add files via upload
#16
JessyTsu1
closed
5 months ago
0
Add files via upload
#15
JessyTsu1
closed
5 months ago
0
provide a sample data for training
#14
pphuc25
closed
5 months ago
2
Cannot run the training code
#13
pphuc25
closed
5 months ago
1
pretraining details
#12
xiaoen0
closed
5 months ago
1
how to use hugging face model
#11
carry-xz
closed
6 months ago
1
Choice of Vit-L over Vit-H
#10
jacklishufan
closed
6 months ago
2
When will you release the dataset?
#9
xiangchen-Z
closed
6 months ago
3
Research about a video captioning model
#8
pphuc25
closed
6 months ago
1
docs: add if cuda available
#7
pphuc25
closed
6 months ago
1
bug in install requirements.txt
#6
pphuc25
closed
6 months ago
1
Text input length
#5
zhaoshitian
closed
6 months ago
2
GPU sources
#4
xiaoaoran
closed
6 months ago
2
Seeing excessive GPU memory usage during inference
#3
abhimanyu891998
closed
6 months ago
2