PKU-YuanGroup/LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License
723 stars
52 forks
issues
AttributeError: 'NoneType' object has no attribute 'astype' in Depth processor
#68
SoyeonHH
closed
3 days ago
1
ValueError: Input image size (112*1036) doesn't match model ([112, 1036]*[112, 1036]).
#67
JeffRody
opened
2 weeks ago
1
Does the model work in scenarios with missing modalities?
#66
naajeehxe
opened
3 weeks ago
2
embedding arithmetic
#65
bakachan19
opened
1 month ago
0
How to calculate similarity of Video to audio?
#64
Coooderr
opened
1 month ago
0
token masking and contrastive learning
#63
ooochen-30
opened
1 month ago
0
Can not find datasets for LanguageBind_Image?
#62
superwood
opened
2 months ago
0
where is the code of LanguageBind_Image and how to train it?
#61
superwood
opened
2 months ago
0
Embedding similarity
#60
akBear23
opened
3 months ago
0
Any support for languages other than English?
#59
ragesh2000
opened
3 months ago
0
Method of running evaluation on MSR-VTT dataset
#58
sartaki
opened
3 months ago
0
Some questions about the dataset
#57
XiaoZong0
closed
4 months ago
1
Video-Language Pre-training hours
#56
msw6468
opened
5 months ago
0
Are some of these models interchangeable?
#55
felmoreno1726
opened
5 months ago
0
Pretraining on video dataset without lora.
#54
shihuai
opened
5 months ago
0
Any plans to use Long-CLIP to extend text input token limit?
#53
lennartmoritz
opened
6 months ago
0
NameError: name 'get_audio_anno' is not defined
#52
noah003
opened
6 months ago
0
Non-reproducible MSRVTT results - I get R@1 accuracy less than 1%
#51
lennartmoritz
opened
7 months ago
2
Clarification questions about the framework
#50
felmoreno1726
opened
7 months ago
4
Question about video-text training
#49
Tunanzzz
closed
7 months ago
0
How to load pt model trained according to Training LanguageBind step?
#48
haochange
opened
7 months ago
1
GPU resources
#47
letaozhang
opened
7 months ago
1
where is LanguageBind_Image
#46
hd201708010401
opened
8 months ago
2
Inconsistent running results of inference.py
#45
Jade999
closed
8 months ago
5
confusion about VIDAL-10M video-text data
#44
wli333
opened
8 months ago
0
Create depth_ddp_glpn.py
#43
BinZhu-ece
opened
8 months ago
0
Fine-tuning LLM + LanguageBind?
#42
Crystalxd
opened
8 months ago
1
Inquiry on Unimodal Fine-Tuning with Locked Image in LanguageBind
#41
hexinyi2101
closed
6 months ago
0
The length of text that the text encoder can handle
#40
song-wensong
opened
8 months ago
1
VIT-H model on other modality [Audio/Depth/Thermal]
#39
tikboaHIT
opened
9 months ago
1
Combination of multiple modalities
#38
anthony-mendil
opened
9 months ago
7
Use of undefined functions during fine_tune with custom audio data
#37
okaybody10
closed
9 months ago
1
Audio-Language Alignment data for reproduction
#36
memoiry
opened
9 months ago
1
finetuning on a classification task
#35
Sravanthgithub
opened
10 months ago
0
Vision encoder version
#34
JosephPai
closed
10 months ago
1
Congrats on Acceptance !!!
#33
SenmiaoORZ
opened
10 months ago
1
What are the training configurations for full tuning?
#32
StanLei52
closed
10 months ago
4
batch inference
#31
doyikim1
opened
10 months ago
0
how to load LanguageBind/LanguageBind_Video_Huge_V1.5_FT model
#30
valencebond
closed
10 months ago
1
Can you share the NYU-D dataset you used for evaluation, e.g. how to split the dataset?
#29
bf-yang
closed
10 months ago
2
Why not share the backbone parameters between Image and Video?
#28
SCZwangxiao
closed
10 months ago
1
Does video feature extraction support a dynamic number of frames, and does performance drop compared with 8 frames?
#27
1093842024
closed
10 months ago
1
What's the difference between LanguageBind and LLaVA-1.5
#26
OPilgrim
closed
10 months ago
2
How to Initialize the multi-modal encoders & training from scratch
#25
chen-yy20
closed
11 months ago
1
where is the LanguageBind_Audio_FT in huggingface?
#24
kou35
closed
11 months ago
1
about LanguageBind_Video_merge
#23
kou35
closed
11 months ago
1
VIT-H model release
#22
tikboaHIT
closed
11 months ago
2
Hashtags and prompts?
#21
Kamino666
closed
11 months ago
4
For feature extraction and alignment, which output should be used?
#20
xiaohaochen0308
closed
11 months ago
1
Add flash attention 2
#19
pphuc25
closed
11 months ago
7