PKU-YuanGroup LanguageBind issues

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

https://arxiv.org/abs/2310.01852

MIT License

723 stars, 52 forks

issues

AttributeError: 'NoneType' object has no attribute 'astype' in Depth processor

#68 SoyeonHH closed 3 days ago
1
ValueError: Input image size (112*1036) doesn't match model ([112, 1036]*[112, 1036]).

#67 JeffRody opened 2 weeks ago
1
Does the model work in scenarios with missing modalities?

#66 naajeehxe opened 3 weeks ago
2
embedding arithmetic

#65 bakachan19 opened 1 month ago
0
How to calculate similarity of Video to audio?

#64 Coooderr opened 1 month ago
0
token masking and contrastive learning

#63 ooochen-30 opened 1 month ago
0
Cannot find datasets for LanguageBind_Image?

#62 superwood opened 2 months ago
0
Where is the code for LanguageBind_Image, and how do I train it?

#61 superwood opened 2 months ago
0
Embedding similarity

#60 akBear23 opened 3 months ago
0
Any support for languages other than English?

#59 ragesh2000 opened 3 months ago
0
Method of running evaluation on MSR-VTT dataset

#58 sartaki opened 3 months ago
0
Some questions about the dataset

#57 XiaoZong0 closed 4 months ago
1
Video-Language Pre-training hours

#56 msw6468 opened 5 months ago
0
Are some of these models interchangeable?

#55 felmoreno1726 opened 5 months ago
0
Pretraining on video dataset without lora.

#54 shihuai opened 5 months ago
0
Any plans to use Long-CLIP to extend text input token limit?

#53 lennartmoritz opened 6 months ago
0
NameError: name 'get_audio_anno' is not defined

#52 noah003 opened 6 months ago
0
Non-reproducible MSRVTT results - I get R@1 accuracy less than 1%

#51 lennartmoritz opened 7 months ago
2
Clarification questions about the framework

#50 felmoreno1726 opened 7 months ago
4
Question about video-text training

#49 Tunanzzz closed 7 months ago
0
How to load pt model trained according to Training LanguageBind step?

#48 haochange opened 7 months ago
1
GPU resources

#47 letaozhang opened 7 months ago
1
where is LanguageBind_Image

#46 hd201708010401 opened 8 months ago
2
Inconsistent running results of inference.py

#45 Jade999 closed 8 months ago
5
confusion about VIDAL-10M video-text data

#44 wli333 opened 8 months ago
0
Create depth_ddp_glpn.py

#43 BinZhu-ece opened 8 months ago
0
Fine-tuning LLM + LanguageBind?

#42 Crystalxd opened 8 months ago
1
Inquiry on Unimodal Fine-Tuning with Locked Image in LanguageBind

#41 hexinyi2101 closed 6 months ago
0
The length of text that the text encoder can handle

#40 song-wensong opened 8 months ago
1
VIT-H model on other modality [Audio/Depth/Thermal]

#39 tikboaHIT opened 9 months ago
1
Combination of multiple modalities

#38 anthony-mendil opened 9 months ago
7
Use of undefined functions during fine_tune with custom audio data

#37 okaybody10 closed 9 months ago
1
Audio-Language Alignment data for reproduction

#36 memoiry opened 9 months ago
1
finetuning on a classification task

#35 Sravanthgithub opened 10 months ago
0
Vision encoder version

#34 JosephPai closed 10 months ago
1
Congrats on Acceptance !!!

#33 SenmiaoORZ opened 10 months ago
1
What are the training configurations for full tuning?

#32 StanLei52 closed 10 months ago
4
batch inference

#31 doyikim1 opened 10 months ago
0
how to load LanguageBind/LanguageBind_Video_Huge_V1.5_FT model

#30 valencebond closed 10 months ago
1
Can you share the NYU-D dataset you used for evaluation, e.g. how to split the dataset?

#29 bf-yang closed 10 months ago
2
Why not share the backbone parameters between Image and Video?

#28 SCZwangxiao closed 10 months ago
1
Does video feature extraction support a dynamic number of frames, and does performance drop compared with 8 frames?

#27 1093842024 closed 10 months ago
1
What's the difference between LanguageBind and LLaVA-1.5

#26 OPilgrim closed 10 months ago
2
How to Initialize the multi-modal encoders & training from scratch

#25 chen-yy20 closed 11 months ago
1
where is the LanguageBind_Audio_FT in huggingface?

#24 kou35 closed 11 months ago
1
about LanguageBind_Video_merge

#23 kou35 closed 11 months ago
1
VIT-H model release

#22 tikboaHIT closed 11 months ago
2
Hashtags and prompts?

#21 Kamino666 closed 11 months ago
4
Which output should be used for feature extraction and alignment?

#20 xiaohaochen0308 closed 11 months ago
1
Add flash attention 2

#19 pphuc25 closed 11 months ago
7