PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License

how to use hugging face model #11

Closed carry-xz closed 6 months ago

carry-xz commented 6 months ago

Nice work! An error occurred while trying to load the model with the Hugging Face API:

```python
from transformers import AutoProcessor, AutoModel, AutoTokenizer

processor = AutoProcessor.from_pretrained("LanguageBind/LanguageBind_Video")
model = AutoModel.from_pretrained("LanguageBind/LanguageBind_Video")
tokenizer = AutoTokenizer.from_pretrained("LanguageBind/LanguageBind_Video")
```

KeyError: 'LanguageBindVideo'

Could you give an example of using Hugging Face transformers to extract features from an input video?

LinB203 commented 6 months ago

Thanks for your attention. Please refer here.
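For context on why the `Auto*` classes fail: `AutoModel` looks the model type up in transformers' registry, and `LanguageBindVideo` is a custom class that is not registered there, hence the `KeyError`. Below is a minimal sketch of loading via the repo's own `languagebind` package instead, following the class names in the project README; the checkpoint name comes from the question above, while the video path and caption are placeholders you would replace with your own data.

```python
def load_languagebind_video(ckpt: str = "LanguageBind/LanguageBind_Video"):
    """Load the LanguageBind video tower with the repo's own classes.

    AutoModel raises KeyError('LanguageBindVideo') because the model type
    is not registered with transformers; the classes used here ship with
    the `languagebind` package from this repository.
    """
    # Imported lazily so this module can be inspected even when the
    # `languagebind` package is not installed.
    from languagebind import (
        LanguageBindVideo,
        LanguageBindVideoProcessor,
        LanguageBindVideoTokenizer,
    )

    model = LanguageBindVideo.from_pretrained(ckpt)
    tokenizer = LanguageBindVideoTokenizer.from_pretrained(ckpt)
    # The processor is constructed from the model config plus the tokenizer.
    processor = LanguageBindVideoProcessor(model.config, tokenizer)
    return model, tokenizer, processor


if __name__ == "__main__":
    model, tokenizer, processor = load_languagebind_video()
    # Placeholder inputs: swap in a real video file and your own captions.
    data = processor(
        ["path/to/your_video.mp4"],
        ["a person is dancing"],
        return_tensors="pt",
    )
    out = model(**data)
    # out.image_embeds and out.text_embeds hold the video and text
    # features; their dot product gives the similarity score.
```

This is a sketch, not the maintainer's verified answer; check the repo README for the exact, current usage before relying on it.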