Open song-wensong opened 8 months ago
    import torch
    from languagebind import LanguageBindVideo, LanguageBindVideoTokenizer, LanguageBindVideoProcessor

    pretrained_ckpt = 'LanguageBind/LanguageBind_Video_FT'  # also 'LanguageBind/LanguageBind_Video'
    model = LanguageBindVideo.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
    tokenizer = LanguageBindVideoTokenizer.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
    video_process = LanguageBindVideoProcessor(model.config, tokenizer)

    model.eval()
    data = video_process(["your/video.mp4"], ['your text.'], return_tensors='pt')
    with torch.no_grad():
        out = model(**data)

    print(out.text_embeds @ out.image_embeds.T)
In this code, what is the maximum length of your text? If it exceeds 77, will it be truncated directly?
The maximum text length is 77 tokens. If the input exceeds 77 tokens, it is truncated directly. We simply follow CLIP here.
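To make the truncation concrete, here is a minimal sketch of CLIP-style truncation on a list of token ids. It does not call the LanguageBind tokenizer: `truncate_to_context` is a hypothetical helper, and the start/end ids (49406/49407) and zero padding are assumptions taken from CLIP's vocabulary, so the real `LanguageBindVideoTokenizer` may differ in the details.

```python
CONTEXT_LENGTH = 77  # maximum number of text tokens, as stated above


def truncate_to_context(token_ids, sot_id=49406, eot_id=49407, pad_id=0,
                        context_length=CONTEXT_LENGTH):
    """Hypothetical helper: wrap ids in start/end tokens, then truncate or pad.

    The default ids are assumed from CLIP's vocabulary for illustration only.
    """
    ids = [sot_id] + list(token_ids) + [eot_id]
    if len(ids) > context_length:
        # Drop everything past the limit and force the last slot to be the end token.
        ids = ids[:context_length]
        ids[-1] = eot_id
    # Pad short sequences up to the fixed context length.
    return ids + [pad_id] * (context_length - len(ids))


short = truncate_to_context(range(1000, 1005))   # 5 content tokens
long = truncate_to_context(range(1000, 1200))    # 200 content tokens
print(len(short), len(long))
```

Under these assumptions, only the first 75 content tokens of a long prompt survive, since two of the 77 slots go to the start and end tokens. Anything after that point has no effect on the text embedding.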