PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License
549 stars, 44 forks

Text input length #5

Closed: zhaoshitian closed this issue 7 months ago

zhaoshitian commented 7 months ago

May I ask what the max input length of the text encoder is?

LinB203 commented 7 months ago

The text is tokenized into a sequence of tokens, and CLIP limits that sequence to 77 tokens (including the start- and end-of-text tokens), so 77 is the max input length. Any tokens after the 77th are truncated.
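The truncation rule above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not LanguageBind's or CLIP's actual tokenizer. It assumes a toy whitespace tokenizer in place of CLIP's byte-pair encoding, and the ids `PAD_ID`, `SOT_ID`, `EOT_ID` and the helpers `toy_encode` and `tokenize` are placeholders invented for the example. It shows how a sequence wrapped in start/end tokens is cut to 77 and then padded.

```python
from typing import List

CONTEXT_LENGTH = 77  # CLIP's text context length, including start/end tokens
PAD_ID = 0           # hypothetical padding id
SOT_ID = 1           # hypothetical start-of-text id
EOT_ID = 2           # hypothetical end-of-text id


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in for CLIP's BPE: one id per whitespace-separated word."""
    # Offset by 3 so word ids never collide with the special ids above.
    return [3 + (sum(map(ord, word)) % 1000) for word in text.split()]


def tokenize(text: str, context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Wrap the text in SOT/EOT, truncate to context_length, then right-pad."""
    tokens = [SOT_ID] + toy_encode(text) + [EOT_ID]
    if len(tokens) > context_length:
        # Keep the first context_length tokens and force the last one to be EOT;
        # everything past the limit is silently dropped.
        tokens = tokens[:context_length]
        tokens[-1] = EOT_ID
    # Pad shorter sequences so every input has the same fixed length.
    return tokens + [PAD_ID] * (context_length - len(tokens))


if __name__ == "__main__":
    long_text = " ".join(f"word{i}" for i in range(100))  # 100 words
    ids = tokenize(long_text)
    print(len(ids))                           # 77
    print(sum(1 for t in ids if t > EOT_ID))  # 75: the last 25 words are dropped
```

Because the end-of-text token occupies the last slot after truncation, at most 75 content tokens survive. With a real CLIP tokenizer such as Hugging Face's `CLIPTokenizer`, passing `max_length=77` and `truncation=True` gives the same cut-off.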

abhimanyu891998 commented 6 months ago

Hi @LinB203, what if I want to semantically compare a text query with another text? Would the max token length still be 77 in that case? I assume we wouldn't be using CLIP just to get the text embeddings, right?