zhaoshitian closed 7 months ago
The text is first tokenized, and CLIP limits the text encoder's input to 77 tokens, which is the max input length. Any tokens after the 77th are truncated.
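To make the truncation concrete, here is a minimal sketch of what happens to an over-long input. It is not the real tokenizer: a whitespace split stands in for CLIP's BPE tokenization, the function name `encode_with_truncation` and the `<sot>`/`<eot>`/`<pad>` markers are placeholders invented for illustration, and it assumes the start and end markers count toward the 77-token limit.

```python
CONTEXT_LENGTH = 77  # max input length quoted above


def encode_with_truncation(text, context_length=CONTEXT_LENGTH):
    """Toy stand-in for a CLIP-style tokenizer (hypothetical helper).

    Whitespace splitting replaces real BPE tokenization, so token
    counts will differ from what an actual CLIP tokenizer produces.
    """
    tokens = ["<sot>"] + text.split() + ["<eot>"]

    # Too long: keep the first (context_length - 1) tokens and
    # re-append the end marker so the sequence stays well-formed.
    if len(tokens) > context_length:
        tokens = tokens[: context_length - 1] + ["<eot>"]

    # Too short: pad up to the fixed context length.
    tokens += ["<pad>"] * (context_length - len(tokens))
    return tokens


# 100 "words" plus the two markers exceeds the limit, so the tail is dropped.
long_text = " ".join(f"w{i}" for i in range(100))
encoded = encode_with_truncation(long_text)
print(len(encoded))              # fixed length after truncation
print(encoded[0], encoded[-1])   # start and end markers survive
print("w75" in encoded)          # words past the cutoff are gone
```

With a real tokenizer the cutoff falls on subword tokens rather than words, so a prompt can hit the limit well before 75 words.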
Hi @LinB203, what if I wanted to semantically compare a text query with another text? Even then, would the max token length be 77? I reckon we won't be using CLIP to get just the text embeddings, right?
May I ask what is the max input length of the text encoder?