FreddeFrallan / Multilingual-CLIP

OpenAI CLIP text encoders for multiple languages!
MIT License
746 stars, 69 forks

Issue Report: Token indices sequence length is longer than the specified maximum sequence length for this model #37

Open jiaxin-zhang opened 3 months ago

jiaxin-zhang commented 3 months ago

Firstly, thank you for the incredible work on the Multilingual-CLIP model. We have been using it and it is great!

However, we've encountered an issue when input text queries exceed 512 tokens. Here is the error message:

"Token indices sequence length is longer than the specified maximum sequence length for this model (514 > 512). Running this sequence through the model will result in indexing errors."

Have you considered passing truncation=True to the tokenizer call in the MultilingualCLIP forward method (line 16 here)? With truncation enabled, queries longer than the token limit would be cut to fit instead of triggering the error. Thanks!
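To make the suggestion concrete, here is a minimal sketch of the proposed change. It assumes the forward method calls a Hugging Face tokenizer with padding=True and return_tensors='pt'; the helper name tokenize_with_truncation and the max_length parameter are hypothetical and are not part of the Multilingual-CLIP code.

```python
from typing import Any, Callable, Optional, Sequence


def tokenize_with_truncation(
    tokenizer: Callable[..., Any],
    texts: Sequence[str],
    max_length: Optional[int] = None,
) -> Any:
    """Tokenize texts, cutting over-long inputs down to the length limit.

    Hypothetical helper: `tokenizer` is assumed to be a Hugging Face
    tokenizer (or any callable that accepts the same keyword arguments).
    """
    kwargs = {
        "padding": True,         # pad the batch to a common length
        "truncation": True,      # the change proposed in this issue
        "return_tensors": "pt",  # assumed: the model expects PyTorch tensors
    }
    if max_length is not None:
        # Explicit cap; when omitted, a Hugging Face tokenizer falls back
        # to the maximum length configured for the model.
        kwargs["max_length"] = max_length
    return tokenizer(list(texts), **kwargs)
```

Inside the forward method, the existing tokenizer call could then be replaced with something like tokenize_with_truncation(tokenizer, txt). Note that truncation silently drops everything past the limit, so the end of a very long query would no longer contribute to the embedding.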