THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B
Apache License 2.0

Question about Caption Model #195

Open zhiyuanyou opened 2 hours ago

zhiyuanyou commented 2 hours ago

Hello,

Thanks for your great work! I am trying to caption some videos with your caption model, THUDM/cogvlm2-llama3-caption. However, I get the following warning:

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.

I know I could raise the max_position_embeddings parameter in THUDM/cogvlm2-llama3-caption/config.json to increase the predefined maximum length.

However, I am not sure whether directly changing max_position_embeddings would degrade performance.

Thanks for your time.
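For reference, here is a minimal sketch of the config.json edit described above. The path is a temporary stand-in for a locally downloaded copy of the checkpoint, used here only so the snippet is self-contained:

```python
import json
import os
import tempfile

# Demonstration only: a stand-in for a locally downloaded
# THUDM/cogvlm2-llama3-caption/config.json (hypothetical path).
cfg_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(cfg_path, "w") as f:
    json.dump({"max_position_embeddings": 2048}, f)

# Read the config, raise the limit, and write it back.
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["max_position_embeddings"] = 4096
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```

Whether the model tolerates this edit depends on how its positions are encoded: sinusoidal/rotary frequencies can extrapolate to some degree, while a learned embedding table has no entries beyond the trained length.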

zhiyuanyou commented 2 hours ago

Thanks again for your great work! I also have three questions.

  1. How many frames per video should I input to get the best performance?
  2. Given the name max_position_embeddings, are the position embeddings sinusoidal or learned?
  3. If I change max_position_embeddings from 2048 to 4096, is any interpolation applied to obtain 4096 embeddings from the predefined 2048?

Thanks in advance.
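On question 3: editing the config does not trigger any interpolation by itself, but "position interpolation" (rescaling new positions into the trained range rather than extrapolating past it) is a common technique for extending context in rotary/sinusoidal-position models. A toy sketch with classic fixed sinusoidal embeddings, assuming sinusoidal positions, which this model may or may not actually use:

```python
import math

def sinusoidal_embedding(pos, dim=8, base=10000.0):
    """Classic fixed sine/cosine position embedding (interleaved sin, cos)."""
    emb = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        emb.append(math.sin(pos * freq))
        emb.append(math.cos(pos * freq))
    return emb

# Position interpolation: instead of evaluating unseen positions past the
# trained limit, rescale target positions back into the trained range.
trained_len, target_len = 2048, 4096
scale = trained_len / target_len  # 0.5

pos = 3000  # a position beyond the trained 2048 limit
interpolated = sinusoidal_embedding(pos * scale)  # evaluated at position 1500.0
```

With a learned embedding table, by contrast, extending from 2048 to 4096 would require explicitly interpolating or retraining the table, since rows for the new positions simply do not exist.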