Thanks for your great work! I am trying to caption some videos with your captioning model, THUDM/cogvlm2-llama3-caption. However, I get the following warning:
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
I know I can revise the max_position_embeddings parameter in THUDM/cogvlm2-llama3-caption/config.json to change the predefined maximum length. However, I am not sure whether directly changing max_position_embeddings will degrade the model's performance.
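For reference, this is the kind of edit I am considering. It is a minimal sketch of rewriting max_position_embeddings in a config.json, using an illustrative config dict and a temporary directory rather than the model's real file; my question is whether this change is safe, not how to make it.

```python
import json
import os
import tempfile

# Illustrative config contents, not the model's actual config.json.
cfg = {"max_position_embeddings": 2048, "hidden_size": 4096}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "config.json")
    with open(path, "w") as f:
        json.dump(cfg, f)

    # Edit in place, as I would for THUDM/cogvlm2-llama3-caption/config.json.
    with open(path) as f:
        cfg2 = json.load(f)
    cfg2["max_position_embeddings"] = 4096
    with open(path, "w") as f:
        json.dump(cfg2, f)

    with open(path) as f:
        print(json.load(f)["max_position_embeddings"])  # 4096
```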
Thanks again for your great work! I am also curious about three questions:

1. How many frames per video should I input to get the best performance?
2. Given the name max_position_embeddings, are the position embeddings sinusoidal or learned?
3. If I change max_position_embeddings from 2048 to 4096, is there an interpolation operation to obtain 4096 embeddings from the predefined 2048 embeddings?
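To clarify what I mean by "interpolation" in question 3: below is a sketch of linear position interpolation, in which 4096 positions are rescaled into the trained 0..2047 range before computing (here, sinusoidal) embeddings. This is only an illustration of the scheme I have in mind, not a claim about this model's internals.

```python
import numpy as np

def sinusoidal(positions, dim=8):
    # Classic sine/cosine embeddings for the given (possibly fractional) positions.
    inv_freq = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    ang = np.outer(positions, inv_freq)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

old_len, new_len = 2048, 4096
# Linear position interpolation: squeeze 4096 positions into the trained range.
scaled = np.arange(new_len) * (old_len / new_len)
emb = sinusoidal(scaled)
print(emb.shape)  # (4096, 8)
```

With this scaling, position 2 of the extended table reuses the embedding originally trained for position 1, and so on, so no position ever exceeds the trained range.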
Thanks for your time.