VectorSpaceLab / Video-XL

🔥🔥First-ever hour scale video understanding models
Apache License 2.0
178 stars 11 forks

Some bad cases. #9

Open ApolloRay opened 1 month ago

ApolloRay commented 1 month ago

I ran inference with the model on a few cases and found that it does not handle fine details well. For example, for this link http, my prompt was "When was the goal scored in the game and provide a specific match time", but the output was "The goal was scored towards the end of the match, specifically at a match time of 86.42". Because CLIP is used for encoding, the loss of detail is still quite significant.

shuyansy commented 1 month ago

I acknowledge that the current version of Video-XL still has limited capability in some domains. In the case you provided, the model is weak at recognizing text in video and at spotting sports events. We will add more data to improve these abilities in the future.