InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Apache License 2.0


Using InternLM to describe videos! #222

Closed: mvoodarla closed this issue 5 months ago

mvoodarla commented 6 months ago

Hey folks! We just built a cost-effective, lightweight way to generate audiovisual summaries for videos.

- Processes videos up to 12x faster than real time
- Costs less than $0.01 per minute of video
- Combines visual and audio components
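To put the two headline figures in concrete terms, here is a back-of-the-envelope Python sketch. It uses only the numbers quoted above (a best case of 12x faster than real time and an upper bound of $0.01 per minute of video); the `estimate` function and the constant names are hypothetical and are not part of the Sieve API.

```python
# Back-of-the-envelope estimate from the figures quoted in the post:
# "up to 12x faster than real time" and "less than $0.01 per minute of video".
# Both numbers are the post's claims; actual throughput and pricing may differ.

SPEEDUP = 12.0            # best-case ratio of video duration to processing time
MAX_COST_PER_MIN = 0.01   # upper bound on cost in USD per minute of video


def estimate(video_minutes: float) -> tuple[float, float]:
    """Return (best-case processing minutes, cost upper bound in USD)."""
    if video_minutes < 0:
        raise ValueError("video length must be non-negative")
    processing_minutes = video_minutes / SPEEDUP
    cost_bound = video_minutes * MAX_COST_PER_MIN
    return processing_minutes, cost_bound


if __name__ == "__main__":
    minutes, cost = estimate(60)
    print(f"60 min of video: ~{minutes:.1f} min to process, under ${cost:.2f}")
```

For a 60-minute video, this works out to a best-case processing time of about 5 minutes and a cost below $0.60.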

The goal here is not to build a single end-to-end (E2E) model, but a system that can actually be used in production while still preserving relatively high quality.

You can try it out yourself here: https://www.sievedata.com/functions/sieve/describe

How we built it: https://www.sievedata.com/blog/describe-video-summary-beta-launch

The code: https://github.com/sieve-community/describe

https://github.com/InternLM/InternLM-XComposer/assets/11367688/286b0045-e2fe-43b5-90ae-9a9e35b840d6

yuhangzang commented 5 months ago

Thanks for sharing this project, and for your contribution to the video captioning community!