ShareGPT4Omni / ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
https://sharegpt4video.github.io/

What are the principles behind the selection of baseline models? #16

Closed ShiYaya closed 2 months ago

ShiYaya commented 3 months ago

What are the principles behind the selection of baseline models? The ShareCaptioner-Video model is built on IXC2-4KHD, while the ShareGPT4Video-8B model is built on LLaVA-Next-8B. What is the rationale behind this choice?

xiaoachen98 commented 2 months ago


We use LLaVA-Next-8B as the base for ShareGPT4Video-8B to make reproduction easy. We chose InternLM-XComposer2-4KHD as the base for ShareCaptioner-Video because it can handle a wide range of image resolutions and aspect ratios, which makes it well suited as a general captioner for diverse videos.
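
For intuition on why the dynamic-resolution property matters, here is a minimal sketch of the 4KHD-style idea: instead of squashing every video frame to one fixed square, a frame of arbitrary resolution and aspect ratio is tiled into fixed-size sub-images. This is not the repository's actual preprocessing; the `tile_frame` helper, the 336-pixel patch size, and the 12-tile budget are illustrative assumptions.

```python
import math
from PIL import Image

PATCH = 336      # assumed ViT input size, not taken from the repo
MAX_TILES = 12   # assumed sub-image budget per frame

def tile_frame(frame: Image.Image) -> list[Image.Image]:
    """Split a frame into PATCH x PATCH tiles, roughly preserving aspect ratio."""
    w, h = frame.size
    # Tiles needed to cover the frame at native resolution.
    cols, rows = math.ceil(w / PATCH), math.ceil(h / PATCH)
    # Shrink the grid uniformly if it exceeds the budget, so the
    # frame's aspect ratio stays approximately intact.
    if cols * rows > MAX_TILES:
        scale = (MAX_TILES / (cols * rows)) ** 0.5
        cols = max(1, int(cols * scale))
        rows = max(1, min(int(rows * scale), MAX_TILES // cols))
    resized = frame.resize((cols * PATCH, rows * PATCH))
    return [
        resized.crop((c * PATCH, r * PATCH, (c + 1) * PATCH, (r + 1) * PATCH))
        for r in range(rows) for c in range(cols)
    ]

# A 4K 16:9 frame becomes a 4x2 grid of tiles rather than one
# heavily downsampled square, so fine detail survives captioning.
print(len(tile_frame(Image.new("RGB", (3840, 2160)))))  # -> 8
```

Under this scheme a portrait phone video and a widescreen film get different tile grids from the same captioner, which is the flexibility the reply above attributes to InternLM-XComposer2-4KHD.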