RifleZhang / LLaVA-Hound-DPO

The videos from which the frames were extracted #10

Closed dragonlzm closed 2 months ago

dragonlzm commented 3 months ago

Hi! Would you consider releasing the mapping from the annotations to the source videos from which you extracted the training video frames? Thanks!

RifleZhang commented 3 months ago

Thanks for reaching out. You can use the id field in our annotated frames to map directly to the original video. Here are the details:

The sources are WebVid (https://github.com/m-bain/webvid), YouTube Shorts (https://github.com/PKU-YuanGroup/LanguageBind/blob/main/DATASETS.md), and ActivityNet (http://activity-net.org/).

As for the names: ids containing "Scene" (e.g. v_XNTy5ZTMqVU-Scene-011) are from ActivityNet, purely numeric ids (e.g. 6810) are from WebVid, and all others (e.g. 'I8q-Y8VsGek') are from VIDAL, i.e. the YouTube Shorts source. You can use the name to match against the entries in the original datasets.
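
For anyone scripting this, the naming rules above could be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `source_of` is hypothetical, and stripping the `-Scene-NNN` suffix to recover the ActivityNet video name is an assumption based on the single example id given here.

```python
import re

# Assumption: ActivityNet clips are named "<video name>-Scene-<number>",
# inferred from the example id "v_XNTy5ZTMqVU-Scene-011".
_SCENE_RE = re.compile(r"(?P<video>.+)-Scene-\d+")


def source_of(sample_id):
    """Guess (dataset, name in that dataset) from an annotation id.

    Hypothetical helper that only encodes the three naming rules
    described in this thread.
    """
    sample_id = str(sample_id)

    # Rule 1: ids with a scene suffix come from ActivityNet.
    match = _SCENE_RE.fullmatch(sample_id)
    if match:
        return "ActivityNet", match.group("video")

    # Rule 2: purely numeric ids come from WebVid.
    if re.fullmatch(r"[0-9]+", sample_id):
        return "WebVid", sample_id

    # Rule 3: everything else is treated as a VIDAL (YouTube Shorts) id.
    return "VIDAL", sample_id


if __name__ == "__main__":
    for example in ["v_XNTy5ZTMqVU-Scene-011", "6810", "I8q-Y8VsGek"]:
        print(example, "->", source_of(example))
```

The scene rule is checked first because an ActivityNet id would otherwise fall through to the catch-all VIDAL case. Whether the returned name matches a file or key in each dataset's own metadata still needs to be verified against that dataset.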