farewellthree / PPLLaVA

Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
Apache License 2.0

Question about Activitynet dataset preparation #3

Closed · Parsifal133 closed 5 hours ago

Parsifal133 commented 22 hours ago

Hi! Thanks for the great work and open source code.

I noticed that ActivityNet's video frame data is used for inference, as shown in the code below.

frame='/Path/to/video_chatgpt/activitynet_frames.json'

It references a JSON file describing the frames, but I don't know where to find this file.

I checked the related repos, such as ST-LLM, Video-ChatGPT, and ActivityNet-QA, but couldn't find any relevant information. Could you please share how to obtain this file and how to use ActivityNet for model evaluation?

farewellthree commented 21 hours ago

Hello. Apologies for the confusion: activitynet_frames.json is actually not needed, and we have corrected the code. Additionally, our ActivityNet dataset consists of pre-extracted raw frames. If your copy of ActivityNet is in raw video format, you can modify the code the same way it is done for MSR-VTT or MSVD.
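For reference, the sketch below illustrates the kind of change being suggested: loading raw ActivityNet videos by uniformly sampling frames, instead of reading pre-extracted frame directories. It is not the repository's actual code; the helper name load_video_frames and the decord-based loading are assumptions for illustration only.

```python
# Minimal sketch (hypothetical, not PPLLaVA's actual implementation):
# uniformly sample frames from a raw video file, similar in spirit to how
# raw-video datasets such as MSR-VTT or MSVD are typically handled.
# Requires: pip install decord pillow numpy
from decord import VideoReader, cpu
from PIL import Image
import numpy as np


def load_video_frames(video_path: str, num_frames: int = 32):
    """Read a raw video and return `num_frames` uniformly sampled PIL images."""
    vr = VideoReader(video_path, ctx=cpu(0))
    total = len(vr)
    # Uniform temporal sampling across the whole clip.
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    frames = vr.get_batch(indices).asnumpy()  # (T, H, W, 3) uint8 array
    return [Image.fromarray(f) for f in frames]


# Example usage: the path below is a placeholder for a raw ActivityNet .mp4.
frames = load_video_frames("/Path/to/activitynet/v_xxxxx.mp4", num_frames=32)
```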

Parsifal133 commented 5 hours ago

Thanks for your reply! I can now evaluate the model on the ActivityNet dataset using the latest code you provided.