NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

About perception testset #49

Open mary-0830 opened 2 months ago

mary-0830 commented 2 months ago

Hello authors, thanks for sharing this fantastic work. I would like to ask where this dataset came from; could you share a link or the data? "/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/video_datasets_v2/perception_test/"

Efficient-Large-Language-Model commented 2 months ago

https://github.com/google-deepmind/perception_test

Specifically https://storage.googleapis.com/dm-perception-test/zip_data/valid_videos.zip
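Editor's note: for anyone setting this up locally, here is a minimal sketch (not part of the VILA repo) of downloading and extracting the validation videos from the URL above; the output directory name is hypothetical and should be adjusted to match your own dataset layout.

```python
# Minimal sketch: download and extract the Perception Test validation videos.
import os
import urllib.request
import zipfile

URL = "https://storage.googleapis.com/dm-perception-test/zip_data/valid_videos.zip"
OUT_DIR = "perception_test"  # hypothetical local path; adjust to your setup

os.makedirs(OUT_DIR, exist_ok=True)
zip_path = os.path.join(OUT_DIR, "valid_videos.zip")

# Download the zip archive (large file; this may take a while).
urllib.request.urlretrieve(URL, zip_path)

# Extract the videos next to the archive.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(OUT_DIR)
```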

mary-0830 commented 2 months ago

> https://github.com/google-deepmind/perception_test
>
> Specifically https://storage.googleapis.com/dm-perception-test/zip_data/valid_videos.zip

Thank you for your quick reply! I have two questions:

  1. Does the Perception Test evaluation not require GPT assistance?
  2. Why is the input in model_vqa_videoperception.py different from the other VQA inference evaluations, and why does get_model_option return a loss?

Efficient-Large-Language-Model commented 2 months ago

  1. Yeah, it does not require GPT assistance.
  2. We followed the official repo (google-deepmind/perception_test) to implement the evaluation; please refer to the official repo for details.
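Editor's note: the following is a minimal sketch, not the VILA implementation, of the general idea behind returning a loss for multiple-choice evaluation: each candidate answer is scored by the language-modeling loss of its tokens conditioned on the question, and the option with the lowest loss is picked, so no GPT judge is needed. It assumes a generic HuggingFace causal LM ("gpt2" here is just a placeholder; VILA uses its own multimodal model and prompt format).

```python
# Sketch of loss-based multiple-choice scoring with a generic causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, not the VILA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def option_loss(question: str, option: str) -> float:
    """Mean LM loss over the option tokens, conditioned on the question."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    option_ids = tokenizer(" " + option, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)

    # Mask the question tokens so only the option tokens contribute to the loss.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    with torch.no_grad():
        out = model(input_ids=input_ids, labels=labels)
    return out.loss.item()


# Hypothetical example: pick the candidate answer with the lowest loss.
question = "Question: What does the person do with the cup? Answer:"
options = ["picks it up", "puts it down", "throws it away"]
losses = [option_loss(question, o) for o in options]
print(options[losses.index(min(losses))], losses)
```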