Train, Valid and Test dataset may have quite different characteristics

ChenYi99 / EgoPlan

BSD 3-Clause "New" or "Revised" License


Issue #7

Closed: yusuke-intern closed this issue 1 month ago

yusuke-intern commented 1 month ago

I ran a meta-analysis of each dataset split and found some interesting results.

Videos in the test set are very short.
On average, about 2 of the 4 candidate actions (answer options) have already been performed in the video.
For about 90% of questions, the correct action has not yet been performed in the video.
If we remove the already-performed actions (2 on average) from the candidates, about 2 options remain, so random-choice accuracy could be around 50%.

(This assumes the narration text correctly describes the actions in the video.) Please note that my analysis may contain mistakes.
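The elimination baseline described above can be sketched roughly as follows. This is a minimal illustration, not the analysis script actually used: it assumes each question is a dict with hypothetical `candidates`, `answer`, and `narrations` fields (not the real EgoPlan schema) and that a candidate counts as "already performed" only if its text exactly matches a narration (case- and whitespace-insensitive).

```python
from typing import Dict, List


def eliminate_taken(candidates: List[str], narrations: List[str]) -> List[str]:
    """Drop candidates whose text matches an action already narrated in the video.

    Assumes exact (normalized) string matching; falls back to the full
    candidate list if every option would be eliminated.
    """
    taken = {n.strip().lower() for n in narrations}
    remaining = [c for c in candidates if c.strip().lower() not in taken]
    return remaining or list(candidates)


def expected_random_accuracy(questions: List[Dict], eliminate: bool) -> float:
    """Expected accuracy of guessing uniformly among the (optionally filtered) candidates.

    Each question uses hypothetical keys: "candidates", "answer", "narrations".
    """
    total = 0.0
    for q in questions:
        if eliminate:
            options = eliminate_taken(q["candidates"], q["narrations"])
        else:
            options = list(q["candidates"])
        # A uniform guess is correct with probability 1/len(options),
        # but only if the correct answer survived the elimination step.
        total += (1.0 / len(options)) if q["answer"] in options else 0.0
    return total / len(questions)


if __name__ == "__main__":
    # Toy, made-up questions purely for illustration.
    questions = [
        {"candidates": ["open fridge", "take milk", "pour milk", "close fridge"],
         "answer": "pour milk", "narrations": ["open fridge", "take milk"]},
        {"candidates": ["wash cup", "dry cup", "put cup", "take cup"],
         "answer": "take cup", "narrations": ["take cup", "wash cup"]},
    ]
    print(expected_random_accuracy(questions, eliminate=False))
    print(expected_random_accuracy(questions, eliminate=True))
```

The second toy question shows the catch: when the correct action was itself already performed, elimination removes it and that question can never be answered correctly. With 4 candidates, 2 eliminated, and the correct answer surviving in about 90% of questions, the expected accuracy is roughly 0.9 × 1/2 = 0.45, consistent with the estimate of around 50% above.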

yusuke-intern commented 1 month ago

ChenYi99 commented 1 month ago

We acknowledge that our candidate options do include actions that have already occurred in the video. However, the action narrations provided in task_progress_metadata are intended only as a reference. During model inference, using information from the ground-truth action narrations is not allowed; the model must infer task progress solely from visual observations. Therefore, using the ground-truth task_progress_metadata to eliminate options is not a valid approach.