OpenGVLab / GUI-Odyssey

GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.

Not fully reproduced #3

Open smallduiyue8 opened 1 month ago

smallduiyue8 commented 1 month ago

This is the result of my evaluation on the app split model. (image) It is lower than the figure reported in the paper. What could be the reason?

Lqf-HFNJU commented 1 month ago

Thank you for bringing this issue to our attention. The problem may be due to the following reasons:

Model Verification: Could you please confirm if you are using the OdysseyAgent-app model?

Input Data Check: Please ensure that you are providing the model with an action history of length 4 and the corresponding screenshot history information.

Sample skipped: From the log outputs, it appears that during some evaluation steps, certain steps were skipped (possibly due to image reading failures). This caused the corresponding episode to be directly marked as "failed," leading to a lower SR value.

Environment Differences: Different device environments can also result in slight variations in test results. For your reference, our tests were conducted on an A100-80G.

SR is a metric that is significantly affected by randomness: if even one step in an episode is incorrect, the entire episode is considered a failure.
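To illustrate why skipped steps pull SR down so sharply, here is a minimal sketch of an episode-level success rate. It is not the repository's evaluation script: the function names `success_rate` and `skipped_episodes`, and the `(episode_id, step_ok)` input format with `None` marking a skipped step, are assumptions made purely for illustration.

```python
from collections import defaultdict


def success_rate(step_results):
    """Episode-level SR from (episode_id, step_ok) pairs.

    step_ok is True (correct), False (incorrect), or None (step skipped,
    e.g. because a screenshot could not be read). The input format is a
    hypothetical one chosen for this sketch.
    """
    episode_ok = defaultdict(lambda: True)
    for episode_id, step_ok in step_results:
        # One incorrect or skipped step fails the whole episode.
        episode_ok[episode_id] = episode_ok[episode_id] and (step_ok is True)
    if not episode_ok:
        return 0.0
    return sum(episode_ok.values()) / len(episode_ok)


def skipped_episodes(step_results):
    """Return the sorted ids of episodes that contain a skipped step."""
    return sorted({ep for ep, ok in step_results if ok is None})


steps = [
    ("ep1", True), ("ep1", True),   # all steps correct: success
    ("ep2", True), ("ep2", None),   # one skipped step: failure
    ("ep3", True), ("ep3", False),  # one wrong step: failure
]
print(success_rate(steps))       # 1 of 3 episodes succeeds
print(skipped_episodes(steps))   # episodes worth re-checking for missing images
```

Listing the episodes that contain skipped steps makes it easier to tell whether a gap from the paper's numbers comes from unreadable screenshots rather than from the model itself.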

We hope this helps resolve your issue. Thank you for your understanding.