Not fully reproduced - Githubissues

Thank you for bringing this issue to our attention. The problem may be due to the following reasons:

1. Model Verification: Could you please confirm that you are using the OdysseyAgent-app model?
2. Input Data Check: Please ensure that you are providing the model with an action history of length 4 and the corresponding screenshot history.
3. Skipped Samples: From the log outputs, it appears that some steps were skipped during evaluation (possibly due to image reading failures). Each affected episode was then directly marked as "failed," leading to a lower SR value.
4. Environment Differences: Different device environments can also result in slight variations in test results. For your reference, our tests were conducted on an A100-80G.

SR is a metric that is significantly affected by randomness: if even one step in an episode is incorrect, the entire episode is considered a failure.
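To illustrate why a single skipped or incorrect step lowers SR so sharply, here is a minimal Python sketch. It is not the repository's evaluation script: the function names `success_rate` and `step_accuracy`, the episode IDs, and the use of `None` to mark a skipped step are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical per-step results: True = correct, False = wrong,
# None = step skipped (e.g. the screenshot could not be read).
StepResult = Optional[bool]


def success_rate(episodes: Dict[str, List[StepResult]]) -> float:
    """Fraction of episodes in which every step is correct.

    A skipped step (None) is treated like a wrong step, so the whole
    episode counts as failed.
    """
    if not episodes:
        return 0.0
    successes = sum(
        1
        for steps in episodes.values()
        if steps and all(step is True for step in steps)
    )
    return successes / len(episodes)


def step_accuracy(episodes: Dict[str, List[StepResult]]) -> float:
    """Fraction of individual steps that are correct, for comparison."""
    steps = [step for ep in episodes.values() for step in ep]
    if not steps:
        return 0.0
    return sum(1 for step in steps if step is True) / len(steps)


# Made-up example data, not taken from any real log.
episodes = {
    "ep_a": [True, True, True, True],
    "ep_b": [True, True, None, True],   # one skipped step
    "ep_c": [True, False, True, True],  # one wrong step
}

print(f"SR: {success_rate(episodes):.3f}")
print(f"Step accuracy: {step_accuracy(episodes):.3f}")
```

In this made-up data, 10 of 12 steps are correct, yet only one of three episodes succeeds. If your logs show skipped steps, counting how many episodes they touch should give a rough idea of how much of the SR gap they explain.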

We hope this helps resolve your issue. Thank you for your understanding.

OpenGVLab / GUI-Odyssey

Not fully reproduced #3