OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
https://osu-nlp-group.github.io/SeeAct/

Is the demo still working? #19

Closed agonza1 closed 3 months ago

agonza1 commented 4 months ago

Hi, nice project! I have been playing with it for a while, and yesterday I noticed that the demo no longer works for me. It gets stuck at "Start Multi-Choice QA - Batch 0".

I suspected a breaking change in some dependency, but that is hard to track down. It just opens the webpage in Chromium and hangs there without throwing any errors.

Full log:

2024-04-03 20:52:08,008 - website: https://www.google.com/
2024-04-03 20:52:08,008 - task: Find blog post "XYZ"
2024-04-03 20:52:08,009 - id: 2024-04-03_20-52-08
2024-04-03 20:52:13,612 - ==========
2024-04-03 20:52:13,612 - Time step: 0
2024-04-03 20:52:13,612 - ----------
2024-04-03 20:52:13,981 - # all elements: 18
2024-04-03 20:52:13,983 - batch size: 18
2024-04-03 20:52:13,983 - ----------
2024-04-03 20:52:13,983 - You are asked to complete the following task: Find blog post "XYZ"
2024-04-03 20:52:13,983 - Previous Actions:
None
2024-04-03 20:52:13,983 - ----------
2024-04-03 20:52:13,983 - Start Multi-Choice QA - Batch 0

agonza1 commented 4 months ago

It was an issue on my end. Errors in inference_engine.py are hidden by the @backoff.on_exception decorator, so a failing call is retried silently and the run appears to hang. Perhaps we could display the error even when we retry: https://github.com/OSU-NLP-Group/SeeAct/pull/20
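
To illustrate the idea of surfacing the error on every retry, here is a minimal standard-library sketch. It is not the change made in the linked PR and it does not use the backoff library. `retry_with_logging`, `flaky_inference`, and the "bad API key" message are hypothetical names and values invented for this example.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("retry_demo")  # hypothetical logger name


def retry_with_logging(exceptions, max_tries=3, base_delay=0.0):
    """Hypothetical stand-in for a backoff-style retry decorator.

    Unlike a silent retry, it logs every caught exception before retrying.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    # Show the error even though we are about to retry.
                    logger.warning(
                        "%s failed on try %d/%d: %r",
                        func.__name__, attempt, max_tries, exc,
                    )
                    if attempt == max_tries:
                        raise  # give up and propagate the last error
                    # Exponential wait between tries (zero by default here).
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = []  # records how many times the function body ran


@retry_with_logging(ValueError, max_tries=3)
def flaky_inference():
    """Placeholder for a model call that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("bad API key")  # placeholder error message
    return "ok"
```

Calling `flaky_inference()` logs a warning for each failed try before the call finally returns, so a misconfiguration shows up in the log instead of looking like a hang.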