OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
https://osu-nlp-group.github.io/SeeAct/

SeeAct cannot find screenshot of the page #36

Closed: vbxx0 closed this issue 1 month ago

vbxx0 commented 3 months ago

Hello guys! I'm having a problem running the example script against my website. It fails with:

FileNotFoundError: [Errno 2] No such file or directory: 'seeact_agent_files/20240608_162057/screenshots/screen_1.png'

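import asyncio

from seeact.agent import SeeActAgent

# TASK and WEBSITE are string constants defined elsewhere in the script.

async def run_agent():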
    agent = SeeActAgent(model="gpt-4-turbo",
                        default_task=TASK,
                        default_website=WEBSITE,
                        # save_file_dir="/tmp/test"
                        )
    await agent.start()
    while not agent.complete_flag:
        prediction_dict = await agent.predict()
        await agent.execute(prediction_dict)
    await agent.stop()

if __name__ == "__main__":
    asyncio.run(run_agent())

I didn't find any screenshots in that folder, only logs (the website name is replaced in the attached image).

With google.com, the same script works fine.

boyugou commented 3 months ago

Is it because you are opening "example.com", which is not an actual web page?

vbxx0 commented 3 months ago

@boyugou I replaced the URL for this issue, but the actual page is https://app.usebraintrust.com/jobs/12636

boyugou commented 3 months ago

Interestingly, I also failed to take screenshots of this webpage for the first time (succeeded the second time). Will try to figure out the reason.
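
Because the capture failed on the first attempt and succeeded on the second, one possible workaround is to retry the capture and confirm the file exists before anything reads it. The following is a minimal sketch, not SeeAct's actual code: `capture_with_retry` and its parameters are hypothetical names, and `take_screenshot` stands for any async callable that writes an image to the given path (for example, a thin wrapper around Playwright's `page.screenshot(path=...)`).

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Optional


async def capture_with_retry(
    take_screenshot: Callable[[Path], Awaitable[None]],
    path: Path,
    attempts: int = 3,
    delay: float = 0.5,
) -> Path:
    """Call take_screenshot until a non-empty file exists or attempts run out."""
    # Make sure the target directory exists so a missing folder is not the cause.
    path.parent.mkdir(parents=True, exist_ok=True)
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            await take_screenshot(path)
        except Exception as exc:
            # Remember the failure, then fall through to the file check and retry.
            last_error = exc
        if path.is_file() and path.stat().st_size > 0:
            return path
        if attempt < attempts:
            await asyncio.sleep(delay)
    raise FileNotFoundError(
        f"no screenshot at {path} after {attempts} attempts"
    ) from last_error
```

If every attempt fails, the raised error still names the missing path, and any exception from the last failed capture is chained to it, which might help tell a capture failure apart from a wrong directory.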

vbxx0 commented 3 months ago

> Interestingly, I also failed to take screenshots of this webpage for the first time (succeeded the second time). Will try to figure out the reason.

I believe it's a Playwright issue. It would be useful to be able to provide my own Playwright instance (for example, if I need playwright-stealth).
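
The request above amounts to dependency injection: the agent would accept a browser page created by the caller instead of always launching its own. Here is a rough sketch of that pattern, assuming nothing about SeeAct's internals. `PageLike`, `InjectableAgent`, and `page_factory` are hypothetical names for illustration only, and no such parameter is confirmed to exist on `SeeActAgent`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Protocol


class PageLike(Protocol):
    """The minimal page interface this sketch relies on (hypothetical)."""

    async def goto(self, url: str) -> None: ...


PageFactory = Callable[[], Awaitable[PageLike]]


class InjectableAgent:
    """Hypothetical agent that uses a caller-supplied page when one is given."""

    def __init__(self, website: str, page_factory: Optional[PageFactory] = None):
        self.website = website
        self._page_factory = page_factory or self._default_page
        self.page: Optional[PageLike] = None

    async def _default_page(self) -> PageLike:
        # Placeholder for a built-in browser launch, which this sketch omits.
        raise NotImplementedError("built-in browser launch is not part of this sketch")

    async def start(self) -> None:
        # The factory may return a page the caller has already customized,
        # for example one patched by a stealth plugin.
        self.page = await self._page_factory()
        await self.page.goto(self.website)
```

With Playwright, the factory would be the place where the caller launches the browser and applies any plugins before handing the page over.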