OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
https://osu-nlp-group.github.io/SeeAct/

Looking to test this against my website #2

Shoshin23 closed this issue 7 months ago

Shoshin23 commented 7 months ago

Hi,

I'm working in the accessibility space and I find the work you're doing to be extremely useful there. Can you tell me how I can run this project against my website? I'm looking for an agent that understands queries and completes them for me.

I'm trying to piece it together through the code, but you might be able to help me better.

Thanks!

boyugou commented 7 months ago

Hi, Karthik!

Accessibility was indeed one of the main applications we had in mind for a web navigation assistant when we were working on Mind2Web in the early days. Regarding the online script, we will release it around Friday. Don't worry, running it will be very simple: you just need to enter the website and the task. Its capabilities are not perfect, but it's worth a try.

Shoshin23 commented 7 months ago

Thanks! Looking forward to trying it out. :)