As an automation engineer, I want autonomous agents to perform online work that would otherwise require human intervention or a lot of custom code.

This may require a long-lived session in cases where state builds up over time (e.g., in-browser manipulation, authentication, etc.).

Configuration

allow_session_creation <- can the agent create arbitrary sessions, or does it only have one default session available?
headless <- unsure
allowed_urls <- regex of URLs the agent is allowed to browse. This would also cover external navigation rules
hidden_strings <- list of strings that should be hidden from the agent, e.g., passwords if desired
other...
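A minimal sketch of how these options could be modeled, assuming a plain Python dataclass. The class and method names (`BrowserSessionConfig`, `is_url_allowed`, `redact`) are hypothetical; they only mirror the list above. `allowed_urls` is checked with `re.fullmatch` so the pattern has to match the whole URL, and `hidden_strings` are removed by simple substring replacement before text reaches the agent.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BrowserSessionConfig:
    """Hypothetical config shape for an agent browser session (names are illustrative)."""

    allow_session_creation: bool = False  # False: the agent only gets one default session
    headless: bool = True  # still undecided in this proposal
    allowed_urls: str = r".*"  # regex that a URL must fully match to be browsable
    hidden_strings: list[str] = field(default_factory=list)  # e.g. passwords

    def __post_init__(self) -> None:
        # Compile once so every navigation check reuses the same pattern.
        self._url_pattern = re.compile(self.allowed_urls)

    def is_url_allowed(self, url: str) -> bool:
        # fullmatch (not search) so "https://evil.com/?next=example.com" is not let through.
        return self._url_pattern.fullmatch(url) is not None

    def redact(self, text: str) -> str:
        # Replace every hidden string before the text is shown to the agent.
        for secret in self.hidden_strings:
            if secret:
                text = text.replace(secret, "[REDACTED]")
        return text
```

For example, `BrowserSessionConfig(allowed_urls=r"https://([a-z0-9-]+\.)*example\.com(/.*)?", hidden_strings=["hunter2"])` would allow `https://app.example.com/login` and reject `https://evil.com/?next=example.com`. Substring redaction is the simplest option; it will not catch secrets that a page re-encodes or splits across elements.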

This likely involves exposing Playwright more or less directly to the agent, perhaps with scoped URL allowances. TODO: investigate what this actually looks like.

We need to look at the Playwright API to understand how external navigation and popups work. Presumably the agent should be able to navigate these as well, as long as the URL rules allow it.
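One possible approach, sketched under assumptions: register a single route handler on the Playwright browser context (`context.route("**/*", handler)`) rather than on one page, since context-level routes apply to every page in the context, including popups and new tabs. The factory name `make_navigation_guard` is hypothetical. The handler only relies on `route.request.url`, `route.request.is_navigation_request()`, `route.abort()` and `route.continue_()`, and is written without importing Playwright so the decision logic can be tested on its own.

```python
import re
from typing import Any, Callable


def make_navigation_guard(allowed_urls: str) -> Callable[[Any], None]:
    """Build a route handler that blocks navigations outside the URL allowlist.

    Hypothetical wiring with Playwright's sync API:
        context.route("**/*", make_navigation_guard(ALLOWED_URLS_REGEX))
    """
    pattern = re.compile(allowed_urls)

    def handler(route: Any) -> None:
        request = route.request
        # Only gate navigations (main frames and iframes); subresources such as
        # images, scripts and XHR pass through so allowed pages still render.
        if request.is_navigation_request() and not pattern.fullmatch(request.url):
            route.abort("blockedbyclient")
        else:
            route.continue_()

    return handler
```

Gating only navigations is a design choice: blocking every request to a non-allowlisted host would also break CDNs and third-party scripts that allowed sites depend on. A stricter policy could apply the same check to all requests.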

When continuing a session, the agent needs ways to get the current state and manipulate the DOM. This is likely a passthrough of the Playwright API.
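If the passthrough is restricted rather than fully open, it could look roughly like the sketch below. The allowlist and the `call_page_method` helper are hypothetical; the method names themselves (`goto`, `click`, `fill`, `press`, `content`, `title`, `inner_text`, `screenshot`) are existing methods on Playwright's sync `Page`. String results are also passed through the `hidden_strings` redaction before the agent sees them.

```python
from typing import Any, Iterable

# Hypothetical allowlist of Playwright Page methods exposed to the agent as tools.
# `evaluate` is deliberately left out because it runs arbitrary JavaScript.
ALLOWED_PAGE_METHODS = frozenset(
    {"goto", "click", "fill", "press", "content", "title", "inner_text", "screenshot"}
)


def call_page_method(
    page: Any, method: str, *args: Any, hidden_strings: Iterable[str] = (), **kwargs: Any
) -> Any:
    """Forward an agent tool call to the page, rejecting methods not on the allowlist."""
    if method not in ALLOWED_PAGE_METHODS:
        raise PermissionError(f"method {method!r} is not exposed to the agent")
    result = getattr(page, method)(*args, **kwargs)
    # Redact secrets from text results (e.g. page.content(), page.inner_text(...)).
    if isinstance(result, str):
        for secret in hidden_strings:
            if secret:
                result = result.replace(secret, "[REDACTED]")
    return result
```

An allowlist keeps the tool surface small and reviewable; exposing the whole Page object would be simpler but harder to reason about from a safety standpoint.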

File downloads need to be handled gracefully and made accessible to the agent and user so that, for example, they can be uploaded to external locations. Hooking into the existing "files" API is likely a clean way to handle this.
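A rough sketch of the hand-off, assuming a Playwright sync-API `Download` object, whose `path()` waits for the download to finish and returns its temporary location and whose `suggested_filename` gives the server-proposed name. `LocalFileStore` is a hypothetical stand-in for the existing "files" API; its interface is not taken from the real one.

```python
import shutil
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class LocalFileStore:
    """Hypothetical stand-in for the "files" API: stores a file, returns an id."""

    root: Path
    names: dict[str, str] = field(default_factory=dict)  # file_id -> original filename

    def put_from_path(self, src: Path, filename: str) -> str:
        file_id = uuid.uuid4().hex
        self.root.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, self.root / file_id)
        self.names[file_id] = filename
        return file_id

    def read(self, file_id: str) -> bytes:
        return (self.root / file_id).read_bytes()


def store_download(download: Any, store: LocalFileStore) -> str:
    """Copy a finished download into the store and return its file id.

    Assumed wiring with Playwright's sync API:
        with page.expect_download() as info:
            page.click("text=Export")
        file_id = store_download(info.value, store)
    """
    return store.put_from_path(Path(download.path()), download.suggested_filename)
```

Returning an opaque file id, rather than a filesystem path, lets the agent pass the file to other tools (for example an upload step) without access to the browser's temporary directory.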

Cases to consider

[ ] #929
- respond to questions about text / data
- Clicking links
- navigating tabs opened externally
- submitting form information (i.e., filling in and submitting forms)
[ ] #934
- Multi-factor authentication with TOTP
[ ] #930
- download files
- access downloaded files
[ ] Multimedia support
- respond to questions about images on the page (spicier) <- take a screenshot of the browser?
- respond to questions about videos / audio (spiciest)
[ ] Update Web-Researcher to demonstrate new browser capabilities
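For the TOTP case (#934), something has to turn a shared secret into the current one-time code. A minimal sketch of standard RFC 6238 TOTP (HMAC-SHA1, 30-second steps, the defaults most authenticator apps use), assuming the base32 secret is supplied out of band, for example through configuration and listed in `hidden_strings` so the agent never sees it. The `totp` function name is illustrative.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP using HMAC-SHA1."""
    # Authenticator secrets are base32; normalize case/spaces and restore padding.
    cleaned = secret_b32.replace(" ", "").upper()
    key = base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))
    # Number of `step`-second intervals since the Unix epoch.
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation from RFC 4226, section 5.3.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)
```

Computing the code on the harness side and filling it in via the restricted page tools keeps the secret itself out of the agent's context.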

eidolon-ai / eidolon

Interactive browser sessions #928