As an automation engineer I want autonomous agents to perform online work that would otherwise require human intervention or a lot of custom code.
This may require a long-lived session in cases where state is built up over time (e.g., in-browser manipulation, authentication).
Configuration
Allow session creation: can the agent create arbitrary sessions, or does it only have one default session available?
headless <- unsure
allowed_urls <- regex of URLs the agent is allowed to browse. This would cover external rules
hidden_strings <- list of strings that should be hidden from the agent, e.g., password information if desired
other...
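The options above could be grouped roughly as follows. This is a minimal sketch, not a settled design: the class name `BrowserToolConfig`, the method names, the defaults, and the choice to treat `allowed_urls` as a list of full-match regexes are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BrowserToolConfig:
    """Hypothetical container for the options listed in this issue."""

    allow_session_creation: bool = False
    headless: bool = True  # the issue marks this option as "unsure"
    allowed_urls: list[str] = field(default_factory=list)
    hidden_strings: list[str] = field(default_factory=list)

    def is_url_allowed(self, url: str) -> bool:
        # Assumption: an empty allow-list denies everything.
        return any(re.fullmatch(p, url) for p in self.allowed_urls)

    def redact(self, text: str, mask: str = "[hidden]") -> str:
        # Replace longer secrets first so overlapping strings are fully masked.
        for secret in sorted(self.hidden_strings, key=len, reverse=True):
            if secret:
                text = text.replace(secret, mask)
        return text
```

With this shape, any page text handed back to the agent would pass through `redact` first, and every navigation target would be checked with `is_url_allowed`.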
This likely involves exposing Playwright more or less directly to the agent, perhaps with scoped URL allowances. TODO: investigate what this actually looks like.
We need to look at the Playwright API to understand how external navigation and popups work. Presumably the agent should be able to navigate these as well, as long as the URL rules allow it.
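One possible way to enforce URL rules across popups is a context-level route handler, since Playwright applies `BrowserContext.route` to every page in the context and emits a `"page"` event when a new tab or popup opens. The sketch below assumes the sync Python API; the helper names `make_route_guard` and `install_guard` are hypothetical.

```python
import re


def make_route_guard(allowed_patterns):
    """Build a handler that aborts requests to URLs outside the allow-list."""
    compiled = [re.compile(p) for p in allowed_patterns]

    def guard(route):
        url = route.request.url
        if any(p.fullmatch(url) for p in compiled):
            route.continue_()
        else:
            route.abort()

    return guard


def install_guard(context, allowed_patterns, on_new_page=None):
    """Attach the guard to a Playwright BrowserContext (sync API)."""
    # Context-level routes apply to every page in the context, popups included.
    context.route("**/*", make_route_guard(allowed_patterns))
    if on_new_page is not None:
        # The "page" event fires for new tabs and popups opened in this context.
        context.on("page", on_new_page)
```

As written, the guard checks every request, including images and scripts, which may be too strict for real sites. Restricting the check to navigations via `route.request.is_navigation_request()` is one alternative worth investigating.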
When continuing a session, we need ways for the agent to get the current state and manipulate the DOM. This is likely a passthrough of the Playwright API.
File downloads need to be handled gracefully and in a way accessible to the agent / user so that, for example, they can be uploaded to external locations. Hooking into the existing "files" API is likely a graceful way to handle this.
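A download could be captured along these lines. The sketch uses Playwright's `Download.suggested_filename` and `Download.save_as`; the helper name `store_download` and the destination directory are hypothetical, and how the saved path is then registered with the existing "files" API is left open because that API is not described here.

```python
from pathlib import Path


def store_download(download, dest_dir):
    """Save a Playwright Download under dest_dir and return the final path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # suggested_filename comes from the server or page, so keep only the base name.
    name = Path(download.suggested_filename).name or "download"
    target = dest / name
    download.save_as(target)
    return target


# Usage with the sync API (assumes `page` is an open Playwright page and
# "#export" is a hypothetical selector that triggers a download):
#
#     with page.expect_download() as download_info:
#         page.click("#export")
#     path = store_download(download_info.value, "session_files")
```

Keeping only the base name avoids writing outside the destination directory if the suggested filename contains path separators.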
Cases to consider
[ ] #929
respond to questions about text / data
clicking links
navigating tabs opened externally
submitting form information (i.e., filling in and submitting)
[ ] #934
Multi-factor authentication with TOTP
[ ] #930
download files
access downloaded files
[ ] Multimedia support
respond to questions about images on page (spicier) <- get image of browser?
respond to information about videos / audio (spiciest)
[ ] Update Web-Researcher to demonstrate new browser capabilities
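For the TOTP case in #934, the one-time code can be computed with the standard library alone, following RFC 6238 with HMAC-SHA1 and a 30-second step. This is a sketch under assumptions: the function name `totp` is hypothetical, and where the base32 secret is stored and how the code is typed into the page are not decided in this issue.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a base32 secret."""
    # Authenticator secrets are often shown lowercase, spaced, and unpadded.
    cleaned = secret_b32.replace(" ", "").upper()
    key = base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The secret itself is a good candidate for `hidden_strings`, so that it never appears in text returned to the agent.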