Open tobitege opened 3 days ago
@neubig there have been a few issues opened for the BrowserAgent. I know there's work being done on replacing it with a better agent. Just want to confirm my understanding is correct and that it is being worked on?
That is correct! We're hoping to release our paper and code this week, and then we'll need to do some testing to make sure that we can get it working without regressions to other scores like SWE-bench. I'll update soon.
Summary
Currently, the generated axtree content for retrieved websites incurs a huge number of tokens and high cost. Maybe the combination of Playwright with BeautifulSoup below could save tokens, cost, and runtime?
Here's what Gemini Pro 1.5 gave as a suggestion (general example, not within OpenHands):
You're facing a common challenge when using LLMs with web scraping: the sheer volume of HTML can overwhelm the tokenizer and lead to unnecessary costs and slower processing. Extracting the main content while preserving interactive elements is the right approach. While a single perfect package might not exist, combining existing tools and techniques can achieve this.
Here's a breakdown of a strategy using Playwright and Beautiful Soup, along with explanations and code examples:
1. Extract Content with Playwright and Beautiful Soup:
Playwright is excellent for interacting with dynamic content, while Beautiful Soup excels at parsing HTML. We'll use them together.
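The suggestion's original code examples did not survive; the following is a minimal sketch of what they likely looked like. The function names `extract_main_content`, `process_links`, and `fetch_reduced_content`, and the `data-link-id` attribute, are illustrative choices, not an existing API:

```python
import asyncio

from bs4 import BeautifulSoup


def extract_main_content(soup):
    """Return the most content-rich node: <article> if present, else <body>."""
    return soup.find("article") or soup.find("body") or soup


def process_links(content):
    """Rewrite each link's text as Markdown `[id: text](URL)` and tag the
    <a> element with a data-link-id attribute so the agent can find it later."""
    for link_id, a in enumerate(content.find_all("a", href=True), start=1):
        a["data-link-id"] = str(link_id)
        a.string = f"[{link_id}: {a.get_text(strip=True)}]({a['href']})"
    return content


async def fetch_reduced_content(url):
    # Playwright is imported lazily here so the parsing helpers above can be
    # used without a browser installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        try:
            await page.goto(url)
            await page.wait_for_timeout(2000)  # crude wait for dynamic content
            html = await page.content()
        finally:
            await browser.close()
    soup = BeautifulSoup(html, "html.parser")
    return str(process_links(extract_main_content(soup)))
```

Passing only the `<article>`/`<body>` subtree with Markdown-annotated links to the LLM is what yields the token savings; navigation chrome, scripts, and styles outside that subtree never reach the prompt.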
Explanation and Improvements:
- Use `page.wait_for_timeout(2000)` or other, more robust methods to handle dynamic content loading. You might need to adjust the timeout or use more specific selectors/conditions.
- The `extract_main_content` function now includes multiple strategies: it first looks for `<article>` tags, commonly used for main content, and falls back to the `<body>` element.
- The `process_links` function now adds unique identifiers to each link using Markdown format `[link text](URL)`, making it easier for the LLM to understand and act upon them. The link ID is also added as an attribute for easier access if needed.
- Use `async` and `await` correctly for asynchronous operations with Playwright.
- Add error handling (e.g., `try...except` blocks) to handle potential issues like network errors or invalid selectors.
2. Pass to LLM:
Now you can pass the modified content (which is a string of HTML with identifiable links) to your LLM. The reduced size and structured links will make processing significantly more efficient.
3. Agent Interaction:
When the LLM decides to "click" a link, it can refer to the link ID (e.g., "Click link 3"). Your agent can then parse this instruction, extract the corresponding URL from the original HTML (using the ID), and navigate to that URL using Playwright.
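A hypothetical sketch of that agent side, assuming the link IDs were stored in a `data-link-id` attribute (the attribute name and the helper `resolve_link` are illustrative, not part of any existing API):

```python
import re

from bs4 import BeautifulSoup


def resolve_link(instruction, annotated_html):
    """Map an instruction like 'Click link 3' back to a URL via the
    data-link-id attribute that process_links attached to each <a>."""
    match = re.search(r"link\s+(\d+)", instruction, re.IGNORECASE)
    if not match:
        return None  # the instruction did not reference a numbered link
    soup = BeautifulSoup(annotated_html, "html.parser")
    a = soup.find("a", attrs={"data-link-id": match.group(1)})
    return a["href"] if a else None
```

The agent would then navigate to the resolved URL with Playwright, e.g. `await page.goto(url)`, and re-run the extraction step on the new page.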
This combined approach addresses your token limit issues while retaining the interactive elements needed for your agent. Remember to tailor the content extraction strategy to the specific websites you're working with for optimal results. Consider using Readability-style libraries for more advanced content extraction. Also, experiment with different waiting strategies in Playwright to ensure all dynamic content is loaded before parsing.