block / goose

Goose is a developer agent that operates from your command line to help you do the boring stuff.
https://block.github.io/goose/
Apache License 2.0
108 stars, 17 forks

feat: web browsing #154

Closed: michaelneale closed this 6 days ago

michaelneale commented 1 week ago

One of the most basic things I find I miss: when I put in a URL, I want goose to be able to read and refer to its content. This uses playwright to do that with chromium (or falls back to something else) via a temp file (as content can be too large for context).

allows one to do things like:

image

which could be handy for referring to up-to-the-minute info or APIs.

lily-de commented 1 week ago

pyproject.toml has some conflicts but otherwise LGTM!

lamchau commented 1 week ago

If we're using playwright only for text content and not leveraging multimodal capabilities, then we could significantly reduce the dependency payload by using just curl + html2text:

curl --silent "https://en.wikipedia.org/wiki/Lapstone_Zig_Zag" | html2text | rg "opened for traffic"
behind schedule in December 1865. It opened for traffic in 1867.
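
For comparison, the text-only idea above can be sketched in Python with only the standard library. This is a rough stand-in for the `curl | html2text | rg` pipeline, not the code in this PR: `TextExtractor`, `html_to_text`, and `grep` are hypothetical names, and the extraction is far cruder than html2text (it simply keeps text nodes and drops script and style contents).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        # One text node per line; inline markup therefore splits lines.
        return "\n".join(self._parts)


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def grep(text, needle):
    """Return the lines of text that contain needle, like a minimal rg."""
    return [line for line in text.splitlines() if needle in line]
```

Calling `grep(html_to_text(page_html), "opened for traffic")` on a downloaded page would play the role of the shell pipeline, assuming the page has already been fetched into `page_html`.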

michaelneale commented 1 week ago

> if we're using playwright only for text content and not leveraging multi modal capabilities then we could significantly reduce the dependency payload by using just curl + html2text
>
> curl --silent "https://en.wikipedia.org/wiki/Lapstone_Zig_Zag" | html2text | rg "opened for traffic"
> behind schedule in December 1865. It opened for traffic in 1867.

Yes, I will see if we can make it a bit better with rendering like you said. Otherwise, a simpler out-of-the-box text-only approach is fine to start with too.

michaelneale commented 1 week ago

@lamchau @ahau-square @zakiali addressed comments. It's worth taking another look (it is a bit smarter and will try to convert the page to text, which is more manageable). Adding a dependency is not a small deal, so I want to make sure it is worthwhile for something like this.

zakiali commented 6 days ago

This works for me on basic examples (e.g. look up the python documentation and summarize the new features available in 3.13)... was able to go do that, but on more complicated pages, I keep hitting up against size limits. E.g. for this query, "check the bbc headlines for today", I get "The full content of the BBC News page is too lengthy to process at once. I will narrow down the file content further by checking segments directly or searching for specific headline tags again. Let me try another approach to extract the headlines." It tried to break it down, but it was still unwieldy for it.
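
One common way to work around size limits like this is to read the saved text in bounded pieces rather than all at once. The following is only a sketch of that idea, not what goose does: `iter_chunks` and the `max_chars` budget are hypothetical, and a character count is just a crude proxy for a model's context limit.

```python
def iter_chunks(text, max_chars=4000):
    """Yield pieces of text no longer than max_chars, splitting on line breaks.

    max_chars is an assumed budget; tune it to the model's actual limit.
    """
    chunk = []
    size = 0
    for line in text.splitlines(keepends=True):
        # A single overlong line is cut into fixed-size slices.
        while len(line) > max_chars:
            if chunk:
                yield "".join(chunk)
                chunk, size = [], 0
            yield line[:max_chars]
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one.
        if size + len(line) > max_chars and chunk:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "".join(chunk)
```

Each chunk could then be searched or summarized on its own, so a long page such as a news front page never has to fit into a single prompt.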

Also, something like "what is Google's homepage image of the day" was not working super well -- I expected it to find the hyperlink and describe it, or find the image and send it off to get a response. Maybe we stick with text for now and remove playwright? Or maybe look into something like https://tavily.com/ (I haven't used it myself, but it seems to be a standard tool for browsing. The downside is that the free tier is only 1000 calls/month).

michaelneale commented 6 days ago

@zakiali yeah, I don't want to have a service used for this, but can you try with the latest? Since the recent change I had no issue with:

image and more

but that may be just luck? Plan B is the simple text-only approach, yes. The aim is really developer-relevant things (like docs), not so much multimodal content or sites that actively defend against crawling.

Here it is with just httpx, which I think may make it simpler, with no extra deps:

image
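
As a rough illustration of the httpx variant, a fetch that saves the page to a temp file (so the content does not have to fit in context) might look like the sketch below. This is an assumption about the general shape, not the code in this PR: `fetch_to_temp_file` and its `fetch` hook are hypothetical, and the timeout value is arbitrary.

```python
import tempfile


def fetch_to_temp_file(url, fetch=None):
    """Download url and save the body to a temp file; return the file path.

    `fetch` is a hypothetical hook (url -> str) so the function can be
    exercised offline; by default it uses httpx.
    """
    if fetch is None:
        import httpx  # third-party; imported lazily so the hook works without it

        def fetch(target):
            response = httpx.get(target, follow_redirects=True, timeout=10.0)
            response.raise_for_status()
            return response.text

    body = fetch(url)
    # delete=False keeps the file around so it can be read or searched later.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(body)
        return handle.name
```

Returning a path rather than the body leaves the caller free to search or read the file in pieces, which matches the temp-file approach described at the top of this PR.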

zakiali commented 5 days ago

This works now! I thought I pulled in the recent changes when testing, but I must not have. This is great now though!