rihp opened 6 months ago
Here is one approach that uses Puppeteer to browse pages and take screenshots, which are then passed to GPT-4V.
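A minimal sketch of that approach, assuming Node 18+ (for built-in `fetch`), a local `puppeteer` install, `OPENAI_API_KEY` in the environment, and OpenAI's chat-completions vision format; the model name and selectors are assumptions, not something pinned down in this thread:

```javascript
// Pure helper: build the GPT-4V request body from a base64-encoded PNG.
function visionPayload(base64Png, prompt) {
  return {
    model: 'gpt-4-vision-preview', // assumed model name; swap for whatever you use
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        { type: 'image_url', image_url: { url: `data:image/png;base64,${base64Png}` } },
      ],
    }],
  };
}

// Browse a URL with Puppeteer, screenshot it, and ask GPT-4V about it.
async function describePage(url, prompt) {
  const puppeteer = (await import('puppeteer')).default; // loaded lazily
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  const base64Png = await page.screenshot({ encoding: 'base64' });
  await browser.close();

  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(visionPayload(base64Png, prompt)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
// e.g. describePage('https://example.com', 'Describe this page').then(console.log);
```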
I would also suggest the Browserless API. The cloud service is expensive, but if you host it yourself with Docker you get both Puppeteer and Playwright endpoints, letting you point at a remote Chrome instance via the browserWSEndpoint option. Switching to Browserless is a one-line change, with a connection like this:

Puppeteer
// before: launch Chrome locally
const browser = await puppeteer.launch();
// after: connect to the remote Browserless endpoint
const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' });
Playwright
// before: launch Chrome locally
const browser = await pw.chromium.launch();
// after: connect to the remote Browserless endpoint
const browser = await pw.chromium.connect('ws://localhost:3000/playwright/chromium');
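Putting the Puppeteer variant together, here is a hedged end-to-end sketch against a self-hosted Browserless container (e.g. `docker run -p 3000:3000 browserless/chrome`); the `?token=` query parameter is an assumption based on Browserless's token auth and can be dropped if you run without one:

```javascript
// Pure helper: build the WebSocket endpoint, optionally with an auth token.
function wsEndpoint(host, token) {
  return token ? `ws://${host}?token=${encodeURIComponent(token)}` : `ws://${host}`;
}

// Connect to the remote browser, screenshot a page, then disconnect
// (disconnect rather than close, so the remote browser keeps running).
async function screenshotRemote(url, outPath) {
  const puppeteer = (await import('puppeteer')).default; // loaded lazily
  const browser = await puppeteer.connect({
    browserWSEndpoint: wsEndpoint('localhost:3000', process.env.BROWSERLESS_TOKEN),
  });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  await page.screenshot({ path: outPath, fullPage: true });
  await browser.disconnect();
}
// e.g. screenshotRemote('https://example.com', 'out.png');
```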
**Is your feature request related to a problem? Please describe.**
Execute web scraping tasks on a local machine without relying on third-party providers such as SERP, which cost money.

**Describe the solution you'd like**
The Selenium driver is the current go-to for local web scraping. It is well documented, so implementing it on a local machine should be fairly straightforward.
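The Selenium direction could look roughly like this, using the `selenium-webdriver` Node bindings (so no third-party search API is involved). This is a sketch, not a tested implementation: it assumes `selenium-webdriver` and a matching chromedriver are installed, and the DuckDuckGo HTML endpoint plus the `a.result__a` selector are assumptions about the target page:

```javascript
// Pure helper: build the search URL for a query.
function searchUrl(query) {
  return `https://duckduckgo.com/html/?q=${encodeURIComponent(query)}`;
}

// Drive a local Chrome via Selenium and collect result-link texts.
async function searchLocally(query) {
  const { Builder, By } = require('selenium-webdriver'); // loaded lazily
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get(searchUrl(query));
    const links = await driver.findElements(By.css('a.result__a')); // assumed selector
    return await Promise.all(links.map((link) => link.getText()));
  } finally {
    await driver.quit();
  }
}
// e.g. searchLocally('auto gpt').then(console.log);
```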
**Describe alternatives you've considered**
Beautiful Soup, or an HTTP request with parsing scripts to analyze page content, similar to the Polywrap web-scrape wrapper. DuckDuckGo offers an API which is somewhat decent, but it is again a third party.
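The lighter HTTP-request alternative can be sketched with Node 18+'s built-in `fetch` and plain string handling, so no browser or third-party service is needed; a real implementation would use a proper HTML parser (e.g. cheerio) rather than the regex shown here:

```javascript
// Pure helper: pull the <title> out of raw HTML (regex is illustrative only).
function extractTitle(html) {
  const m = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return m ? m[1].trim() : null;
}

// Fetch a page over plain HTTP and parse out its title.
async function pageTitle(url) {
  const res = await fetch(url);
  return extractTitle(await res.text());
}
// e.g. pageTitle('https://example.com').then(console.log);
```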
**Additional context**
Requested by Glitch and Devil on Discord: https://discord.com/channels/1146873191969587220/1184124197853732874