BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.14k stars, 1.88k forks

feat: Customizing selector can sometimes cause web crawlers to fail #154

Status: Open. kukuze opened this 4 months ago.

kukuze commented 4 months ago

When a custom selector is configured and the starting page contains no element that matches it, the crawler fails. Dropping the custom selector avoids the failure, but the pages that actually matter then contain much more irrelevant content, such as the "homepage", because the selector is what filters that noise out. This submission therefore uses the custom selector when the page has a matching element and falls back to `body` as the CSS selector when it does not.
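Below is a minimal sketch of the proposed fallback in TypeScript. To stay self-contained it does not use Playwright or gpt-crawler's real code. `PageLike`, `textFor`, `resolveSelector`, `extractText` and `fakePage` are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in for a browser page: returns the text of the first
// element matching a CSS selector, or null when nothing matches.
interface PageLike {
  textFor(selector: string): string | null;
}

const FALLBACK_SELECTOR = "body";

// Use the configured selector if it matches something on this page;
// otherwise fall back to "body" so the crawl does not fail.
function resolveSelector(page: PageLike, selector: string): string {
  return page.textFor(selector) !== null ? selector : FALLBACK_SELECTOR;
}

// Extract text with whichever selector was resolved for this page.
function extractText(page: PageLike, selector: string): string {
  const chosen = resolveSelector(page, selector);
  return page.textFor(chosen) ?? "";
}

// In-memory page for demonstration: keys are selectors, values are text.
function fakePage(elements: Record<string, string>): PageLike {
  return {
    textFor: (sel: string): string | null =>
      Object.prototype.hasOwnProperty.call(elements, sel) ? elements[sel] : null,
  };
}

// Usage: the homepage lacks ".content", so extraction falls back to "body";
// a docs page has ".content", so only that element's text is kept.
const home = fakePage({ body: "Welcome, nav, promos" });
const docs = fakePage({ body: "Nav, sidebar, Docs text", ".content": "Docs text" });
console.log(extractText(home, ".content")); // Welcome, nav, promos
console.log(extractText(docs, ".content")); // Docs text
```

In a Playwright-based crawler, the same "does anything match?" check could be made with `page.locator(selector).count()` before waiting on the selector. This is an assumption about how the change might be wired in, not a description of the submitted patch.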