BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.17k stars, 1.89k forks

Added 'Exclude' URL Feature to Scraper #12

Open adamlaz opened 7 months ago

adamlaz commented 7 months ago

I added a small but handy feature 🏗️

Now you can specify URLs to be ignored 🙈 during a crawl. This should help skip pages you don’t need and keep the output data cleaner.

What's in this PR:

  1. Added an exclude field to config.ts for the URL patterns we want to skip.
  2. Updated requestHandler in src/main.ts to filter out URLs that match those patterns.
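For reviewers, here is a minimal sketch of what the two changes above amount to. It is not the code from this PR. It assumes exclude holds glob-style URL patterns (where `**` crosses path segments and `*` does not), and the names CrawlConfig, globToRegExp, isExcluded, and filterUrls are hypothetical.

```typescript
// Hypothetical config shape: `url` and `match` stand in for the crawler's
// existing fields; `exclude` is the new, optional list of glob patterns to skip.
interface CrawlConfig {
  url: string;
  match: string;
  exclude?: string[];
}

// Convert a simple glob into an anchored RegExp (assumption: only `*` is special).
// `**` matches any characters, including "/"; `*` matches anything except "/".
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so "." or "?" in a URL match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// True when the URL matches at least one exclude pattern.
function isExcluded(url: string, patterns: string[]): boolean {
  return patterns.some((pattern) => globToRegExp(pattern).test(url));
}

// Keep only the URLs that should still be crawled.
// A missing `exclude` is treated as an empty list, so nothing is filtered out.
function filterUrls(urls: string[], config: CrawlConfig): string[] {
  const patterns = config.exclude ?? [];
  return urls.filter((url) => !isExcluded(url, patterns));
}

// Usage with made-up example.com URLs.
const exampleConfig: CrawlConfig = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  exclude: ["https://example.com/docs/**/changelog", "https://example.com/docs/private/*"],
};
console.log(
  filterUrls(
    ["https://example.com/docs/intro", "https://example.com/docs/api/changelog"],
    exampleConfig,
  ),
);
```

In the crawler itself, a check like isExcluded would presumably run inside requestHandler, before a page's content is saved or its links are enqueued.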

This PR will close #9 🥳

adamlaz commented 7 months ago

Test it out and let me know what you think!

This was a quick implementation and the first approach I thought of, so I'm pretty sure it breaks things in normal use cases. It worked when I was just playing with it, but I think it needs more logic for when the exclude param isn't provided.
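One possible shape for that missing logic is sketched below. It is a suggestion, not the PR's code: it assumes exclude is optional and may be a single glob string or an array of globs, and normalizeExclude, matchesGlob, and shouldCrawl are hypothetical names. The key idea is to normalize the option up front so that an omitted exclude behaves exactly like the crawler did before this feature.

```typescript
// Hypothetical option type: `exclude` may be omitted, a single glob, or a list of globs.
type ExcludeOption = string | string[] | undefined;

// Normalize the option so the rest of the crawler always sees an array.
// An omitted option, an empty string, or an empty array all become [], which excludes nothing.
function normalizeExclude(exclude: ExcludeOption): string[] {
  if (exclude === undefined) return [];
  return (Array.isArray(exclude) ? exclude : [exclude]).filter((p) => p.length > 0);
}

// Minimal glob matcher (assumption): "**" spans "/" boundaries, "*" does not.
function matchesGlob(url: string, glob: string): boolean {
  const source = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        // Escape regex metacharacters in the literal pieces between wildcards.
        .map((lit) => lit.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp(`^${source}$`).test(url);
}

// Decide whether a discovered URL should be crawled.
// With no exclude patterns this always returns true, so crawls behave as before.
function shouldCrawl(url: string, exclude: ExcludeOption): boolean {
  const patterns = normalizeExclude(exclude);
  if (patterns.length === 0) return true;
  return !patterns.some((glob) => matchesGlob(url, glob));
}

// Usage with made-up example.com URLs.
console.log(shouldCrawl("https://example.com/docs/a", undefined));
console.log(shouldCrawl("https://example.com/docs/a", "https://example.com/docs/*"));
```

Accepting both a string and an array keeps simple configs short while still allowing several patterns; the early return makes the "no exclude" path impossible to break.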