BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.15k stars, 1.88k forks

FR: remove cruft from links #73

Open dandv opened 7 months ago

dandv commented 7 months ago

Currently the crawler seems to treat links as different pages if their query parameters differ. In some cases (e.g. utm_ trackers, Notion's pvs junk, and crap like that), those parameters should be stripped so the links are recognized as the same page.

One way to address this would be to have an array of URL params in config.ts that should be removed in order to obtain the canonical URL for a page.
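
A rough sketch of what that could look like, using only the standard `URL` API. The names `paramsToStrip` and `canonicalizeUrl` and the trailing-`*` prefix convention are made up for illustration; none of this is an existing gpt-crawler option.

```typescript
// Hypothetical config entry (would live in config.ts under this proposal).
// A trailing "*" means "any parameter starting with this prefix".
const paramsToStrip: string[] = ["utm_*", "pvs"];

// Return the URL with every matching query parameter removed.
function canonicalizeUrl(rawUrl: string, patterns: string[]): string {
  const url = new URL(rawUrl);

  // Collect the keys first; deleting while iterating can skip entries.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => keys.push(key));

  for (const key of keys) {
    const shouldStrip = patterns.some((pattern) =>
      pattern.endsWith("*")
        ? key.startsWith(pattern.slice(0, -1))
        : key === pattern,
    );
    if (shouldStrip) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

// Both links collapse to the same canonical URL.
const a = canonicalizeUrl(
  "https://example.com/page?utm_source=x&id=5&pvs=4",
  paramsToStrip,
);
const b = canonicalizeUrl("https://example.com/page?id=5", paramsToStrip);
console.log(a); // https://example.com/page?id=5
console.log(a === b); // true
```

The crawler would then compare (or enqueue) the cleaned URL instead of the raw one, so links that differ only in the listed parameters count as the same page.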