BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.15k stars, 1.88k forks

Issue with wildcard usage in 'match' configuration #104

Status: Open. dxbmax opened this issue 6 months ago.

dxbmax commented 6 months ago

How do I crawl URLs that end in /specifications.html and differ only in one path segment? I tried using https://example.com/category/*/specifications.html (with a wildcard standing in for that segment), but it does not match as expected.

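import { Config } from "./src/config";

export const defaultConfig: Config = {
  // Page where the crawl starts (host: www.example.com)
  url: "https://www.example.com/category/",
  // Links on crawled pages that match this pattern are followed (host: example.com, no "www.")
  match: "https://example.com/category/*/specifications.html",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};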

This configuration does not work. These are the URLs I am trying to target:

https://example.com/category/A11/specifications.html
https://example.com/category/B13/specifications.html
https://example.com/category/G1b/specifications.html
https://example.com/category/Z3Z2/specifications.html
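One way to narrow the problem down is to test the pattern against the target URLs offline. The following is a minimal sketch, assuming gpt-crawler passes `match` to Crawlee's `enqueueLinks` as a glob with minimatch-style semantics, where `*` stays within a single path segment and `**` can span several. `globToRegExp` is a hypothetical helper written for this check; it is not part of gpt-crawler or Crawlee and only approximates their matching.

```typescript
// Hypothetical helper: converts a glob into a RegExp using minimatch-style rules
// (an assumption about how the crawler interprets `match`):
//   "**" matches any characters, including "/"
//   "*"  matches any characters except "/" (stays within one path segment)
//   "?"  matches exactly one character other than "/"
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*";
        i++; // consume the second "*" of "**"
      } else {
        source += "[^/]*";
      }
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      // Escape regex metacharacters (e.g. ".") so they match literally.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Anchor both ends so the whole URL must match, not just a substring.
  return new RegExp(`^${source}$`);
}

const pattern = "https://example.com/category/*/specifications.html";
const matcher = globToRegExp(pattern);

const targets = [
  "https://example.com/category/A11/specifications.html",
  "https://example.com/category/B13/specifications.html",
  "https://example.com/category/G1b/specifications.html",
  "https://example.com/category/Z3Z2/specifications.html",
];

// Each of these lines ends in "true".
for (const url of targets) {
  console.log(`${url} -> ${matcher.test(url)}`);
}

// The same path on the "www." host used by `url` in the config: prints "false".
console.log(matcher.test("https://www.example.com/category/A11/specifications.html"));
```

Under these assumptions the pattern matches all four target URLs but not their `www.example.com` form, which is the host used in `url`. A pattern like this also only filters links the crawler actually finds, so if the specification pages are linked only from intermediate pages the pattern excludes, they may never be enqueued.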