BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.14k stars 1.88k forks

Multiple Selectors not Reflected in Output #146

Open mahdii0908 opened 4 months ago

mahdii0908 commented 4 months ago

I am trying to use multiple selectors because I want to scrape two different parts of the page. I tried the :is() approach from a previous post, but only the first element shows up in the output JSON file.

export const defaultConfig: Config = {
  url: "https://xxx.dk/",
  match: "https://xxx.dk/**",
  selector: ":is(.tc_richcontent, .tc_page__body__standfirst)",
  maxPagesToCrawl: 2,
  outputFileName: "output_body.json",
  maxTokens: 2000000,
};

With the config above, only .tc_richcontent appears in my output file, not .tc_page__body__standfirst. Any suggestions for getting both parts?
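
A possible explanation, not confirmed in this thread: `:is(.a, .b)` is a single selector that matches either class, and the standard DOM call `querySelector` returns only the first matching element in document order. If the crawler reads the page through `querySelector` (an assumption about its internals), only one of the two regions would ever be captured. The sketch below shows the alternative of collecting every match with `querySelectorAll`. The names `QueryRoot`, `TextNode`, and `collectText` are hypothetical and exist only for illustration; they are not part of gpt-crawler.

```typescript
// Minimal structural types so the sketch also runs outside a browser.
// Both names are hypothetical, introduced only for this example.
interface TextNode {
  textContent: string | null;
}

interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<TextNode>;
}

// Gather the text of EVERY element matching the selector, in document order,
// instead of stopping at the first match as querySelector does.
function collectText(root: QueryRoot, selector: string): string {
  return Array.from(root.querySelectorAll(selector))
    .map((el) => (el.textContent ?? "").trim())
    .filter((text) => text.length > 0) // skip empty elements
    .join("\n");
}
```

In a browser context, something like `collectText(document, ":is(.tc_richcontent, .tc_page__body__standfirst)")` should return the text of both regions joined by a newline, assuming both elements are present on the page.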