BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.16k stars, 1.88k forks

Multiple Selectors #28

Closed gummipunkt closed 7 months ago

gummipunkt commented 7 months ago

Is it possible to use multiple selectors?

InsightfulFuture commented 7 months ago

Use :is() then put a comma separated list of selectors inside.

Example:

export const config: Config = {
  url: "https://example.com",
  match: "https://example.com/**",
  selector: `:is(.selector1, .selector2, #complex > .selector > #id3, .etc)`,
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};

gummipunkt commented 7 months ago

Thanks, but it doesn't work:

selector: "is: (.tab-content, .article, .manu-content, #handle-tab-lightning)", (here I only replaced ' with ")

InsightfulFuture commented 7 months ago

No space after the is, it’s :is(…) not :is (…)

gummipunkt commented 7 months ago

Thanks, solved this problem, but the next one is right around the corner:

WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.waitForSelector: Unexpected token "(" while parsing selector "is:(.tab-content, .article, .manu-content, #handle-tab-lightning)"

InsightfulFuture commented 7 months ago

@gummipunkt It’s not is:(…), it’s :is(…)
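
Editor's sketch: the two slips in this thread (`is: (…)` and `is:(…)`) are easy to make when typing the selector by hand. The snippet below is a minimal illustration, not part of gpt-crawler; `anyOf` and `looksLikeMisplacedIs` are hypothetical helper names. It builds the `:is(…)` string from a list and flags the misplaced-colon and stray-space forms before they reach the crawler.

```typescript
// Hypothetical helper (not a gpt-crawler API): wraps several CSS selectors
// in a single :is(...) selector, with the colon in front and no space
// before the opening parenthesis.
function anyOf(...selectors: string[]): string {
  const cleaned = selectors.map((s) => s.trim()).filter((s) => s.length > 0);
  if (cleaned.length === 0) {
    throw new Error("anyOf() needs at least one selector");
  }
  return `:is(${cleaned.join(", ")})`;
}

// Hypothetical check for the mistakes seen in this thread.
function looksLikeMisplacedIs(selector: string): boolean {
  const s = selector.trim();
  // "is:(...)" or "is: (...)": the colon was placed after the name.
  if (/^is\s*:/.test(s)) return true;
  // ":is (...)": whitespace between the name and the parenthesis.
  if (/^:is\s+\(/.test(s)) return true;
  return false;
}

const selector = anyOf(
  ".tab-content",
  ".article",
  ".manu-content",
  "#handle-tab-lightning",
);
console.log(selector);
console.log(looksLikeMisplacedIs("is:(.tab-content, .article)"));
```

The resulting string can be used as the `selector` value in the config shown earlier in the thread.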

gummipunkt commented 7 months ago

> @gummipunkt It’s not is:(…), it’s :is(…)

I'm just an idiot. Thanks a lot.

mahdii0908 commented 4 months ago

Hi,

I am trying to use :is() as described above. The code runs, but I only get the first selector:

selector: ":is(.tc_richcontent, .tc_page__body__standfirst)",

only gives me .tc_richcontent in the output. Any suggestions to fix this?
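
Editor's sketch: one possible cause, stated as an assumption rather than a confirmed diagnosis, is that the page text is read with `document.querySelector`. That call returns only the first matching element in document order, however many alternatives `:is()` lists, whereas `querySelectorAll` returns every match. The snippet below shows the difference in approach; `collectAllText` and the two interfaces are hypothetical names introduced here so the example runs without a browser.

```typescript
// Minimal structural types (hypothetical) so the sketch runs outside a browser.
// A real `document` or element satisfies QueryRootLike.
interface TextNodeLike {
  textContent: string | null;
}
interface QueryRootLike {
  querySelectorAll(selector: string): ArrayLike<TextNodeLike>;
}

// Hypothetical helper: joins the text of EVERY element matching the selector,
// instead of stopping at the first match as querySelector does.
function collectAllText(root: QueryRootLike, selector: string): string {
  const matches = root.querySelectorAll(selector);
  const parts: string[] = [];
  for (let i = 0; i < matches.length; i++) {
    const text = (matches[i].textContent || "").trim();
    if (text.length > 0) {
      parts.push(text);
    }
  }
  // Separate the matched blocks with a blank line.
  return parts.join("\n\n");
}
```

In a browser context this could be called as `collectAllText(document, ":is(.tc_richcontent, .tc_page__body__standfirst)")`. Whether and where such a call can be plugged into the crawler depends on its version and is not covered here.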