BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.59k stars 1.97k forks

Support `startsWith` selector #35

Open nikitavoloboev opened 10 months ago

nikitavoloboev commented 10 months ago

I want to index the EdgeDB docs: https://www.edgedb.com/docs/datamodel/index

The docs live inside an element with the class layout_docsContent__JzhPH, where the JzhPH suffix changes from page to page.

Currently the selector requires a fixed value. It would be nice to support a prefix match, essentially [class^="layout_docsContent"].

nikitavoloboev commented 10 months ago

I tried changing getPageHtml to:

export function getPageHtml(page: Page) {
  return page.evaluate((classNameStart) => {
    const el = document.querySelector(
      `[class^='${classNameStart}']`
    ) as HTMLElement | null;
    return el?.innerText || "";
  }, config.selector);
}

and then passing this in the config:

selector: `layout_docsContent`,

but that failed :(
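The thread does not say why the attempt failed, so the following is only a sketch of one way to express a "class starts with" match as an ordinary CSS selector string. The helper name classPrefixSelector is hypothetical and not part of gpt-crawler. It assumes the resulting string is used as the full selector value in the config, so that any code handing the selector to a selector API receives valid CSS rather than a bare class prefix.

```typescript
// Hypothetical helper, not part of gpt-crawler: builds a CSS selector that
// matches elements having a class token that starts with `prefix`.
function classPrefixSelector(prefix: string): string {
  // Escape backslashes and double quotes so the prefix is safe inside a
  // double-quoted attribute value.
  const escaped = prefix.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  // [class^="p"]  : the class attribute itself begins with the prefix.
  // [class*=" p"] : the prefix follows a space, so it begins a later class token.
  return `[class^="${escaped}"], [class*=" ${escaped}"]`;
}

const selector = classPrefixSelector("layout_docsContent");
console.log(selector);
// prints: [class^="layout_docsContent"], [class*=" layout_docsContent"]
```

The second alternative matters because [class^=...] alone only matches when the generated class is the first one in the attribute. Class names separated by tabs or newlines rather than spaces are not covered by this sketch.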