BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License
18.16k stars 1.88k forks

Selector help #19

Open nexuslux opened 7 months ago

nexuslux commented 7 months ago

Thanks for building this. Is there an easier or more dynamic way to find the selector? This seems to be the step where the crawl either breaks or I run into difficulty.

My normal approach is to visit the site I want to scrape, right-click the content I want to scrape, and click 'Inspect'. Then I right-click the highlighted element and copy the 'selector'. But the result is quite long and specific to that page (e.g. #app > div.article-box.grid.container > div:nth-child(2) > div.acticle-content > div:nth-child(2) > div.normal.system.article-body > p:nth-child(6)).

Any suggestions on how to streamline or fix this? Thanks again.

inderpreet001 commented 7 months ago

I wanted to scrape the entire page without explicitly specifying a selector, so I set 'body' as the selector.

e.g. selector: "body"

This works for me: it fetches all the data on the page without my having to define a specific selector.
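
For anyone landing here later, a minimal sketch of both ideas in this thread: the "body" fallback, and one possible way to shorten a selector copied from DevTools. The CrawlConfig type, the simplifySelector helper, and the example URL are hypothetical names made up for illustration; they are not part of gpt-crawler. The field names are assumed to match the project's config file, so check them against your copy.

```typescript
// Hypothetical local type for illustration only; the real project
// defines its own config type, and its fields may differ.
type CrawlConfig = {
  url: string;
  match: string;
  selector: string;
  maxPagesToCrawl: number;
  outputFileName: string;
};

// Hypothetical helper: reduce a DevTools "Copy selector" string to the
// last step that carries a class or id, dropping positional pseudo-classes
// such as :nth-child(2) that tie the selector to a single page.
function simplifySelector(copied: string): string {
  const steps = copied
    .split(">")
    .map((step) => step.trim().replace(/:nth-(?:child|of-type)\([^)]*\)/g, ""))
    .filter((step) => step.length > 0);
  // Search from the innermost step outwards for a class or id.
  const anchored = [...steps].reverse().find((step) => /[.#]/.test(step));
  // Fall back to the whole page, as suggested in this thread.
  return anchored ?? "body";
}

// The selector from the original question, copied verbatim.
const copiedSelector =
  "#app > div.article-box.grid.container > div:nth-child(2) > " +
  "div.acticle-content > div:nth-child(2) > " +
  "div.normal.system.article-body > p:nth-child(6)";

const exampleConfig: CrawlConfig = {
  url: "https://example.com/articles", // placeholder URL
  match: "https://example.com/articles/**", // placeholder pattern
  selector: simplifySelector(copiedSelector), // "div.normal.system.article-body"
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};

console.log(exampleConfig.selector);
```

The shortened selector targets the article container rather than one paragraph, so it has a better chance of matching on other pages that share the same layout. Using "body" is the simplest option, but it will likely also pull in navigation and footer text.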