add exclude to the script (my first pull request)

Hello,

I am excited to submit this pull request, which introduces a new feature to the GPT Crawler project. This feature enables users to exclude specific HTML tags from the scraping process, thereby enhancing the cleanliness and relevance of the data extracted.

Key Changes

Added a selectorexcl option in the crawler configuration.
Updated the getPageHtml function to handle the exclusion of specified HTML elements.
Included examples and instructions in the README for utilizing this new feature.
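
For reviewers who want the idea at a glance, here is a minimal sketch of the exclusion step. It is not the code in this PR. The helper name extractText and the two small interfaces are hypothetical, introduced only so the example runs without a browser, and it assumes the excluded elements are given as a CSS selector string. Real DOM Document and Element objects should satisfy these interfaces structurally.

```typescript
// Hypothetical minimal types for illustration; real DOM elements provide
// the same members (remove, querySelectorAll, textContent).
interface RemovableNode {
  remove(): void;
}

interface ContentNode {
  querySelectorAll(selector: string): ArrayLike<RemovableNode>;
  readonly textContent: string | null;
}

interface QueryableRoot {
  querySelector(selector: string): ContentNode | null;
}

// Sketch: find the container matched by `selector`, drop every descendant
// matched by `excludeSelector`, then return the remaining text.
function extractText(
  root: QueryableRoot,
  selector: string,
  excludeSelector?: string,
): string {
  const container = root.querySelector(selector);
  if (!container) {
    return "";
  }
  if (excludeSelector) {
    // Copy the matches into an array before removing anything, so the
    // loop does not depend on how the returned list reacts to removals.
    for (const node of Array.from(container.querySelectorAll(excludeSelector))) {
      node.remove();
    }
  }
  return container.textContent ?? "";
}
```

In a browser context, a call such as extractText(document, "body", "header, footer, script") would return the body text without those elements. Note that this mutates the page, which is usually acceptable when the page is only being scraped.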

Motivation

In many web scraping scenarios, it's crucial to focus only on relevant data while excluding unnecessary elements like headers, footers, and scripts. This feature addresses that need by allowing users to specify elements to exclude, thus streamlining the data extraction process for cleaner and more efficient results.

I believe this feature will be a valuable addition to the GPT Crawler project, offering users more control over the data they are scraping. I look forward to your feedback and hope to contribute further to the development of this project.

Best regards, Peter Goedhart

BuilderIO / gpt-crawler

add exclude to the script (my first pull request) #101
