BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License

Feat: Multiple Match Pattern Config; Pattern Avoid; Grab Content with innerHTML Compatible #97

Open FTAndy opened 7 months ago

FTAndy commented 7 months ago

Branch Information

Branch: patch/match-pattern

Description of Changes

This pull request introduces several enhancements to the GPT-crawler project:

  1. Customizable Pattern Matching: The configuration accepts multiple match patterns instead of a single one, giving users finer control over which pages are crawled.

  2. Expanded Match Options: Adds further match options, including the avoid patterns named in the title, to improve compatibility with various content types.

  3. innerHTML Method for Content Compatibility: Falls back to innerHTML when an element's innerText is empty, so that pages whose text is not exposed through innerText still yield content.
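
As a rough illustration of the three items above, here is a minimal TypeScript sketch. It is not the code from this PR: the names `globToRegExp`, `shouldCrawl`, `extractContent`, and `TextSource`, the `avoid` parameter, and the simplified glob rules (`**` matches anything, `*` matches within one path segment) are all assumptions made for the example, and the project's real glob handling may differ.

```typescript
// Hypothetical helper: converts a simplified glob into an anchored RegExp.
// Assumption: "**" matches any characters, "*" matches any characters except "/".
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Hypothetical check: a URL is crawled when it matches at least one
// match pattern and none of the avoid patterns.
function shouldCrawl(
  url: string,
  match: string | string[],
  avoid: string[] = [],
): boolean {
  const patterns = Array.isArray(match) ? match : [match];
  const included = patterns.some((p) => globToRegExp(p).test(url));
  const excluded = avoid.some((p) => globToRegExp(p).test(url));
  return included && !excluded;
}

// Structural stand-in for a DOM element so the sketch also runs outside a browser.
interface TextSource {
  innerText?: string;
  innerHTML: string;
}

// Prefer innerText; fall back to innerHTML when innerText is missing or blank.
function extractContent(el: TextSource): string {
  const text = el.innerText?.trim() ?? "";
  return text.length > 0 ? text : el.innerHTML;
}
```

In this sketch a single string is still accepted for `match`, so an existing single-pattern configuration would keep working, and avoid patterns take precedence over match patterns when both apply to the same URL.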

Testing Done

Screenshots or Code Snippets

Changes Preview

Code Changes

Dependencies

Checklist

Conclusion

This PR aims to make gpt-crawler more versatile and robust across a wider range of use cases by adding customizable pattern matching, expanded match options, and an innerHTML fallback. Review and feedback on these changes would be greatly appreciated.