Feat: Multiple Match Pattern Config; Pattern Avoid; Grab Content with innerHTML Compatibility
Branch Information
Branch: patch/match-pattern
Description of Changes
This pull request introduces several enhancements to the GPT-crawler project:
Customizable Pattern Matching: Allows users to define multiple patterns for matching, providing more flexibility in what content is crawled.
Expanded Match Options: Introduces additional match options to improve compatibility with various content types.
innerHTML Method for Content Compatibility: Implements an innerHTML method as a fallback when innerText does not contain any content. This ensures more robust content scraping, especially in cases where innerText might be empty.
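The innerHTML fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/core.ts: the `TextSource` type and the `extractContent` name are hypothetical, and trimming whitespace before deciding that innerText is "empty" is an assumption.

```typescript
// Hypothetical structural type: anything exposing innerText / innerHTML,
// such as an HTMLElement inside a page.evaluate() callback.
interface TextSource {
  innerText?: string | null;
  innerHTML?: string | null;
}

// Prefer innerText; fall back to innerHTML when innerText has no content.
// Assumption: whitespace-only innerText counts as empty.
function extractContent(el: TextSource | null): string {
  if (!el) return "";
  const text = (el.innerText ?? "").trim();
  if (text.length > 0) return text;
  return (el.innerHTML ?? "").trim();
}
```

Note that the fallback returns raw markup, so downstream consumers may still need to strip tags; whether the PR does so is not stated here.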
Testing Done
Comprehensive compatibility checks have been conducted.
Confirmed that npm run start runs without breaking existing functionality.
Screenshots or Code Snippets
Code Changes
The config.ts file has been updated with new matching patterns and configurations.
Added minimatch package to handle pattern matching, reflected in package.json and package-lock.json.
Significant updates in src/config.ts and src/core.ts to implement the new features.
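To illustrate how multiple match patterns and avoid patterns might interact, here is a rough, dependency-free sketch. The `MatchConfig` shape, the `match` and `exclude` field names, and the `shouldCrawl` helper are assumptions made for illustration. The PR itself relies on the minimatch package; `globToRegExp` below is only a simplified stand-in (it handles `*` and `**` and nothing else) so the example runs on its own.

```typescript
// Hypothetical config shape: one or many patterns to follow,
// plus optional patterns to avoid.
interface MatchConfig {
  match: string | string[];
  exclude?: string | string[];
}

// Simplified stand-in for minimatch: "**" matches anything,
// "*" matches anything except "/", all other characters are literal.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

function toArray(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// A URL is crawled when it matches at least one "match" pattern
// and no "exclude" pattern. Exclusions are checked first.
function shouldCrawl(url: string, config: MatchConfig): boolean {
  const matches = (pattern: string) => globToRegExp(pattern).test(url);
  if (toArray(config.exclude).some(matches)) return false;
  return toArray(config.match).some(matches);
}

const exampleConfig: MatchConfig = {
  match: ["https://example.com/docs/**", "https://example.com/blog/*"],
  exclude: "https://example.com/docs/private/**",
};

console.log(shouldCrawl("https://example.com/docs/intro", exampleConfig)); // true
console.log(shouldCrawl("https://example.com/docs/private/keys", exampleConfig)); // false
```

Checking exclusions before inclusions keeps the rule easy to reason about: an avoided URL is never crawled, however many match patterns it satisfies.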
Dependencies
Addition of the minimatch package for enhanced pattern matching capabilities.
Checklist
[x] Updated README.md to reflect new changes and configurations.
Conclusion
This PR aims to make the GPT-crawler more versatile and robust, catering to a wider range of use cases. The introduction of customizable pattern matching, expanded match options, and innerHTML compatibility marks a significant improvement in the project's functionality. Your review and feedback on these changes would be greatly appreciated.