bhavnicksm / chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
https://pypi.org/project/chonkie/
MIT License
1.55k stars, 57 forks

Make imports as a part of Chunker __init__ instead of file imports to make Chonkie import faster #12

Closed. bhavnicksm closed this 2 weeks ago

bhavnicksm commented 2 weeks ago

This pull request introduces several updates to improve the flexibility and robustness of the chunking system by supporting multiple tokenizer backends and refining the import mechanisms for external libraries. The most significant changes include adding a dynamic tokenizer loading mechanism, updating the initialization of various chunkers to accept different tokenizer types, and restructuring the import logic for external libraries like spaCy and sentence-transformers.

Tokenizer Support Enhancements:
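
To illustrate what a dynamic tokenizer loading mechanism can look like, here is a minimal sketch. It is not the code from this pull request. The function name `load_tokenizer`, the `backends` parameter, and the choice and order of backends (`tokenizers`, then `tiktoken`) are assumptions made for the example. The idea shown is that a chunker can accept either a ready-made tokenizer object or a string name, and only import a backend when a string has to be resolved.

```python
import importlib
from typing import Any, Sequence, Tuple, Union

# Assumed backends for this sketch: (module name, dotted path to a factory
# that takes a tokenizer name). Both factories exist in their libraries,
# but the selection and ordering here are illustrative only.
DEFAULT_BACKENDS: Tuple[Tuple[str, str], ...] = (
    ("tokenizers", "Tokenizer.from_pretrained"),
    ("tiktoken", "get_encoding"),
)


def load_tokenizer(
    tokenizer: Union[str, Any],
    backends: Sequence[Tuple[str, str]] = DEFAULT_BACKENDS,
) -> Any:
    """Return a tokenizer object (hypothetical helper).

    Non-string inputs are assumed to already be tokenizer objects and are
    returned unchanged. Strings are resolved with the first backend that
    can be imported.
    """
    if not isinstance(tokenizer, str):
        return tokenizer

    missing = []
    for module_name, factory_path in backends:
        try:
            # The backend is imported here, not at module import time.
            module = importlib.import_module(module_name)
        except ImportError:
            missing.append(module_name)
            continue
        # Walk the dotted path, e.g. "Tokenizer.from_pretrained".
        factory = module
        for attr in factory_path.split("."):
            factory = getattr(factory, attr)
        return factory(tokenizer)

    raise ImportError(
        f"Could not load tokenizer {tokenizer!r}; none of these backends "
        f"are installed: {', '.join(missing)}"
    )
```

With this shape, passing an existing tokenizer object costs nothing, and passing a name such as `"gpt2"` only pulls in whichever backend happens to be installed.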

Chunker Initialization Updates:

External Library Import Improvements:
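
As a rough sketch of moving an external import from file level into a chunker's `__init__`, consider the example below. It is an illustration under stated assumptions, not the implementation from this pull request: the class name `LazySentenceChunker` and the `backend_module` parameter are hypothetical, and the module name is made configurable only so the behaviour can be tried without spaCy installed.

```python
import importlib


class LazySentenceChunker:
    """Hypothetical chunker that imports its heavy dependency on construction.

    Because nothing heavy is imported at file level, importing the package
    stays fast and users who never create this chunker never pay the cost.
    """

    def __init__(self, backend_module: str = "spacy"):
        try:
            # Deferred import: runs only when a chunker is instantiated.
            self._backend = importlib.import_module(backend_module)
        except ImportError as exc:
            # Fail with an actionable message instead of at package import.
            raise ImportError(
                f"{backend_module!r} is required for "
                f"{type(self).__name__}; install it to use this chunker."
            ) from exc


# Usage with a standard-library module standing in for the real dependency.
chunker = LazySentenceChunker(backend_module="json")
```

The trade-off is that a missing dependency surfaces when the chunker is created, not when the package is imported, so the error message should name the package to install.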

Minor Adjustments: