bhavnicksm / chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
https://pypi.org/project/chonkie/
MIT License
1.55k stars 57 forks

fix: tokenizer mismatch for `SemanticChunker` + Add BaseEmbeddings #24

Closed bhavnicksm closed 1 week ago

bhavnicksm commented 1 week ago

This pull request makes significant updates to the chonkie package, focusing on removing the dependency on tokenizers and improving the chunking and embeddings functionality. The most important changes are the removal of the tokenizer from the chunkers, the addition of a new base embeddings class, and updates to the documentation and tests to reflect these changes.

Removal of Tokenizer Dependency:

Enhancements to Embeddings:

Documentation Updates:

Configuration Changes:

These changes streamline the chunking process by removing unnecessary dependencies and introduce a new abstraction for embeddings, making the codebase more modular and easier to maintain.
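
To illustrate the kind of abstraction described above, here is a minimal sketch of a base embeddings class that also owns token counting, so a chunker never needs a separate tokenizer that could disagree with the embedding model. Every name in it (`BaseEmbeddingsSketch`, `ToyEmbeddings`, `chunk_by_token_budget`, `count_tokens`, `embed_batch`) is hypothetical and chosen for illustration; it is not the code from this PR and should not be read as chonkie's actual API. The chunker shown is a simple greedy token-budget grouper, not the `SemanticChunker`.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseEmbeddingsSketch(ABC):
    """Hypothetical interface, NOT chonkie's real BaseEmbeddings."""

    @abstractmethod
    def embed(self, text: str) -> List[float]:
        """Return one embedding vector for a single text."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Count tokens the same way the embedding model would."""

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        # Default batch behaviour: embed each text in turn.
        return [self.embed(t) for t in texts]


class ToyEmbeddings(BaseEmbeddingsSketch):
    """Toy backend (assumption): whitespace tokens, 2-dimensional vectors."""

    def embed(self, text: str) -> List[float]:
        tokens = text.split()
        # Vector = [number of tokens, total characters in tokens].
        return [float(len(tokens)), float(sum(len(t) for t in tokens))]

    def count_tokens(self, text: str) -> int:
        return len(text.split())


def chunk_by_token_budget(
    sentences: List[str],
    embeddings: BaseEmbeddingsSketch,
    max_tokens: int,
) -> List[str]:
    """Greedily group sentences, using the embeddings object for token counts."""
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for sentence in sentences:
        n = embeddings.count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    emb = ToyEmbeddings()
    print(chunk_by_token_budget(["a b c", "d e", "f g h i"], emb, max_tokens=5))
```

Because the chunker only talks to the embeddings object, swapping in a different backend changes both the vectors and the token counts together, which is one way a tokenizer mismatch can be avoided.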