Refactor BaseChunker, SemanticChunker and SDPMChunker to support BaseEmbeddings

This pull request makes the chunking and embedding code in the chonkie package more flexible. The main changes: BaseChunker now supports token counters as well as tokenizers, SemanticChunker uses the new embedding model interface, and the tests are updated to match.

Enhancements to BaseChunker:

src/chonkie/chunker/base.py: Updated the BaseChunker class to accept a callable tokenizer or token counter, added methods to count tokens and batch count tokens, and adjusted the initialization logic to handle different types of tokenizers.
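
A minimal sketch of the idea, not the code in base.py: the class name SketchChunker, the method names count_tokens and count_tokens_batch, and the rule "anything with an encode method is a tokenizer" are all assumptions made for illustration. It shows how one constructor argument can be either a tokenizer object or a plain callable that returns a token count.

```python
from typing import Any, Callable, List, Union


class SketchChunker:
    """Hypothetical stand-in for a base chunker; names are illustrative only."""

    def __init__(self, tokenizer_or_token_counter: Union[Callable[[str], int], Any]):
        # Check for `encode` first: some tokenizer objects are also callable,
        # so testing callable() first would misclassify them as counters.
        if hasattr(tokenizer_or_token_counter, "encode"):
            tokenizer = tokenizer_or_token_counter
            self._count = lambda text: len(tokenizer.encode(text))
        elif callable(tokenizer_or_token_counter):
            self._count = tokenizer_or_token_counter
        else:
            raise ValueError("Expected a tokenizer with .encode() or a callable counter")

    def count_tokens(self, text: str) -> int:
        """Return the token count for a single text."""
        return self._count(text)

    def count_tokens_batch(self, texts: List[str]) -> List[int]:
        """Return token counts for several texts, in input order."""
        return [self._count(text) for text in texts]


# Usage: a whitespace word counter acts as the token counter.
chunker = SketchChunker(lambda text: len(text.split()))
print(chunker.count_tokens_batch(["one two", "three four five"]))
```

With this shape, a chunker only ever calls the two counting methods, so it does not need to know which kind of object it was given.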

Improvements to SemanticChunker:

src/chonkie/chunker/semantic.py: Modified the SemanticChunker to use the new BaseEmbeddings interface, removed redundant import statements, and updated the initialization to use AutoEmbeddings for loading embedding models.
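
The pattern described here, accepting either a model identifier or a ready-made embeddings object and resolving it at initialization, might look roughly like the sketch below. Everything in it is hypothetical: FakeEmbeddings, the _REGISTRY prefix table, and resolve_embeddings are illustrative names and do not reflect the actual AutoEmbeddings implementation.

```python
from typing import Callable, Dict, List, Union


class FakeEmbeddings:
    """Hypothetical embeddings object standing in for a BaseEmbeddings subclass."""

    def __init__(self, name: str):
        self.name = name

    def embed(self, text: str) -> List[float]:
        # Toy two-dimensional "embedding": text length and number of spaces.
        return [float(len(text)), float(text.count(" "))]


# Hypothetical registry mapping a model-name prefix to a constructor.
_REGISTRY: Dict[str, Callable[[str], FakeEmbeddings]] = {"fake-": FakeEmbeddings}


def resolve_embeddings(model: Union[str, FakeEmbeddings]) -> FakeEmbeddings:
    """Return an embeddings object for a model name or pass an instance through."""
    if isinstance(model, str):
        for prefix, factory in _REGISTRY.items():
            if model.startswith(prefix):
                return factory(model)
        raise ValueError(f"No embeddings backend registered for {model!r}")
    if hasattr(model, "embed"):
        # Already an embeddings-like object: use it unchanged.
        return model
    raise TypeError(f"Unsupported embedding model type: {type(model).__name__}")


# Usage: both call styles end up with an object exposing .embed().
by_name = resolve_embeddings("fake-small")
by_instance = resolve_embeddings(FakeEmbeddings("custom"))
print(by_name.name, by_instance.name)
```

Resolving the model once in the constructor keeps the rest of the chunker independent of how the embeddings were specified.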

Updates to embedding models:

src/chonkie/embeddings/auto.py: Enhanced the AutoEmbeddings class to support different types of embedding models and updated the get_embeddings method to handle various model types.
src/chonkie/embeddings/base.py: Added a method to get the tokenizer or token counter and implemented cosine similarity in the BaseEmbeddings class.
src/chonkie/embeddings/sentence_transformer.py: Implemented the get_tokenizer_or_token_counter method in the SentenceTransformerEmbeddings class.
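
As a rough illustration of the interface these files describe, the sketch below pairs an abstract get_tokenizer_or_token_counter method with a shared cosine similarity. Only the method name get_tokenizer_or_token_counter comes from this description; SketchEmbeddings, CharCountEmbeddings, embed, and similarity are assumed names, and plain Python lists are used in place of whatever array type the real classes use.

```python
import math
from abc import ABC, abstractmethod
from typing import Callable, List


class SketchEmbeddings(ABC):
    """Hypothetical base class; not the actual BaseEmbeddings code."""

    @abstractmethod
    def embed(self, text: str) -> List[float]:
        """Return a vector for the given text."""

    @abstractmethod
    def get_tokenizer_or_token_counter(self) -> Callable[[str], int]:
        """Return something a chunker can use to count tokens."""

    def similarity(self, u: List[float], v: List[float]) -> float:
        """Cosine similarity: dot(u, v) / (|u| * |v|), or 0.0 for a zero vector."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0


class CharCountEmbeddings(SketchEmbeddings):
    """Toy subclass: embeds text as counts of the letters a, b, and c."""

    def embed(self, text: str) -> List[float]:
        return [float(text.count(ch)) for ch in "abc"]

    def get_tokenizer_or_token_counter(self) -> Callable[[str], int]:
        # Whitespace word count stands in for a model-specific tokenizer.
        return lambda text: len(text.split())


# Usage: the chunker can take both the vectors and the token counter
# from the same embeddings object.
emb = CharCountEmbeddings()
count = emb.get_tokenizer_or_token_counter()
print(count("a cab"), emb.similarity(emb.embed("aab"), emb.embed("abb")))
```

Putting the similarity on the base class means each backend only has to supply the embedding and the token counter.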

Test updates:

tests/chunker/test_sdpm_chunker.py and tests/chunker/test_semantic_chunker.py: Updated tests to use SentenceTransformerEmbeddings instead of SentenceTransformer.

bhavnicksm/chonkie #45