bhavnicksm / chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
https://pypi.org/project/chonkie/
MIT License
1.68k stars, 60 forks

Update DOCS.md - fixed embeddings path after recent change #56

Closed pratyushmittal closed 4 days ago

pratyushmittal commented 4 days ago

The default models are no longer sourced from sentence-transformers, so we need to give the complete path.

bhavnicksm commented 4 days ago

Hey Pratyush!

Thanks for opening a PR 😊

Uh, that's true; passing all-minilm-l6-v2 wouldn't work anymore because of the update to SentenceTransformerEmbeddings, but ideally it should, right?

Actually, the issue is in the matching logic: the EmbeddingsRegistry does not raise an error when no match is found, which means AutoEmbeddings.get_embeddings() would return None, which should not happen.
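To make the failure mode concrete, here is a minimal sketch of a lookup that fails loudly instead of returning None. `SketchRegistry` and the free function `get_embeddings` are hypothetical stand-ins written for illustration; they are not Chonkie's actual EmbeddingsRegistry or AutoEmbeddings code, whose internals may differ.

```python
from typing import Callable, Dict, Optional


class SketchRegistry:
    """Hypothetical stand-in for an embeddings registry (not Chonkie's code)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Store a zero-argument factory under an exact model name.
        self._factories[name] = factory

    def match(self, name: str) -> Optional[Callable[[], object]]:
        # Returns None when nothing matches: the silent case described above.
        return self._factories.get(name)


def get_embeddings(registry: SketchRegistry, name: str) -> object:
    factory = registry.match(name)
    if factory is None:
        # Fail loudly rather than handing None back to the caller.
        raise ValueError(f"No embeddings registered for model name {name!r}")
    return factory()
```

With this shape, a caller who passes an unknown name gets an immediate ValueError at lookup time instead of a None that only breaks later.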

Do you wish to take up this PR?

bhavnicksm commented 4 days ago

Another interesting thing to note is that this doesn't show up in the tests, because the .match function looks for all-MiniLM-L6-v2 (with capitals).

So if you pass in just all-MiniLM-L6-v2, it seems to load properly, but the lowercase form does not.
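One way to make the lookup insensitive to capitalization is to normalize both the registered names and the query before comparing. The sketch below assumes a plain name-to-path mapping; `normalize`, `match_model`, and `KNOWN_MODELS` are hypothetical names for illustration and do not reflect how Chonkie's .match function is actually written.

```python
from typing import Dict, Optional

# Illustrative mapping from a short model name to its full path (an assumption).
KNOWN_MODELS: Dict[str, str] = {
    "all-MiniLM-L6-v2": "sentence-transformers/all-MiniLM-L6-v2",
}


def normalize(name: str) -> str:
    # casefold() is a more aggressive lower() intended for caseless comparison.
    return name.strip().casefold()


def match_model(name: str, known: Dict[str, str]) -> Optional[str]:
    # Compare normalized keys so "all-minilm-l6-v2" and "all-MiniLM-L6-v2" agree.
    lookup = {normalize(key): path for key, path in known.items()}
    return lookup.get(normalize(name))
```

A test that passes both the capitalized and the lowercase spelling through `match_model` would catch the mismatch described above.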

bhavnicksm commented 4 days ago

Fixed in #57, closing PR