rodion-m opened this issue 2 hours ago
Hey @rodion-m!
Thanks for opening the issue 😊
Could you please elaborate on the benefits of adding async and the use case for it? Personally, I haven't used the async chunking support offered by other libraries before, nor am I aware of devs/companies who use it often. Maybe I just haven't been exposed to those use cases, so having some context would help here...
If you think this could be of wider utility, I'd be happy to accept PRs for it.
Thanks!
Sure, here is my use case:
Proper async support would make it possible to parallelize remote embedding computation.
So, we need speed :)
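To illustrate the speedup concretely, here's a minimal sketch of the pattern (`embed_chunk` is a hypothetical coroutine standing in for a real async API call, with a sleep simulating network latency):

```python
import asyncio

async def embed_chunk(chunk: str) -> list[float]:
    # Dummy stand-in for a remote embeddings API call; a real
    # implementation would await an async HTTP request here.
    await asyncio.sleep(0.1)  # simulate network latency
    return [0.0] * 1536       # placeholder vector

async def embed_all(chunks: list[str]) -> list[list[float]]:
    # All requests are in flight at once, so total wall-clock time
    # approaches the latency of one request instead of the sum of all.
    return await asyncio.gather(*(embed_chunk(c) for c in chunks))

embeddings = asyncio.run(embed_all(["chunk one", "chunk two", "chunk three"]))
```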
@rodion-m,
Correct me if I'm wrong, but essentially this is about faster embedding computation with APIs like OpenAI, Anthropic, etc.: passing a bunch of chunks and getting their embeddings back asynchronously?
By proper async support, you mean only for the embeddings, right? Or do you mean that chunks should also be generated asynchronously?
I plan to add async/batching support for embedding APIs like OpenAI, since we'll need that capability, yes!
I'm not sure about the second one, i.e. async chunk generation, yet.
Sorry for all the questions, just trying to make sure I understand this correctly
Thanks!
I mean we should be able to perform I/O operations (like API calls for embeddings) asynchronously, to speed up embedding generation when we need to embed huge datasets and batching alone is exhausted. Here is an example of LiteLLM's implementation: https://docs.litellm.ai/docs/embedding/async_embedding
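Roughly what I have in mind, sketched on top of LiteLLM's `aembedding` from the docs above (the model name and concurrency cap are just example values):

```python
import asyncio
from litellm import aembedding

async def embed_batches(batches: list[list[str]], max_concurrency: int = 8):
    # Cap the number of in-flight requests to stay under provider
    # rate limits; 8 is just an example value.
    sem = asyncio.Semaphore(max_concurrency)

    async def embed_one(batch: list[str]):
        async with sem:
            # aembedding is LiteLLM's async counterpart to embedding();
            # the model name is an example, not a recommendation.
            return await aembedding(model="text-embedding-3-small", input=batch)

    # Batches that don't fit into a single request run concurrently.
    return await asyncio.gather(*(embed_one(b) for b in batches))
```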