azchohfi opened 3 weeks ago
@SteveSandersonMS, opinions on this?
@luisquintanilla, @SteveSandersonMS, do we want to do anything with this one, or just say it's up to consumers?
Personally I think we'd want to support streaming if it were inherent to the underlying generator (as it is for chat). But since it isn't inherent, it feels more like a pattern that consumers would apply themselves if they want it. Then it's up to the consumer to decide things like whether to parallelize the chunks.
So I'd vote for not layering on this concept ourselves when it's not inherent to the concept of embedding generation.
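The consumer-side pattern described above could look something like the following sketch, assuming the shipped `IEmbeddingGenerator<TInput, TEmbedding>.GenerateAsync` API; the helper name, chunk size, and use of `Task.WhenAll` are illustrative choices, not anything the library prescribes:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

static class ConsumerChunkingExample
{
    // Hypothetical consumer helper: split inputs into chunks, issue one
    // GenerateAsync call per chunk, and (optionally) run them concurrently.
    public static async Task<List<Embedding<float>>> GenerateChunkedAsync(
        IEmbeddingGenerator<string, Embedding<float>> generator,
        IReadOnlyList<string> inputs,
        int chunkSize = 16,
        CancellationToken cancellationToken = default)
    {
        // Task.WhenAll is where the consumer decides to parallelize;
        // a sequential foreach would be the non-parallel variant.
        var tasks = inputs
            .Chunk(chunkSize)
            .Select(chunk => generator.GenerateAsync(chunk, cancellationToken: cancellationToken))
            .ToList();

        var results = await Task.WhenAll(tasks);
        return results.SelectMany(r => r).ToList();
    }
}
```

Because the whole pattern sits on top of the existing `GenerateAsync`, each consumer can tune chunk size and concurrency to their provider's rate limits without the abstraction taking a position on it.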
Background and motivation
The IEmbeddingGenerator interface doesn't support streaming. Streaming would mostly be useful together with batching (for remote/cloud implementations) or with local embedding models, which run much more slowly on the CPU, for example.
API Proposal
API Usage
Alternative Designs
Such a method is likely to look the same across different implementations, so an extension method might suffice.
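A minimal sketch of what that extension method could look like, built purely on the existing `GenerateAsync` (the method name `GenerateStreamingAsync`, the `chunkSize` parameter, and the chunking strategy are assumptions for illustration, not a proposed final API):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using Microsoft.Extensions.AI;

public static class EmbeddingGeneratorStreamingExtensions
{
    // Hypothetical streaming extension: yields embeddings chunk by chunk,
    // so callers can start consuming results before the whole batch finishes.
    public static async IAsyncEnumerable<TEmbedding> GenerateStreamingAsync<TInput, TEmbedding>(
        this IEmbeddingGenerator<TInput, TEmbedding> generator,
        IEnumerable<TInput> values,
        int chunkSize = 16,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
        where TEmbedding : Embedding
    {
        // Each chunk becomes one GenerateAsync call; a smarter implementation
        // could pipeline the next chunk's request while yielding the current one.
        foreach (var chunk in values.Chunk(chunkSize))
        {
            var embeddings = await generator.GenerateAsync(chunk, cancellationToken: cancellationToken);
            foreach (var embedding in embeddings)
            {
                yield return embedding;
            }
        }
    }
}
```

Since this sketch only calls the existing interface, it works against any `IEmbeddingGenerator` implementation without changes, which is the main appeal of the extension-method design over adding a new interface member.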
Risks
It does make the implementation slightly more complex, and existing implementations might simply forward to the GenerateAsync method without leveraging chunking or a similar approach.