**Open** · farzad528 opened 11 months ago
cc: @mattmsft @BevLoh
@xiangyan99 I'm recalling we had some mechanisms to do this using the search indexing buffered sender; not sure if we're missing a sample on how to do this or if there's missing functionality? I could be remembering wrong.
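For reference, a minimal sketch of what that mechanism looks like in the Python SDK today (the endpoint, key, index name, and documents below are placeholders):

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchIndexingBufferedSender

docs = [{"id": "1", "content": "example document"}]  # placeholder documents

def on_error(action):
    # Invoked when an individual indexing action ultimately fails.
    print(f"Failed to index action: {action}")

# The sender buffers actions and flushes them in batches automatically.
with SearchIndexingBufferedSender(
    endpoint="https://<service>.search.windows.net",
    index_name="<index>",
    credential=AzureKeyCredential("<api-key>"),
    auto_flush_interval=60,          # flush buffered actions at least every 60 s
    initial_batch_action_count=512,  # initial number of actions per request
    on_error=on_error,
) as sender:
    sender.upload_documents(documents=docs)
# Exiting the context manager flushes any remaining buffered actions.
```

This gives error callbacks and tunable batch sizes, but the caller still has to pick a batch size that respects the service limits described below.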
Per our guidelines, we don't do client-side validation. Let's work with the architect to see how to address this.
**Is your feature request related to a problem? Please describe.**
When using `SearchIndexingBufferedSender` to upload a very large number of documents (for example, 1 billion vectors with text), the current implementation doesn't automatically handle the 64 MB payload size limit or the 32,000-document count limit per batch. This forces manual intervention to make sure documents are indexed successfully, which is particularly frustrating for long indexing jobs that may run overnight.

**Describe the solution you'd like**
I propose the following enhancements to `SearchIndexingBufferedSender` to make indexing large datasets more seamless and robust:

- Automatically split batches so each request stays under both the 64 MB payload limit and the 32,000-document count limit.
- Built-in progress reporting, so long-running jobs can be monitored.
- Built-in error handling, so individual document failures don't derail the whole job.

**Describe alternatives you've considered**
An alternative approach is to manually chunk the data and implement custom logic for progress reporting and error handling, as sketched below. However, this adds complexity and requires additional development effort that could be avoided if the SDK provided these capabilities out of the box.
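For concreteness, a rough sketch of that manual workaround, batching by both document count and approximate serialized payload size (the limit constants come from the numbers above; real requests add envelope overhead, so a conservative margin is wise):

```python
import json

MAX_BATCH_DOCS = 32_000          # service limit: documents per batch
MAX_BATCH_BYTES = 64 * 1024**2   # service limit: 64 MB payload per batch

def chunk_documents(documents):
    """Yield lists of documents that respect both batch limits.

    Payload size is approximated by JSON-serializing each document;
    the real request adds some envelope overhead on top of this.
    """
    batch, batch_bytes = [], 0
    for doc in documents:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= MAX_BATCH_DOCS
                      or batch_bytes + doc_bytes > MAX_BATCH_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch
```

Each yielded chunk then has to be uploaded, tracked, and retried by hand; that is precisely the boilerplate this request asks the SDK to absorb.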
**Additional context**
The goal is to enable users to confidently run a single function to index a massive number of documents without needing to worry about the internal limitations of the service. Enhancing `SearchIndexingBufferedSender` with the features above would significantly improve the developer experience and the reliability of the indexing process for large datasets.
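Until then, a user-land approximation of that single function might look like this (the `index_all` name is hypothetical, and it builds on the `chunk_documents` helper sketched above):

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

def index_all(endpoint, index_name, api_key, documents):
    """Hypothetical one-call helper: upload documents in limit-respecting
    chunks (via chunk_documents above) and report progress as it goes."""
    client = SearchClient(endpoint, index_name, AzureKeyCredential(api_key))
    uploaded = 0
    for batch in chunk_documents(documents):
        results = client.upload_documents(documents=batch)
        failed = [r.key for r in results if not r.succeeded]
        uploaded += len(batch) - len(failed)
        print(f"{uploaded} documents indexed; {len(failed)} failed in this batch")
```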