Feature request: Vector Storage does not allow specifying the document ID indexing logic

Use case

Today, the vector storage connector passes the document URL, or the chunk identifier if the document is a chunk, to OpenSearch as the document identifier when indexing. This is a problem for documents that change often: when a modified document is indexed again, the chunks from its previous version are not removed, which can lead to duplicated chunks in the OpenSearch storage.

Solution/User Experience

Provide a way for end users to define how the vector storage connector indexes documents (e.g., append-only, or removing a document's previous chunks before inserting the new ones).
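To make the difference between the two behaviors concrete, here is a minimal TypeScript sketch against an in-memory stand-in for an OpenSearch index. Everything in it is hypothetical: `InMemoryIndex`, `indexDocument`, the `IndexingStrategy` values `"append-only"` and `"replace"`, and the example URL are illustrative names, not part of the connector or the OpenSearch client. It also assumes that editing a document produces chunks with new identifiers, which is what leaves the old chunks behind.

```typescript
// Hypothetical model of an indexed chunk.
interface Chunk {
  id: string;        // chunk identifier used as the OpenSearch document id
  sourceUrl: string; // URL of the document the chunk was produced from
  text: string;
}

// Hypothetical strategy names mirroring the options suggested in this request.
type IndexingStrategy = "append-only" | "replace";

// In-memory stand-in for an OpenSearch index, keyed by document id.
class InMemoryIndex {
  private docs = new Map<string, Chunk>();

  // Removes every chunk previously produced from `sourceUrl`.
  // Deleting entries from a Map while iterating it with forEach is safe.
  deleteBySource(sourceUrl: string): void {
    this.docs.forEach((chunk, id) => {
      if (chunk.sourceUrl === sourceUrl) {
        this.docs.delete(id);
      }
    });
  }

  // Inserts or overwrites a chunk under its id.
  put(chunk: Chunk): void {
    this.docs.set(chunk.id, chunk);
  }

  // Counts the chunks currently stored for one source document.
  countBySource(sourceUrl: string): number {
    let n = 0;
    this.docs.forEach((chunk) => {
      if (chunk.sourceUrl === sourceUrl) n++;
    });
    return n;
  }
}

// Indexes all chunks of one document version according to `strategy`.
function indexDocument(
  index: InMemoryIndex,
  sourceUrl: string,
  chunks: Chunk[],
  strategy: IndexingStrategy,
): void {
  if (strategy === "replace") {
    // Drop chunks from the previous version before inserting the new ones.
    index.deleteBySource(sourceUrl);
  }
  for (const chunk of chunks) {
    index.put(chunk);
  }
}

const url = "s3://example-bucket/report.pdf"; // hypothetical document URL

// Version 1 of the document, then an edited version 2 with new chunk ids.
const v1: Chunk[] = [
  { id: "chunk-a", sourceUrl: url, text: "intro" },
  { id: "chunk-b", sourceUrl: url, text: "body" },
];
const v2: Chunk[] = [
  { id: "chunk-c", sourceUrl: url, text: "intro (edited)" },
  { id: "chunk-d", sourceUrl: url, text: "body (edited)" },
  { id: "chunk-e", sourceUrl: url, text: "appendix" },
];

const appendOnly = new InMemoryIndex();
indexDocument(appendOnly, url, v1, "append-only");
indexDocument(appendOnly, url, v2, "append-only");

const replace = new InMemoryIndex();
indexDocument(replace, url, v1, "replace");
indexDocument(replace, url, v2, "replace");

console.log(appendOnly.countBySource(url)); // 5: stale v1 chunks remain
console.log(replace.countBySource(url));    // 3: only v2 chunks remain
```

Against a real cluster, a "replace" strategy could plausibly be implemented as a delete-by-query on a field holding the source URL, followed by indexing the new chunks; whether the connector stores such a field is an assumption here.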

Alternative solutions

No response

awslabs / project-lakechain