Abraxas-365 / langchain-rust

🦜️🔗LangChain for Rust, the easiest way to write LLM-based programs in Rust
MIT License

Ability to add custom ids for documents #177

Open: ismajl-ramadani opened this issue 2 months ago

ismajl-ramadani commented 2 months ago

I want to add custom IDs for documents so that, when I re-index to apply updates, I can reference each document by the ID it already has in the data source.

My specific case is the OpenSearch vector store. Right now, as a workaround, I'm running a local build of this crate in which I have added a function with the following signature:

// ids[i] is the custom ID for docs[i]; a slice is used instead of &Vec<String>.
async fn add_documents_with_ids(
    &self,
    docs: &[Document],
    opt: &VecStoreOptions,
    ids: &[String],
) -> Result<Vec<String>, Box<dyn Error>> {

and then I'm zipping the ids together with the docs and vectors

for (doc, (vector, doc_id)) in zip(docs.iter(), zip(vectors.iter(), ids.iter())) {

and finally setting the id as the _id of each document's index operation

let operation = json!({"index": {
    "_id": doc_id,
}});
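One caveat with the zip-based workaround: Rust's zip stops at the shortest iterator, so if ids has fewer entries than docs, the extra documents are silently skipped. A minimal std-only sketch of a length check before pairing follows. The names zip_with_ids and LengthMismatch are hypothetical and not part of langchain-rust, and the document and vector types are left generic.

```rust
use std::fmt;

/// Hypothetical error for this sketch; not part of langchain-rust.
#[derive(Debug, PartialEq)]
pub struct LengthMismatch {
    pub docs: usize,
    pub vectors: usize,
    pub ids: usize,
}

impl fmt::Display for LengthMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "length mismatch: {} docs, {} vectors, {} ids",
            self.docs, self.vectors, self.ids
        )
    }
}

impl std::error::Error for LengthMismatch {}

/// Pairs each document with its vector and its caller-supplied ID.
/// Returns an error instead of silently truncating when the lengths differ.
pub fn zip_with_ids<'a, D, V>(
    docs: &'a [D],
    vectors: &'a [V],
    ids: &'a [String],
) -> Result<Vec<(&'a D, &'a V, &'a str)>, LengthMismatch> {
    if docs.len() != vectors.len() || docs.len() != ids.len() {
        return Err(LengthMismatch {
            docs: docs.len(),
            vectors: vectors.len(),
            ids: ids.len(),
        });
    }
    Ok(docs
        .iter()
        .zip(vectors)
        .zip(ids)
        .map(|((doc, vector), id)| (doc, vector, id.as_str()))
        .collect())
}
```

Each returned tuple could then feed the loop shown earlier, with the third element used as the _id of the index operation.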

I wanted to ask whether anyone else would be interested in having this feature and, if so, whether the maintainers have any suggestions on how to implement it without breaking the core VectorStore trait.
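One non-breaking option, sketched here under simplifying assumptions, is to add the new method to the trait with a default body, so existing implementors keep compiling and only stores that support custom IDs override it. The trait below is a synchronous stand-in that omits VecStoreOptions; the real trait is async, as the signature above shows. Document, LegacyStore, and IdAwareStore are hypothetical illustrations, not the crate's actual types.

```rust
use std::error::Error;

/// Simplified stand-in for the crate's document type.
pub struct Document {
    pub page_content: String,
}

/// Simplified, synchronous stand-in for the VectorStore trait.
pub trait VectorStore {
    /// Existing method: the store chooses the IDs.
    fn add_documents(&self, docs: &[Document]) -> Result<Vec<String>, Box<dyn Error>>;

    /// New method with a default body, so adding it does not break
    /// existing implementors. The default rejects custom IDs.
    fn add_documents_with_ids(
        &self,
        _docs: &[Document],
        _ids: &[String],
    ) -> Result<Vec<String>, Box<dyn Error>> {
        Err("this store does not support custom document IDs".into())
    }
}

/// Hypothetical store written before the new method existed.
pub struct LegacyStore;

impl VectorStore for LegacyStore {
    fn add_documents(&self, docs: &[Document]) -> Result<Vec<String>, Box<dyn Error>> {
        Ok((0..docs.len()).map(|i| format!("generated-{}", i)).collect())
    }
}

/// Hypothetical store that opts in to custom IDs.
pub struct IdAwareStore;

impl VectorStore for IdAwareStore {
    fn add_documents(&self, docs: &[Document]) -> Result<Vec<String>, Box<dyn Error>> {
        Ok((0..docs.len()).map(|i| format!("generated-{}", i)).collect())
    }

    fn add_documents_with_ids(
        &self,
        docs: &[Document],
        ids: &[String],
    ) -> Result<Vec<String>, Box<dyn Error>> {
        if docs.len() != ids.len() {
            return Err("docs and ids must have the same length".into());
        }
        // A real store would write each document under its ID here.
        Ok(ids.to_vec())
    }
}
```

The trade-off is that a caller only finds out at runtime whether a given store supports custom IDs.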

Also, the Python package has an additional field for add_docs, called ids, which you can see in this file: https://github.com/langchain-ai/langchain/blob/29aa9d67506ac07b92d37d58c684ce3c6dc290cd/libs/community/langchain_community/vectorstores/opensearch_vector_search.py#L587

fgsch commented 2 months ago

I too noticed this while working on something else.

Personally, I'd like to see the ids added as an Option<Vec<String>> or similar to add_documents, but this will break backward compatibility.
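As a rough illustration of the Option-based variant, an implementation of add_documents could resolve the IDs up front: use the caller's IDs when given, otherwise fall back to generated ones. The helper resolve_ids and its generate callback are hypothetical names for this sketch, not part of the crate.

```rust
/// Resolves the IDs to store documents under.
/// - Some(ids): use the caller's IDs, which must match the document count.
/// - None: generate one ID per document, as before the change.
pub fn resolve_ids(
    doc_count: usize,
    ids: Option<Vec<String>>,
    mut generate: impl FnMut() -> String,
) -> Result<Vec<String>, String> {
    match ids {
        Some(ids) if ids.len() == doc_count => Ok(ids),
        Some(ids) => Err(format!(
            "expected {} ids, got {}",
            doc_count,
            ids.len()
        )),
        None => Ok((0..doc_count).map(|_| generate()).collect()),
    }
}
```

Existing callers would need to pass None to keep the current behaviour, and that added argument is the backward-compatibility break mentioned above.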

@Abraxas-365 @prabirshrestha, any thoughts on this?

prabirshrestha commented 1 month ago

I'm good with the breaking change. Feel free to send a PR.