LangStream / langstream

LangStream. Event-Driven Developer Platform for Building and Running LLM AI Apps. Powered by Kubernetes and Kafka.
https://langstream.ai
Apache License 2.0

[agents] Add support for AstraDB Collections (Astra Vector DB, using Stargate) #731

Closed: eolivelli closed this 10 months ago

eolivelli commented 10 months ago

Summary:

This patch adds a new vector-database type: "astra-vector-db".

If you use "service=astra", you use the classic APIs with CQL; if you use "service=astra-vector-db", you use the JSON API, which is based on Stargate (https://github.com/stargate).

The sample application and the integration tests contain examples of how to use it. We will follow up in the documentation repository with a detailed reference for all the commands.

Quick overview

configuration.yaml:

configuration:
  resources:
  - type: "datasource"
    name: "AstraDatasource"
    configuration:
      service: "astra-vector-db"
      token: "${secrets.astra-vector-db.token}"
      endpoint: "${secrets.astra-vector-db.endpoint}"

Similarity search:

  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "AstraDatasource"
      query: |
          {
              "collection-name": "documents",
              "limit": 20,
              "vector": ?
          }
      fields:
        - "value.question_embeddings"
      output-field: "value.related_documents"
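
To make the role of the "?" placeholder concrete, here is a minimal Python sketch of how the entries under "fields" could be bound, in order, to the placeholders in the JSON query. This is an illustration only: the helper name bind_placeholders, the sample record, and the naive string substitution are assumptions made for this example, not the agent's actual implementation.

```python
import json

# Same query template as in the "query-vector-db" configuration above.
QUERY = """
{
    "collection-name": "documents",
    "limit": 20,
    "vector": ?
}
"""


def resolve(record, path):
    """Follow a dotted path such as 'value.question_embeddings' into a nested dict."""
    node = record
    for key in path.split("."):
        node = node[key]
    return node


def bind_placeholders(query_template, record, fields):
    """Hypothetical helper: replace each '?' (in order) with the JSON
    encoding of the corresponding field value, then parse the result.

    Simplification: splitting on '?' would misbehave if a string literal
    inside the template contained a question mark.
    """
    parts = query_template.split("?")
    if len(parts) - 1 != len(fields):
        raise ValueError("number of '?' placeholders must match number of fields")
    pieces = [parts[0]]
    for path, tail in zip(fields, parts[1:]):
        pieces.append(json.dumps(resolve(record, path)))
        pieces.append(tail)
    return json.loads("".join(pieces))


# Made-up record with a tiny embedding, for illustration only.
record = {"value": {"question_embeddings": [0.1, 0.2, 0.3]}}
bound = bind_placeholders(QUERY, record, ["value.question_embeddings"])
print(bound)
```

In this sketch the bound query carries the embedding as its "vector" entry, and the search results would then be stored under the configured output-field, "value.related_documents".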

Vector DB Sink:

  - name: "Write to Astra"
    type: "vector-db-sink"
    input: "chunks-topic"
    configuration:
      datasource: "AstraDatasource"
      collection-name: "documents"
      fields:
        - name: "id"
          expression: "fn:concat(value.filename, '-', value.chunk_id)"
        - name: "vector"
          expression: "value.embeddings_vector"
        - name: "text"
          expression: "value.text"
        - name: "filename"
          expression: "value.filename"
        - name: "chunk_id"
          expression: "value.chunk_id"
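
As a rough illustration of what the field expressions above compute, the following Python sketch builds the document that the sink would write for one input record. The record contents and the build_document helper are hypothetical, and the sketch assumes that fn:concat converts its arguments to strings before joining them.

```python
def build_document(record):
    """Hypothetical helper mirroring the 'fields' list of the sink configuration."""
    value = record["value"]
    return {
        # fn:concat(value.filename, '-', value.chunk_id)
        # (assumes non-string arguments are converted to strings)
        "id": f"{value['filename']}-{value['chunk_id']}",
        # value.embeddings_vector
        "vector": value["embeddings_vector"],
        # value.text
        "text": value["text"],
        # value.filename
        "filename": value["filename"],
        # value.chunk_id
        "chunk_id": value["chunk_id"],
    }


# Made-up record, shaped like a chunk read from "chunks-topic".
record = {
    "value": {
        "filename": "guide.pdf",
        "chunk_id": 3,
        "text": "LangStream is an event-driven platform.",
        "embeddings_vector": [0.12, -0.05, 0.33],
    }
}

document = build_document(record)
print(document["id"])
```

Deriving "id" from the filename and chunk id gives each chunk a stable identifier, so reprocessing the same file would produce the same ids rather than new ones.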