dgarnitz / vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
https://www.getvectorflow.com/
Apache License 2.0
670 stars 47 forks

Setting up redis integration #57

Closed EteimZ closed 1 year ago

EteimZ commented 1 year ago

What

This PR seeks to integrate Redis Search into vectorflow.

Why

vectorflow does not yet have a Redis integration.

Usage

First, create a vector index in Redis that follows the vectorflow schema standard as defined here.

import redis
from redis.commands.search.field import NumericField, VectorField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

INDEX_NAME = "vectorflow_idx" # Vector Index Name
DOC_PREFIX = "vec:"

r = redis.from_url(url = 'redis://your-redis-host:your-redis-port', password="your-redis-password", decode_responses=True)

def create_index(vector_dimensions: int):
    try:
        # check to see if index exists
        r.ft(INDEX_NAME).info()
        print("Index already exists!")
    except redis.exceptions.ResponseError:
        # info() raises ResponseError when the index does not exist yet
        # schema
        schema = (
            TextField("source_data"),
            VectorField("embeddings",              # Vector Field Name
                "HNSW", {                          # Vector Index Type: FLAT or HNSW
                    "TYPE": "FLOAT32",             # FLOAT32 or FLOAT64
                    "DIM": vector_dimensions,      # Number of Vector Dimensions
                    "DISTANCE_METRIC": "COSINE",   # Vector Search Distance Metric
                }
            ),
        )

        # index Definition
        definition = IndexDefinition(prefix=[DOC_PREFIX], index_type=IndexType.HASH)

        # create Index
        r.ft(INDEX_NAME).create_index(fields=schema, definition=definition)

        print("Index successfully created")

# define vector dimensions
VECTOR_DIMENSIONS = 1536

# create the index
create_index(vector_dimensions=VECTOR_DIMENSIONS)

Sample curl request:

curl -X POST -H 'Content-Type: multipart/form-data' -H "Authorization: my-internal-api-key" -H "X-EmbeddingAPI-Key: your-embedding-key" -H "X-VectorDB-Key: your-redis-password" -F 'EmbeddingsMetadata={"embeddings_type": "OPEN_AI", "chunk_size": 256, "chunk_overlap": 128}' -F 'SourceData=@./src/api/tests/fixtures/test_text.txt' -F 'VectorDBMetadata={"vector_db_type": "REDIS", "index_name": "vectorflow_idx", "environment": "redis://your-redis-host:your-redis-port", "collection": "vect" }' http://localhost:8000/embed
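
For reference, the curl request above could be sketched in Python roughly as follows. This is a minimal sketch, assuming the third-party `requests` library is installed. `build_embed_request` and `send_embed_request` are hypothetical helper names, not part of vectorflow; the header names, form field names and placeholder values are copied from the curl example.

```python
import json


def build_embed_request(redis_url: str, index_name: str, collection: str):
    """Build headers and form fields mirroring the curl example (hypothetical helper)."""
    headers = {
        "Authorization": "my-internal-api-key",
        "X-EmbeddingAPI-Key": "your-embedding-key",
        "X-VectorDB-Key": "your-redis-password",
        # Content-Type is left out on purpose: requests generates the
        # multipart/form-data header (with its boundary) when files= is used.
    }
    fields = {
        "EmbeddingsMetadata": json.dumps(
            {"embeddings_type": "OPEN_AI", "chunk_size": 256, "chunk_overlap": 128}
        ),
        "VectorDBMetadata": json.dumps(
            {
                "vector_db_type": "REDIS",
                "index_name": index_name,
                "environment": redis_url,
                "collection": collection,
            }
        ),
    }
    return headers, fields


def send_embed_request(path: str, url: str = "http://localhost:8000/embed"):
    """POST the file at `path` to the /embed endpoint (assumes a local vectorflow API)."""
    import requests  # third-party; imported lazily so the builder works without it

    headers, fields = build_embed_request(
        "redis://your-redis-host:your-redis-port", "vectorflow_idx", "vect"
    )
    with open(path, "rb") as source:
        return requests.post(
            url, headers=headers, data=fields, files={"SourceData": source}
        )
```

Keeping the payload construction separate from the network call makes it possible to inspect the metadata JSON before anything is sent.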

Verification

Request to vectorflow:

confirm_redis

Confirm that keys have been uploaded:

confirm_redis_2
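
The same check can also be scripted. The following is a minimal sketch, assuming the uploaded keys share a known prefix and that `client` is a redis-py client such as `r` from the index script. Which prefix vectorflow actually writes is an assumption here (`vec:` is the `DOC_PREFIX` from the index definition), and `count_keys` is a hypothetical helper.

```python
def count_keys(client, prefix: str) -> int:
    """Count keys that start with `prefix`.

    Uses SCAN via scan_iter, which walks the keyspace incrementally
    instead of returning everything in one call like KEYS.
    """
    return sum(1 for _ in client.scan_iter(match=f"{prefix}*"))
```

For example, `count_keys(r, "vec:")` should return a non-zero number after a successful upload, provided the keys really use that prefix.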

dgarnitz commented 1 year ago

Looks great so far!

EteimZ commented 1 year ago

Great work! Please rebase and make the requested changes, then it should be ready to go

Okay, I will rebase to fix the conflict.