TeiaLabs / redb

[DISCUSSION] Indices #6

Closed joaovbsevero closed 1 year ago

joaovbsevero commented 1 year ago

This issue discusses some ways to allow child classes of Document to declare indices.

  1. One approach, which we have already explored in a different project (MongoWrapper), is to use a class member that explicitly names all columns that should be indexed. For example:

Given an index structure like:

from dataclasses import dataclass
from typing import Any

@dataclass
class IndiceField:
    keys: list[str] | None = None
    name: str | None = None
    unique: bool | None = None
    min: Any = None
    max: Any = None

Users would use it like:

class Embedding(Document):
    __indices__: list[IndiceField] = None

  2. Another approach would follow pydantic's idea of defining a base Field with special keywords

Given a field structure like:

@dataclass
class Field:
    type: Any = None  # the column's Python type, e.g. str
    default: Any = None
    unique: bool | None = None
    min: Any = None
    max: Any = None

Users would use it like:

class Embedding(Document):
    kb_name = Field(type=str, unique=True)

Where the presence of the "unique" keyword makes it an index

  3. The last approach is based on Tortoise ORM, where a special inner class holds the configuration (similar to pydantic's Config class)

Given a config structure like:

@dataclass
class Config:
    indices: list[IndiceField] = None

Users would use it like:

class Embedding(Document):
    class Config:
        indices = [...]

Some notes regarding these approaches:

  1. All approaches support inheritance
  2. The first approach is by far the easiest to implement, but maybe not the most elegant (defining such a field feels odd from the user's point of view)
  3. The second approach gives a flexible, although verbose, way to define the class fields and would require a wrapper around the pydantic Field. It is not the most complex solution, but it requires some extra work on our side and some extra typing on the user's side (for each member, type "name = Field(type....")
  4. The third solution is very elegant but can cause problems with inheritance; we already faced some hard-to-solve challenges when following this approach in MongoWrapper.

cardoso-neto commented 1 year ago

Milvus has indices that need further parametrization. We'll need to either generalize these fields or allow for specialization via inheritance with MilvusIndex/MongoIndex etc.

cardoso-neto commented 1 year ago

Don't we already have a metaclass-based solution for number 3's inheritance problem? What do we gain by going with number 3 over number 2? I'm asking this, because number 2 would solve a few other "parametrization" problems (dimensionality of float vectors, is_primary_key, auto incrementers, date fields, hash fields, etc.).

joaovbsevero commented 1 year ago

> Milvus has indices that need further parametrization. We'll need to either generalize these fields or allow for specialization via inheritance with MilvusIndex/MongoIndex etc.

Specialization would break the "single inheritance" experience from the user's point of view, which was really hard to achieve.

> Don't we already have a metaclass-based solution for number 3's inheritance problem? What do we gain by going with number 3 over number 2? I'm asking this, because number 2 would solve a few other "parametrization" problems (dimensionality of float vectors, is_primary_key, auto incrementers, date fields, hash fields, etc.).

Number 3 provides more flexibility over what an index can be. If the user wants one index spanning multiple columns, fine; if the user wants two indices on the same column, fine. Number 2, by contrast, can rule out some possibilities for the users (such as the two examples just given).

Side note: number 1 is the same as number 3, but less elegant.