chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0
14.59k stars 1.22k forks source link

[Bug]: 0.5.4 breaks static type checks for Collection.query with include #2578

Open pmeier opened 1 month ago

pmeier commented 1 month ago

What happened?

TL;DR Type checking Collection.query with include set breaks with chromadb>=0.5.4 or more specifically after #2416 (cc @jeffchuber).

I took the example from the README, stripped all comments and added an include to the Collection.query call:

import chromadb

client = chromadb.Client()

collection = client.create_collection("all-my-documents")

collection.add(
    documents=[
        "This is document1",
        "This is document2",
    ],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc2"],
)

results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    include=["distances", "metadatas", "documents"],
)

This type checks fine for chromadb<=0.5.3, but results in errors (see below) for chromadb>=0.5.4.

Versions

Chroma 0.5.3 / 0.5.4, Python 3.9.19, mypy 1.11.0

Relevant log output

chromadb<=0.5.3

Success: no issues found in 1 source file

chromadb>=0.5.4

main.py:20: error: List item 0 has incompatible type "str"; expected "IncludeEnum"  [list-item]
        include=["distances", "metadatas", "documents"],
                 ^~~~~~~~~~~
main.py:20: error: List item 1 has incompatible type "str"; expected "IncludeEnum"  [list-item]
        include=["distances", "metadatas", "documents"],
                              ^~~~~~~~~~~
main.py:20: error: List item 2 has incompatible type "str"; expected "IncludeEnum"  [list-item]
        include=["distances", "metadatas", "documents"],
                                           ^~~~~~~~~~~
Found 3 errors in 1 file (checked 1 source file)
HammadB commented 1 month ago

Thats no good! Sorry. @tazarov can you take a look?

tazarov commented 1 month ago

@pmeier, You are correct that mypy complains about the newly added types

https://github.com/chroma-core/chroma/blob/4f68c60ec58a1355bc6c7797738e07bba85e5159/chromadb/api/types.py#L129-L140

This also affects get().

That said we do validate the the values of enum (and we even treat them as str. Strings are ok approach and we can make this still work with existing code by making Include a union type like so - Include = List[Union[IncludeEnum, str]]. Though I feel that in the future Enums might be a better approach.