chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Install issue]: Getting Null in Query #2914

Open juned-adenwalla opened 3 weeks ago

juned-adenwalla commented 3 weeks ago

What happened?

I am running the code below to query data. I do get results, but other fields such as metadata and embeddings come back as null.

Code to query :

const result = await collection.query({
    queryEmbeddings: trainedData,
    nResults: 1
});

Code to add :

await dataCollection.add({
    ids: documentId,  // this should be a string or an array of strings
    embeddings: embedding.data[0].embedding,  // this should be an array of numbers (the embedding)
    metadata: {
        type: 'Support',
        data: 'refinedData',  // this can be any value, like a string, object, etc.
    },
});

Output :

{
    "data": {
        "ids": [
            [
                "9"
            ]
        ],
        "distances": [
            [
                0.24018067121505737
            ]
        ],
        "metadatas": [
            [
                null
            ]
        ],
        "embeddings": null,
        "documents": [
            [
                null
            ]
        ],
        "uris": null,
        "data": null,
        "included": [
            "metadatas",
            "documents",
            "distances"
        ]
    },
    "status": true
}

Versions

Latest

Relevant log output

No response

tazarov commented 3 weeks ago

hey @juned-adenwalla, thanks for submitting this. I was able to reproduce your issue using the examples you provided above.

Quick reproduction steps:

Start Chroma server:

docker run --rm -p 8000:8000 chromadb/chroma:0.5.12

Test rig:

const {ChromaClient, OpenAIEmbeddingFunction} = require("chromadb");
(async () => {
    const cemb = new OpenAIEmbeddingFunction({
        openai_model: "text-embedding-3-small",
        openai_api_key: "sk-"
    });

    const client = new ChromaClient({
        path: "http://0.0.0.0:8000",
    });
    const col = await client.getOrCreateCollection({name: "a-test-collection", embeddingFunction: cemb});
    const embeddings = await cemb.generate(["The powerhouse of the cell is the mitochondria"])
    await col.add({
        ids: "9",
        embeddings: embeddings[0],
        metadata: {
            type: 'Support',
            data: 'refinedData', 
        },
    });

    const result = await col.query({
        queryEmbeddings: embeddings[0],
        nResults: 1
    });
    console.log(JSON.stringify(result));
    await client.deleteCollection({name: "a-test-collection"});
})();

Results in:

{"ids":[["9"]],"distances":[[2.9883324390277763e-16]],"metadatas":[[null]],"embeddings":null,"documents":[[null]],"uris":null,"data":null,"included":["metadatas","documents","distances"]}
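
Each field in that response holds one inner array per query embedding, so the first hit sits at index [0][0]. As a rough sketch, a small helper can report which of the included fields came back empty for that first hit. The name nullFieldsInFirstHit is hypothetical and not part of the chromadb package, and documents is expected to be null here because none were added:

```javascript
// Hypothetical helper, not part of the chromadb package: lists the included
// fields whose first hit for the first query embedding is null or missing.
function nullFieldsInFirstHit(result) {
    return (result.included || []).filter((field) => {
        const perQuery = result[field];
        return !perQuery || !perQuery[0] || perQuery[0][0] == null;
    });
}

// Response shape copied from the reproduction output.
const result = {
    ids: [["9"]],
    distances: [[2.9883324390277763e-16]],
    metadatas: [[null]],
    embeddings: null,
    documents: [[null]],
    uris: null,
    data: null,
    included: ["metadatas", "documents", "distances"],
};
console.log(nullFieldsInFirstHit(result)); // [ 'metadatas', 'documents' ]
```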

There are two issues here:

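  1. The syntax of your add call is wrong. Pass ids, embeddings, and metadatas as arrays, and use the key metadatas rather than metadata: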
--- error.js    2024-10-10 13:36:39
+++ fixed.js    2024-10-10 13:36:51
@@ -1,8 +1,8 @@
 await col.add({
-        ids: "9", 
-        embeddings: embeddings[0],
-        metadata: {
+        ids: ["9"],
+        embeddings: [embeddings[0]],
+        metadatas: [{
             type: 'Support',
             data: 'refinedData', 
-        },
+        }],
     });

Check the docs here - https://docs.trychroma.com/reference/js-collection#add

  2. The JS client does not sufficiently validate its inputs.

Here's a working version of the example above:

const {ChromaClient, OpenAIEmbeddingFunction} = require("chromadb");
(async () => {
    const cemb = new OpenAIEmbeddingFunction({
        openai_model: "text-embedding-3-small",
        openai_api_key: "sk-"
    });

    const client = new ChromaClient({
        path: "http://0.0.0.0:8000",
    });
    const col = await client.getOrCreateCollection({name: "a-test-collection", embeddingFunction: cemb});
    const embeddings = await cemb.generate(["The powerhouse of the cell is the mitochondria"])
    await col.add({
        ids: ["9"], 
        embeddings: [embeddings[0]],
        metadatas: [{
            type: 'Support',
            data: 'refinedData',
        }],
    });

    const result = await col.query({
        queryEmbeddings: embeddings[0],
        nResults: 1
    });
    console.log(JSON.stringify(result));
})();
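
Until the client validates its inputs, a guard in your own code can catch these shape mistakes before calling add. The following is only a sketch: validateAddPayload is a hypothetical helper, not part of the chromadb package, and it enforces the array form used in the fix above, which may be stricter than what the client itself accepts.

```javascript
// Hypothetical helper, not part of the chromadb package: checks the shape of
// an add() payload and returns a list of problems (empty when it looks fine).
function validateAddPayload(payload) {
    const errors = [];
    const {ids, embeddings, metadatas} = payload;

    // Catch singular keys such as the "metadata" key from the original report.
    for (const key of ["id", "embedding", "metadata", "document"]) {
        if (key in payload) {
            errors.push(`unexpected key "${key}", did you mean "${key}s"?`);
        }
    }
    if (!Array.isArray(ids) || !ids.every((id) => typeof id === "string")) {
        errors.push("ids must be an array of strings");
    }
    if (embeddings !== undefined) {
        // Each entry must itself be an array of numbers, one entry per id.
        const nested = Array.isArray(embeddings) && embeddings.every((e) => Array.isArray(e));
        if (!nested) {
            errors.push("embeddings must be an array of number arrays, one per id");
        } else if (Array.isArray(ids) && embeddings.length !== ids.length) {
            errors.push("embeddings and ids must have the same length");
        }
    }
    if (metadatas !== undefined) {
        if (!Array.isArray(metadatas)) {
            errors.push("metadatas must be an array of objects, one per id");
        } else if (Array.isArray(ids) && metadatas.length !== ids.length) {
            errors.push("metadatas and ids must have the same length");
        }
    }
    return errors;
}

// The payload shape from the report is rejected; the corrected shape passes.
const bad = {ids: "9", embeddings: [0.1, 0.2], metadata: {type: "Support"}};
const good = {ids: ["9"], embeddings: [[0.1, 0.2]], metadatas: [{type: "Support"}]};
console.log(validateAddPayload(bad));
console.log(validateAddPayload(good));
```

Calling the helper right before col.add and throwing when the returned list is non-empty turns a silent null in the query result into an immediate, readable error.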