Open Amphetaminewei opened 4 months ago
@Amphetaminewei, can you tell me what Chroma version you are using? The warning log entry you encounter is triggered by the add() operation; it is expected to appear if the ID already exists. To avoid this, you can first query for the ID with get(ids=["my_tag_id"],include=[]). As an alternative, you can use upsert(), but that will update the record in Chroma, which means that your tag's embedding will be regenerated and inserted again (probably not optimal).
As a last resort, you can suppress the warning messages:
import logging
logger = logging.getLogger('chromadb.segment.impl.vector.local_hnsw')
logger1 = logging.getLogger('chromadb.segment.impl.metadata.sqlite')
logger.setLevel(logging.ERROR)
logger1.setLevel(logging.ERROR)
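A narrower option than raising the log level is a logging filter that drops only this one message and lets every other warning through. The following is a minimal sketch, assuming the warning is emitted through the two loggers named above and starts with the text seen in the output ("Add of existing embedding ID"); the class and function names are hypothetical.

```python
import logging

# Message prefix as it appears in the log output; assumed stable for this version.
DUPLICATE_ADD_PREFIX = "Add of existing embedding ID"

# Logger names taken from the suppression snippet above.
CHROMA_LOGGERS = (
    "chromadb.segment.impl.vector.local_hnsw",
    "chromadb.segment.impl.metadata.sqlite",
)


class DropDuplicateAddWarnings(logging.Filter):
    """Reject only the duplicate-add message; keep every other record."""

    def filter(self, record: logging.LogRecord) -> bool:
        return not record.getMessage().startswith(DUPLICATE_ADD_PREFIX)


def install_duplicate_add_filter(logger_names=CHROMA_LOGGERS) -> logging.Filter:
    """Attach the filter to each named logger and return it for later removal."""
    log_filter = DropDuplicateAddWarnings()
    for name in logger_names:
        logging.getLogger(name).addFilter(log_filter)
    return log_filter
```

Calling install_duplicate_add_filter() once at startup should be enough; other warnings from the same loggers are still reported.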
@Amphetaminewei, Can you tell me what Chroma version you are using?
Sorry, I forgot to provide version information; the version of Chroma I'm using is 0.5.0. If I use get() or upsert(), will the performance be worse than if I used add() directly? Turning off the warning message is a dangerous act, and if there is no other way, we may have to keep the status quo.
hey @Amphetaminewei, let's examine the "costs" associated with each:
add()
The add() operation is a blocking one, meaning that it forces any concurrent queries/updates to the same collection to be serialized and wait.
get()
upsert()
The upsert() operation is a blocking one, meaning that it forces any concurrent queries/updates to the same collection to be serialized and wait.
Overall, I'd say get() (batch it over many IDs if possible) is a sensible approach, especially if you have many requests that end up in this situation. You can even cache things on the client side to avoid the roundtrip and the SQLite query altogether.
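The batched get() plus client-side cache idea could look roughly like the sketch below. It assumes `collection` behaves like a Chroma collection with get(ids=..., include=[]) and add(...); the class name KnownIdCache and its methods are hypothetical, and the cache is only valid while no other process adds or deletes records.

```python
from typing import Any, Dict, List, Sequence, Set


class KnownIdCache:
    """Remembers which IDs are already stored so repeat adds can be skipped."""

    def __init__(self, collection: Any) -> None:
        # Assumed to behave like a chromadb Collection (get() and add()).
        self._collection = collection
        self._known: Set[str] = set()

    def missing(self, ids: Sequence[str]) -> List[str]:
        """Return the IDs not yet stored, de-duplicated, in input order."""
        unique = list(dict.fromkeys(ids))
        unchecked = [i for i in unique if i not in self._known]
        if unchecked:
            # One batched lookup; include=[] asks for IDs only.
            found = self._collection.get(ids=unchecked, include=[])["ids"]
            self._known.update(found)
        return [i for i in unique if i not in self._known]

    def add_missing(
        self,
        ids: Sequence[str],
        embeddings: Sequence[Sequence[float]],
        documents: Sequence[str],
        metadatas: Sequence[Dict[str, Any]],
    ) -> List[str]:
        """Add only the rows whose ID is new; return the IDs actually added."""
        todo = set(self.missing(ids))
        rows = []
        for row in zip(ids, embeddings, documents, metadatas):
            if row[0] in todo:
                todo.discard(row[0])  # keep the first row for a repeated ID
                rows.append(row)
        if not rows:
            return []
        new_ids, new_emb, new_docs, new_meta = (list(col) for col in zip(*rows))
        self._collection.add(
            ids=new_ids, embeddings=new_emb, documents=new_docs, metadatas=new_meta
        )
        self._known.update(new_ids)
        return new_ids
```

With this shape, an ID that has been seen once never reaches add() again from this client, so the duplicate-add path is not hit for it.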
Another thing about logging is that you can rotate the logs, thus keeping the size low.
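Log rotation can be set up with the standard library alone. The sketch below assumes the warnings propagate to a parent logger named "chromadb"; the function name, file path, and size limits are placeholders rather than recommended values.

```python
import logging
from logging.handlers import RotatingFileHandler


def attach_rotating_log(
    path: str,
    logger_name: str = "chromadb",  # assumed parent of the loggers named earlier
    max_bytes: int = 1_000_000,
    backup_count: int = 3,
) -> RotatingFileHandler:
    """Send the logger's records to a size-capped file with a few backups."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logging.getLogger(logger_name).addHandler(handler)
    return handler
```

Once the active file would exceed max_bytes it is renamed to path.1, older backups shift up, and anything beyond backup_count is discarded, so total disk use stays bounded.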
@tazarov thank you, I think get() and a cache are a better choice for me. I'll give it a try in my program.
hey @tazarov, I tested the methods mentioned above and found a problem. I added a non-existent ID to the collection, and the log still shows "Add of existing embedding ID:". Here's my program and output:
import chromadb
client = chromadb.PersistentClient(path="/home/wangweinan/.local/kylin-ai-business-framework/datamanagement/database/search")
collection = client.get_collection(name="files-tags")
# reply = collection.get()
# print(reply)
vector = [0] * 1024
collection.add(ids=["1000"], documents=["999"], metadatas=[{"tags": "999"}], embeddings=[vector])
output:
(python-env) wangweinan@wangweinan-xiaoxinpro14imh9:~/prj/test-onnxruntime$ python3 ./testchroma.py
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 9
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 1
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 1
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 1
Add of existing embedding ID: 27
Add of existing embedding ID: 68
Add of existing embedding ID: 88
Add of existing embedding ID: 3
Add of existing embedding ID: 4
Add of existing embedding ID: 6
Add of existing embedding ID: 7
Add of existing embedding ID: 8
Add of existing embedding ID: 84
Add of existing embedding ID: 9
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 20
Add of existing embedding ID: 21
Add of existing embedding ID: 22
Add of existing embedding ID: 9
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 42
Add of existing embedding ID: 43
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 46
Add of existing embedding ID: 47
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 60
Add of existing embedding ID: 61
Add of existing embedding ID: 1
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 71
Add of existing embedding ID: 72
Add of existing embedding ID: 73
Add of existing embedding ID: 74
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 1
Add of existing embedding ID: 80
Add of existing embedding ID: 81
Add of existing embedding ID: 82
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 93
Add of existing embedding ID: 95
Add of existing embedding ID: 1
Add of existing embedding ID: 68
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 10
Add of existing embedding ID: 11
Add of existing embedding ID: 12
Add of existing embedding ID: 13
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 9
Add of existing embedding ID: 84
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 129
Add of existing embedding ID: 130
Add of existing embedding ID: 131
Add of existing embedding ID: 132
Add of existing embedding ID: 133
Add of existing embedding ID: 134
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 36
Add of existing embedding ID: 149
Add of existing embedding ID: 150
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 42
Add of existing embedding ID: 43
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 46
Add of existing embedding ID: 47
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 30
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 54
Add of existing embedding ID: 55
Add of existing embedding ID: 56
Add of existing embedding ID: 57
Add of existing embedding ID: 60
Add of existing embedding ID: 61
Add of existing embedding ID: 164
Add of existing embedding ID: 165
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 70
Add of existing embedding ID: 75
Add of existing embedding ID: 76
Add of existing embedding ID: 78
Add of existing embedding ID: 1
Add of existing embedding ID: 80
Add of existing embedding ID: 82
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 200
Add of existing embedding ID: 201
Add of existing embedding ID: 93
Add of existing embedding ID: 95
Add of existing embedding ID: 99
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 3
Add of existing embedding ID: 10
Add of existing embedding ID: 11
Add of existing embedding ID: 12
Add of existing embedding ID: 13
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 9
Add of existing embedding ID: 84
Add of existing embedding ID: 9
Add of existing embedding ID: 26
Add of existing embedding ID: 84
Add of existing embedding ID: 27
Add of existing embedding ID: 129
Add of existing embedding ID: 130
Add of existing embedding ID: 131
Add of existing embedding ID: 132
Add of existing embedding ID: 133
Add of existing embedding ID: 134
Add of existing embedding ID: 135
Add of existing embedding ID: 137
Add of existing embedding ID: 138
Add of existing embedding ID: 242
Add of existing embedding ID: 30
Add of existing embedding ID: 31
Add of existing embedding ID: 32
Add of existing embedding ID: 33
Add of existing embedding ID: 34
Add of existing embedding ID: 35
Add of existing embedding ID: 36
Add of existing embedding ID: 60
Add of existing embedding ID: 149
Add of existing embedding ID: 150
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 42
Add of existing embedding ID: 43
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 46
Add of existing embedding ID: 47
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 28
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 282
Add of existing embedding ID: 283
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 70
Add of existing embedding ID: 71
Add of existing embedding ID: 72
Add of existing embedding ID: 73
Add of existing embedding ID: 75
Add of existing embedding ID: 1
Add of existing embedding ID: 288
Add of existing embedding ID: 80
Add of existing embedding ID: 82
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 200
Add of existing embedding ID: 201
Add of existing embedding ID: 93
Add of existing embedding ID: 95
Add of existing embedding ID: 3
Add of existing embedding ID: 4
Add of existing embedding ID: 6
Add of existing embedding ID: 7
Add of existing embedding ID: 8
Add of existing embedding ID: 10
Add of existing embedding ID: 11
Add of existing embedding ID: 12
Add of existing embedding ID: 13
Add of existing embedding ID: 14
Add of existing embedding ID: 17
Add of existing embedding ID: 18
Add of existing embedding ID: 9
Add of existing embedding ID: 84
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 27
Add of existing embedding ID: 129
Add of existing embedding ID: 130
Add of existing embedding ID: 131
Add of existing embedding ID: 132
Add of existing embedding ID: 133
Add of existing embedding ID: 134
Add of existing embedding ID: 135
Add of existing embedding ID: 137
Add of existing embedding ID: 138
Add of existing embedding ID: 341
Add of existing embedding ID: 30
Add of existing embedding ID: 36
Add of existing embedding ID: 28
Add of existing embedding ID: 30
Add of existing embedding ID: 50
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 54
Add of existing embedding ID: 55
Add of existing embedding ID: 56
Add of existing embedding ID: 57
Add of existing embedding ID: 60
Add of existing embedding ID: 61
Add of existing embedding ID: 1
Add of existing embedding ID: 164
Add of existing embedding ID: 270
Add of existing embedding ID: 271
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 371
Add of existing embedding ID: 372
Add of existing embedding ID: 281
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 70
Add of existing embedding ID: 71
Add of existing embedding ID: 72
Add of existing embedding ID: 73
Add of existing embedding ID: 74
Add of existing embedding ID: 284
Add of existing embedding ID: 78
Add of existing embedding ID: 1
Add of existing embedding ID: 288
Add of existing embedding ID: 80
Add of existing embedding ID: 82
Add of existing embedding ID: 200
Add of existing embedding ID: 201
Add of existing embedding ID: 93
Add of existing embedding ID: 203
Add of existing embedding ID: 1
Add of existing embedding ID: 88
Add of existing embedding ID: 1000
chroma version info is:
(python-env) wangweinan@wangweinan-xiaoxinpro14imh9:~/prj/test-onnxruntime$ pip3 show chromadb
Name: chromadb
Version: 0.5.0
Summary: Chroma.
Home-page:
Author:
Author-email: Jeff Huber <jeff@trychroma.com>, Anton Troynikov <anton@trychroma.com>
License:
Location: /usr/share/kylin-ai-business-framework/python-env/lib/python3.12/site-packages
Requires: bcrypt, build, chroma-hnswlib, fastapi, grpcio, importlib-resources, kubernetes, mmh3, numpy, onnxruntime, opentelemetry-api, opentelemetry-exporter-otlp-proto-grpc, opentelemetry-instrumentation-fastapi, opentelemetry-sdk, orjson, overrides, posthog, pydantic, pypika, PyYAML, requests, tenacity, tokenizers, tqdm, typer, typing-extensions, uvicorn
Required-by:
@Amphetaminewei, thanks for sharing. I'll have a look and share a sample of my suggestions for you to try out.
@tazarov, I slightly modified my sample and found a similar situation:
import chromadb
client = chromadb.PersistentClient(path="/home/wangweinan/.local/kylin-ai-business-framework/datamanagement/database/search")
collection = client.get_collection(name="files-tags")
# reply = collection.get()
# print(reply)
vector = [0] * 1024
# collection.add(ids=["1111"], documents=["999"], metadatas=[{"tags": "999"}], embeddings=[vector])
collection.query(query_embeddings=[vector], n_results=10)
output:
(python-env) wangweinan@wangweinan-xiaoxinpro14imh9:~/prj/test-onnxruntime$ python3 ./testchroma.py
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 9
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 1
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 1
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 1
Add of existing embedding ID: 27
Add of existing embedding ID: 68
Add of existing embedding ID: 88
Add of existing embedding ID: 3
Add of existing embedding ID: 4
Add of existing embedding ID: 6
Add of existing embedding ID: 7
Add of existing embedding ID: 8
Add of existing embedding ID: 84
Add of existing embedding ID: 9
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 20
Add of existing embedding ID: 21
Add of existing embedding ID: 22
Add of existing embedding ID: 9
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 42
Add of existing embedding ID: 43
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 46
Add of existing embedding ID: 47
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 60
Add of existing embedding ID: 61
Add of existing embedding ID: 1
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 71
Add of existing embedding ID: 72
Add of existing embedding ID: 73
Add of existing embedding ID: 74
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 1
Add of existing embedding ID: 80
Add of existing embedding ID: 81
Add of existing embedding ID: 82
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 93
Add of existing embedding ID: 95
Add of existing embedding ID: 1
Add of existing embedding ID: 68
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 10
Add of existing embedding ID: 11
Add of existing embedding ID: 12
Add of existing embedding ID: 13
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 9
Add of existing embedding ID: 84
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 129
Add of existing embedding ID: 130
Add of existing embedding ID: 131
Add of existing embedding ID: 132
Add of existing embedding ID: 133
Add of existing embedding ID: 134
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 36
Add of existing embedding ID: 149
Add of existing embedding ID: 150
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 42
Add of existing embedding ID: 43
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 46
Add of existing embedding ID: 47
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 30
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 54
Add of existing embedding ID: 55
Add of existing embedding ID: 56
Add of existing embedding ID: 57
Add of existing embedding ID: 60
Add of existing embedding ID: 61
Add of existing embedding ID: 164
Add of existing embedding ID: 165
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 70
Add of existing embedding ID: 75
Add of existing embedding ID: 76
Add of existing embedding ID: 78
Add of existing embedding ID: 1
Add of existing embedding ID: 80
Add of existing embedding ID: 82
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 200
Add of existing embedding ID: 201
Add of existing embedding ID: 93
Add of existing embedding ID: 95
Add of existing embedding ID: 99
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 3
Add of existing embedding ID: 10
Add of existing embedding ID: 11
Add of existing embedding ID: 12
Add of existing embedding ID: 13
Add of existing embedding ID: 14
Add of existing embedding ID: 15
Add of existing embedding ID: 16
Add of existing embedding ID: 9
Add of existing embedding ID: 84
Add of existing embedding ID: 9
Add of existing embedding ID: 26
Add of existing embedding ID: 84
Add of existing embedding ID: 27
Add of existing embedding ID: 129
Add of existing embedding ID: 130
Add of existing embedding ID: 131
Add of existing embedding ID: 132
Add of existing embedding ID: 133
Add of existing embedding ID: 134
Add of existing embedding ID: 135
Add of existing embedding ID: 137
Add of existing embedding ID: 138
Add of existing embedding ID: 242
Add of existing embedding ID: 30
Add of existing embedding ID: 31
Add of existing embedding ID: 32
Add of existing embedding ID: 33
Add of existing embedding ID: 34
Add of existing embedding ID: 35
Add of existing embedding ID: 36
Add of existing embedding ID: 60
Add of existing embedding ID: 149
Add of existing embedding ID: 150
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 42
Add of existing embedding ID: 43
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 46
Add of existing embedding ID: 47
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 28
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 282
Add of existing embedding ID: 283
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 70
Add of existing embedding ID: 71
Add of existing embedding ID: 72
Add of existing embedding ID: 73
Add of existing embedding ID: 75
Add of existing embedding ID: 1
Add of existing embedding ID: 288
Add of existing embedding ID: 80
Add of existing embedding ID: 82
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 200
Add of existing embedding ID: 201
Add of existing embedding ID: 93
Add of existing embedding ID: 95
Add of existing embedding ID: 3
Add of existing embedding ID: 4
Add of existing embedding ID: 6
Add of existing embedding ID: 7
Add of existing embedding ID: 8
Add of existing embedding ID: 10
Add of existing embedding ID: 11
Add of existing embedding ID: 12
Add of existing embedding ID: 13
Add of existing embedding ID: 14
Add of existing embedding ID: 17
Add of existing embedding ID: 18
Add of existing embedding ID: 9
Add of existing embedding ID: 84
Add of existing embedding ID: 1
Add of existing embedding ID: 26
Add of existing embedding ID: 27
Add of existing embedding ID: 28
Add of existing embedding ID: 50
Add of existing embedding ID: 64
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 84
Add of existing embedding ID: 88
Add of existing embedding ID: 27
Add of existing embedding ID: 129
Add of existing embedding ID: 130
Add of existing embedding ID: 131
Add of existing embedding ID: 132
Add of existing embedding ID: 133
Add of existing embedding ID: 134
Add of existing embedding ID: 135
Add of existing embedding ID: 137
Add of existing embedding ID: 138
Add of existing embedding ID: 341
Add of existing embedding ID: 30
Add of existing embedding ID: 36
Add of existing embedding ID: 28
Add of existing embedding ID: 30
Add of existing embedding ID: 50
Add of existing embedding ID: 50
Add of existing embedding ID: 51
Add of existing embedding ID: 30
Add of existing embedding ID: 32
Add of existing embedding ID: 44
Add of existing embedding ID: 45
Add of existing embedding ID: 54
Add of existing embedding ID: 55
Add of existing embedding ID: 56
Add of existing embedding ID: 57
Add of existing embedding ID: 60
Add of existing embedding ID: 61
Add of existing embedding ID: 1
Add of existing embedding ID: 164
Add of existing embedding ID: 270
Add of existing embedding ID: 271
Add of existing embedding ID: 1
Add of existing embedding ID: 64
Add of existing embedding ID: 1
Add of existing embedding ID: 2
Add of existing embedding ID: 371
Add of existing embedding ID: 372
Add of existing embedding ID: 281
Add of existing embedding ID: 68
Add of existing embedding ID: 69
Add of existing embedding ID: 70
Add of existing embedding ID: 71
Add of existing embedding ID: 72
Add of existing embedding ID: 73
Add of existing embedding ID: 74
Add of existing embedding ID: 284
Add of existing embedding ID: 78
Add of existing embedding ID: 1
Add of existing embedding ID: 288
Add of existing embedding ID: 80
Add of existing embedding ID: 82
Add of existing embedding ID: 200
Add of existing embedding ID: 201
Add of existing embedding ID: 93
Add of existing embedding ID: 203
Add of existing embedding ID: 1
Add of existing embedding ID: 88
Add of existing embedding ID: 1000
I suspect this has something to do with the IDs I'm using; in other collections I'm using UUIDs and didn't see these issues. I will experiment more with this.
@Amphetaminewei, you are right that IDs must be unique. Using UUIDs will most certainly generate unique IDs, thus avoiding the warning message above.
Let me step back for a second and try to grasp your problem domain. Looking at this:
collection.add(ids=["1111"], documents=["999"], metadatas=[{"tags": "999"}], embeddings=[vector])
Are the following assumptions correct?:
You don't care about the ID (which may be why you used UUID in the past)
You want the tag "999" (metadatas=[{"tags": "999"}]) to be unique? (by extension, the vector and the document are also unique, correct?)
A clarifying question: Is your tag a single ID like "999" or can there be more (e.g. metadata=[{"tags":"999,1000..."}])?
@tazarov, first of all, to answer your questions:
You don't care about the ID (which may be why you used UUID in the past)
Previously I used potentially duplicate IDs to avoid adding duplicate embeddings; for embeddings that couldn't be duplicated, I used UUIDs.
You want the tag "999" (metadatas=[{"tags": "999"}]) to be unique? (by extension, the vector and the document are also unique, correct?)
yep
Is your tag a single ID like "999" or can there be more (e.g. metadata=[{"tags":"999,1000..."}]?
I only have one metadatas=[{"tag_id":"999"}] for each embedding. In fact, this ID was added to avoid adding duplicate embeddings: I get all the tag_id values in the current collection via get() and check whether a duplicate tag_id already exists in the collection before invoking add().
I think the problem was that I wanted to avoid writing duplicate vectors by ID; later I thought it would be better to use get() with metadata. What intrigues me is that in my case, "Add of existing embedding ID" appears both when calling add() with the non-existent ID "999" and when querying, and when querying, the log does not point to the ID I queried. Is this related to the caching of error logs?
I have not read the whole thread carefully, but I guess using a separate record manager is one solution to avoid duplicate adds. LangChain's indexing API is an example.
@Ao-Last, LC's indexing looks like an interesting proposition. As with everything, there are trade-offs:
After trying to understand the problem domain, I feel this can be solved quite easily with a simple get() prior to adding.
@Amphetaminewei, use fixed IDs to ensure you get a warning and the addition is ignored by Chroma, then do the following:
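tag_to_add = "999"
# Look for an existing record carrying this tag before adding it.
results = collection.get(where={"tags": tag_to_add})
if len(results["ids"]) == 0:
    # `vector` is the embedding for this tag, as defined in the earlier sample.
    collection.add(ids=[tag_to_add], documents=[tag_to_add], metadatas=[{"tags": tag_to_add}], embeddings=[vector])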
The get() operation is relatively inexpensive and also quite fast.
I'm not using LC, and adding more Python packages would make my deployment more complicated. I think I know what to do, thank you @tazarov
In my scenario, I extract the tags of a file and store the tag vectors. Tags may be duplicated across different files, and we don't want to save duplicate tags. At present, we use the same ID for the same tag so that its vector is not stored repeatedly, but when calling add(), this is often printed: Add of existing embedding ID: *. This log makes my log file very large and makes me wonder if my usage is wrong. Is there a better way to avoid storing duplicate vectors? Or is there a way to eliminate this log?
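One way to keep the "same tag, same ID" scheme without hand-picking ID strings is to derive the ID from the tag text. This is only a sketch: the helper name tag_to_id and the choice of namespace are assumptions, and an add() with an ID that already exists would still log the warning, so it is meant to be paired with a get() check beforehand.

```python
import uuid

# Any fixed namespace works as long as it never changes; this one is arbitrary.
TAG_NAMESPACE = uuid.NAMESPACE_URL


def tag_to_id(tag: str) -> str:
    """Map a tag to a stable UUID string: equal tags always give equal IDs."""
    return str(uuid.uuid5(TAG_NAMESPACE, "tag:" + tag))
```

Because uuid5 is a hash of the namespace and the name, the same tag extracted from two different files maps to one ID, while distinct tags get distinct IDs.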