Open beggers opened 1 year ago
Thought
Do we want it on local calls also, or just remote ones?
@beggers and @HammadB, does it make sense to make this configurable at client-level? I think we can have one global timeout for all requests but also sub-timeouts for reads (query/get) and other ops (create/update/...)
Also, we should warn the user that timeouts can be affected by proxies and other networking outside of Chroma, and they should adjust their configuration to accommodate that.
Regarding adding a timeout on query alone, what is the use case where one might want to adjust query timeouts dynamically? (large vs small queries).
Hello @beggers , I am Pragnesh, and want to contribute to Chroma, I want to solve this issue, you may guide me. I propose creation of
Client
class
_query_with_timeout()
_query_without_timeout()
which will be same as current implementation of _query()
_query()
a new implemenation I intend the _query_with_timeout()
method to be
def _query_with_timeout(self,
collection_id: UUID,
query_embeddings: Embeddings,
n_results: int = 10,
where: Where = {},
where_document: WhereDocument = {},
include: Include = ["embeddings", "metadatas", "documents", "distances"],
timeout: float = 1000
) -> QueryResult | None :
timeout_seconds = timeout / 1000 #convert milliseconds to secs
resolution_event = threading.Event()
query_result = None
def query_wrapper():
try :
query_result =self._query_without_timeout(...)
except TimeoutError :
pass
finally :
resolution_event.set()
query_thread = threading.Thread(target=query_wrapper)
query_thread.start()
query_thread.join(timeout = timeout_seconds)
if query_thread.is_alive() :
query_thread.join()
raise TimeoutError("Query timed out")
resolution_event.wait()
return query_result
I intend the new _query()
method to be
@override
def _query(self,
collection_id: UUID,
query_embeddings: Embeddings,
n_results: int = 10,
where: Where = {},
where_document: WhereDocument = {},
include: Include = ["embeddings", "metadatas", "documents", "distances"],
timeout: float | None = None
) -> QueryResult | None:
if timeout :
return self._query_with_timeout(collection_id=collection_id,
query_embeddings=query_embeddings,
n_results=n_results,
where=where,
where_document=where_document,
include=include,
timeout=timeout)
return self._query_without_timeout(collection_id=collection_id,
query_embeddings=query_embeddings,
n_results=n_results,
where=where,
where_document=where_document,
include=include,)
This would result in changing the query()
method in Collection.py
to accept timeout as optional argument.
If seems ok to you, I will move on to write a CIP.
We could have also used
signal
library in python to handle timeout, but that would not haved resulted in platform agnostic code (found out we need different code windows and linux), that's why i went with threading library.
Describe the problem
Discord user Aaron Ng has requested a way to specify timeouts in queries. This seems well scoped and nice to have.
Marking this as a good first issue, but contributors please note: this will require some thought and a couple PRs.
Describe the proposed solution
Write a CIP specifying the proposed solution (API changes), get it approved by the core chroma team, then implement it.
We probably just want to accept an optional kwarg in the client's
collection.query()
.Alternatives considered
No response
Importance
nice to have
Additional Information
No response