chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0
15.66k stars 1.32k forks source link

[Feature Request]: Add close() function to chromadb.HttpClient #1865

Open hamasurrehman opened 8 months ago

hamasurrehman commented 8 months ago

Describe the problem

Currently, the chromadb.HttpClient class lacks a method for explicitly closing the connection. This omission poses challenges for users in managing resources efficiently and ensuring proper cleanup after using the client. Without a designated way to close connections, users may encounter resource leaks or inefficient resource utilization, particularly in scenarios involving long-lived applications or numerous concurrent connections.

Describe the proposed solution

I propose augmenting the chromadb.HttpClient class with a close() function to facilitate explicit closure of connections. This method would empower users to responsibly manage resources by enabling them to explicitly release connections when they are no longer needed. By integrating a close() function, users can uphold best practices for resource management in Python and mitigate potential issues associated with lingering connections.

Alternatives considered

  1. Automatic Connection Management: Rather than introducing a close() function, an alternative approach could involve implementing automatic connection management within the chromadb.HttpClient. This would entail the client automatically closing connections after a certain period of inactivity or when they reach a predefined threshold. However, this approach might lack flexibility and could potentially lead to unexpected connection closures, especially in scenarios where users require precise control over connection lifetimes.

  2. Context Manager Support: Another option could be to add support for the context manager protocol (__enter__ and __exit__ methods) to the chromadb.HttpClient. By implementing this protocol, users could utilize the with statement to ensure proper resource cleanup, as connections would be automatically closed upon exiting the with block. While this approach offers convenience, it may not fully address cases where users need to explicitly manage connection lifetimes outside of a with context.

  3. Manual Connection Management: Alternatively, users could manually manage connection lifetimes by explicitly calling a disconnect() or release() method on the chromadb.HttpClient to close connections. While this approach provides control over resource cleanup, it relies heavily on user diligence and may increase the risk of overlooking necessary cleanup steps, leading to potential resource leaks or inefficiencies.

Importance

i cannot use Chroma without it

Additional Information

This feature would enhance the usability and robustness of the ChromaDB client library, providing users with more control over resource management. It would also contribute to better adherence to Pythonic conventions and standards for database client libraries.

tazarov commented 8 months ago

@hamasurrehman, thank you for the elaborate description of your problem and possible alternatives. We do have a PR on this #1792. I would appreciate your feedback on it.

beggers commented 8 months ago

I'm a little confused about the use-case here. I see two possible ways for this to be used:

I don't see much utility in the second case: when the object gets garbage collected it'll close all connections, and that happens relatively quickly.

The first case is interesting, though. Chroma server currently defaults to Http/1.1 which has a default keepalive of 30 seconds. This means if you're creating a lot of HttpClients which only send a few requests each you do end up with a lot of port usage hanging around. We could plumb in a close() method to close all connections in the client's threadpool. Is this what you meant @hamasurrehman? Could you describe the resource limitations you're bumping up against?

tazarov commented 8 months ago

@beggers, how about proxies? Can they keep the connections open?

A user in Discord reported this:

tcp        1      0 testing00-alfredg:36892 staging-chromadb.i:8002 CLOSE_WAIT  off (0.00/0/0)
tcp        1      0 testing00-alfredg:36412 staging-chromadb.i:8002 CLOSE_WAIT  off (0.00/0/0)

I think calling session.close() could be an effective way for people to indicate that they want to release resources properly.

On top of that, we can provide a utility context manager.