deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the likes) to extend the capabilities of Haystack version 2.0 and onwards
https://haystack.deepset.ai
Apache License 2.0
121 stars 119 forks source link

refactor: Remove check extension in every connection #1203

Closed wuqunfei closed 2 hours ago

wuqunfei commented 2 days ago

Related Issues

Create Extension need HIGH Permission in many cloud environment, like Azure https://learn.microsoft.com/en-us/azure/postgresql/extensions/how-to-allow-extensions?tabs=portal#create-extension

Proposed Changes:

Recommend to remove to check extension if exists pgvector in every connection.

  1. Operational Level like DBA can create Vector extension once
  2. Developer can connect it without considering this precondition.

    How did you test it?

manual tests

Notes for the reviewer

Checklist

anakin87 commented 4 hours ago

Hello and thanks for the proposal... This change breaks existing implementation, so I would like to explore a different direction:

What do you think? Does it make sense to you? (I haven't tried it)

wuqunfei commented 3 hours ago

thanks anakin for your suggestion, it is a good approach, but do you think it is a little heavy with extra logic like connection, cursor, fetchone in python.

def check_extension_exists(connection, extension_name):
    try:
        # Create a cursor to execute the query
        with connection.cursor() as cursor:
            # Query to check for the extension
            query = sql.SQL("SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = %s);")
            cursor.execute(query, (extension_name,))
            # Fetch the result
            exists = cursor.fetchone()[0]
            return exists
    except Exception as e:
        print(f"Error checking extension: {e}")
        return False

I search for langchain, it looks same issue, https://github.com/langchain-ai/langchain/blob/4027da1b6e9c9f3934eb14bf25a362c719f6e9d8/libs/community/langchain_community/vectorstores/pgvector.py#L386

but they solved by adding a feature into construction function.

https://github.com/langchain-ai/langchain/blob/4027da1b6e9c9f3934eb14bf25a362c719f6e9d8/libs/community/langchain_community/vectorstores/pgvector.py#L309

    def __init__(
        self,
        connection_string: str,
        embedding_function: Embeddings,
        embedding_length: Optional[int] = None,
        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
        collection_metadata: Optional[dict] = None,
        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
        pre_delete_collection: bool = False,
        logger: Optional[logging.Logger] = None,
        relevance_score_fn: Optional[Callable[[float], float]] = None,
        *,
        connection: Optional[sqlalchemy.engine.Connection] = None,
        engine_args: Optional[dict[str, Any]] = None,
        use_jsonb: bool = False,
        create_extension: bool = True,
    ) -> None:
        """Initialize the PGVector store."""

How do you think ?

wuqunfei commented 1 hour ago

Because I use different github user, so I reset my repository, but PR closed. So I created another one here https://github.com/deepset-ai/haystack-core-integrations/pull/1213

Let us continue this topic