GlareDB / glaredb

GlareDB: An analytics DBMS for distributed data
https://glaredb.com
GNU Affero General Public License v3.0
650 stars 38 forks source link

Use glaredb with dbt/great expectations: fix postgres functionality #2436

Closed talagluck closed 7 months ago

talagluck commented 8 months ago

Description

When attempting to create a connection from dbt to a glaredb cloud instance using a postgresql connection string, I hit the following error:

14:37:27  1 check failed:
14:37:27  dbt was unable to connect to the specified database.
The database returned the following error:

  >Database Error
  error with status PGRES_EMPTY_QUERY and no message from the libpq

Check your database credentials and try again. For more information, visit:
https://docs.getdbt.com/docs/configure-your-profile

Given that dbt should work fine with a postgres connector, I would expect this to work, but there's a mismatch here. Sean mentioned that he tested this a few months back and it worked, so there may have been a regression here.

I also get the same error when attempting to run Great Expectations with GlareDB - Great Expectations uses SqlAlchemy under the surface.

talagluck commented 8 months ago

Steps to reproduce for Great Expectations:

pip install "great_expectations[postgresql]"

In a notebook (or python script) run:

import great_expectations as gx
context = gx.get_context() # gets a great expectations project context 
ds = context.sources.add_postgres(name="glaredb", connection_string=<CONNECTION_STRING>)

The connection string I used was "postgresql://6AhiEN7GQDmo:<PASSWORD>@o_PRocU0j.proxy.glaredb.com:6543/rough_glitter"

tychoish commented 8 months ago

I'm pretty sure that this is #2273, and if you can you specify autocommit=True in some way that it gets set on the psycopg2 (I found some docs but I don't know the best way here.) and see if it helps, that'd be cool.

scsmithr commented 8 months ago

Good news, I think I got further (thanks to correcting the transaction stubs and connecting to qa). Bad news, seems great expectation is expecting a pg_catalog.version() function:

~/Code/github.com/glaredb/glaredb/bindings/python [1] % just example ge
../../.venv/bin/python examples/ge.py
Traceback (most recent call last):
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InternalError_: Error during planning: Invalid function 'pg_catalog.version'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/great_expectations/datasource/fluent/sql_datasource.py", line 1143, in test_connection
    engine.connect()
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3325, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3404, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
           ^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
          ^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/impl.py", line 145, in _do_get
    with util.safe_reraise():
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
    return self._create_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 273, in _create_connection
    return _ConnectionRecord(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 388, in __init__
    self.__connect()
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 704, in __connect
    )._exec_w_sync_on_first_run(self.dbapi_connection, self)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/event/attr.py", line 320, in _exec_w_sync_on_first_run
    self(*args, **kw)
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/event/attr.py", line 334, in __call__
    fn(*args, **kw)
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 1695, in go
    return once_fn(*arg, **kw)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/create.py", line 670, in first_connect
    dialect.initialize(c)
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 793, in initialize
    super(PGDialect_psycopg2, self).initialize(connection)
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/base.py", line 3441, in initialize
    super(PGDialect, self).initialize(connection)
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 408, in initialize
    self.server_version_info = self._get_server_version_info(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/base.py", line 3687, in _get_server_version_info
    v = connection.exec_driver_sql("select pg_catalog.version()").scalar()
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1770, in exec_driver_sql
    return self._exec_driver_sql(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1674, in _exec_driver_sql
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.InternalError: (psycopg2.errors.InternalError_) Error during planning: Invalid function 'pg_catalog.version'.

[SQL: select pg_catalog.version()]
(Background on this error at: https://sqlalche.me/e/14/2j85)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/sean/Code/github.com/glaredb/glaredb/bindings/python/examples/ge.py", line 4, in <module>
    ds = context.sources.add_postgres(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/great_expectations/datasource/fluent/sources.py", line 472, in add_datasource
    datasource.test_connection()
  File "/Users/sean/Code/github.com/glaredb/glaredb/.venv/lib/python3.11/site-packages/great_expectations/datasource/fluent/sql_datasource.py", line 1145, in test_connection
    raise TestConnectionError(
great_expectations.datasource.fluent.interfaces.TestConnectionError: Attempt to connect to datasource failed with the following error message: (psycopg2.errors.InternalError_) Error during planning: Invalid function 'pg_catalog.version'.

[SQL: select pg_catalog.version()]
(Background on this error at: https://sqlalche.me/e/14/2j85)
error: Recipe `example` failed on line 29 with exit code 1

But this seems easy enough to add in.

talagluck commented 8 months ago

Cool! Yes, it can be kind of challenging to pass in arguments to Great Expectations that it isn't expecting. But my hunch is something like what Sam said, that there is an issue with psycopg2 or SQL Alchemy, which is why this isn't working with a few different tools.

universalmind303 commented 8 months ago

@scsmithr Is the next logical step for this then to implement pg_catalog.version()

scsmithr commented 8 months ago

I added this last week, and a test to confirm: https://github.com/GlareDB/glaredb/blob/main/testdata/sqllogictests/functions/version.slt#L6-L14

That combined with the pg_tables thing I think will get us further here. I think we can retest (w/ qa) and see what happens.

talagluck commented 8 months ago

I hit a new issue now with dbt:

Database Error
  Error during planning: Unable to fetch table provider for 'pg_matviews': failed to resolve: failed to find table: pg_matviews
tychoish commented 8 months ago

Just to double check, it would be good to verify if there are other issues, and understand where we're at on this. (@talagluck)

It might also be good, to see what the script is that you're using to test this, then we can cut out the longer iteration cycle.

talagluck commented 8 months ago

On the latest test, I get:

Database Error
  Error during planning: Unable to fetch table provider for 'pg_rewrite': failed to resolve: failed to find table: pg_rewrite

I've been working on a script today. It's a bit more involved than I expected, but I'll have something soon.

talagluck commented 8 months ago

PR with test is here: #2580

greyscaled commented 7 months ago

@talagluck is this closable now, or what's the tl:dr status update?

talagluck commented 7 months ago

Yup! I'll close this and open up more specific issues as needed.