GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.29k stars 308 forks source link

Python SQLAlchemy Inspector Error ( MySQL Dialect ) with greptime - TypeError: NullType() takes no arguments #4537

Closed atul-r closed 2 weeks ago

atul-r commented 2 months ago

What type of bug is this?

Incorrect result

What subsystems are affected?

Standalone mode

Minimal reproduce step

I am encountering an issue when using SQLAlchemy's inspector to retrieve column details from a Greptime database using the MySQL dialect. Below is the code snippet I am using:

from sqlalchemy import create_engine, text, inspect

conn_string = "mysql://root:password@127.0.0.1:4002/somedb"

engine = create_engine(conn_string)

inspector = inspect(engine)
columns = inspector.get_columns('app_logs', 'somedb')
for column in columns:
    print(column)

When I run this code, I receive the following error:

TypeError: NullType() takes no arguments

Analysis: During reflection, SQLAlchemy issues a SHOW CREATE TABLE query to retrieve the table schema. It then parses the returned string using regex to extract the table name and column details.

Greptime returns the following CREATE TABLE statement for the table:

CREATE TABLE IF NOT EXISTS `app_logs` (
  `ts` TIMESTAMP(3) NOT NULL,
  `host` STRING NULL,
  `api_path` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
  `log_level` STRING NULL,
  `log` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
  TIME INDEX (`ts`),
  PRIMARY KEY (`host`, `log_level`)
)
ENGINE=mito
WITH(
  append_mode = 'true'
)

The issue seems to stem from the data types being returned in uppercase (e.g., STRING, TIMESTAMP) also some data types are different for example STRING, there is no data type of STRING in mysql. The MySQL dialect in SQLAlchemy does not recognize these uppercase data types and expects them in lowercase. As a result, it raises a TypeError.

I manually changed the data types to lowercase during debugging, and the reflection worked without any issues. This suggests that the data type casing is the root cause of the problem.

Request: Is there a way to modify Greptime so that it returns the data types in lowercase, or provide an output consistent with what the MySQL dialect in SQLAlchemy expects? This would enhance compatibility and prevent errors during schema reflection.

What did you expect to see?

{'name': 'ts', 'type': TIMESTAMP(fsp=3), 'default': None, 'comment': None, 'nullable': False}
{'name': 'host', 'type': VARCHAR(), 'default': None, 'comment': None, 'nullable': True}
{'name': 'api_path', 'type': VARCHAR(), 'default': None, 'comment': None, 'nullable': True}
{'name': 'log_level', 'type': VARCHAR(), 'default': None, 'comment': None, 'nullable': True}
{'name': 'log', 'type': VARCHAR(), 'default': None, 'comment': None, 'nullable': True}

What did you see instead?

TypeError: NullType() takes no arguments

What operating system did you use?

greptime/greptimedb:v0.9.1,

What version of GreptimeDB did you use?

0.9.1

Relevant log output and stack trace

c:\Users\user\Documents\project\others\alchemy-greptime\app.py:43: SAWarning: Did not recognize type 'TIMESTAMP' of column 'ts'
  columns = inspector.get_columns('app_logs', 'supersetexample')
Traceback (most recent call last):
  File "c:\Users\user\Documents\project\others\alchemy-greptime\app.py", line 43, in <module>
    columns = inspector.get_columns('app_logs', 'supersetexample')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\engine\reflection.py", line 859, in get_columns
    col_defs = self.dialect.get_columns(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 2, in get_columns
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\engine\reflection.py", line 97, in cache
    ret = fn(self, con, *args, **kw)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\dialects\mysql\base.py", line 2966, in get_columns
    parsed_state = self._parsed_state_or_create(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\dialects\mysql\base.py", line 3226, in _parsed_state_or_create
    return self._setup_parser(
           ^^^^^^^^^^^^^^^^^^^
  File "<string>", line 2, in _setup_parser
  File "<string>", line 2, in _setup_parser
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\engine\reflection.py", line 97, in cache     
    ret = fn(self, con, *args, **kw)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\dialects\mysql\base.py", line 3262, in _setup_parser     _parser
    return parser.parse(sql, charset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                      rse
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\dialects\mysql\reflection.py", line 48, in parse                                                                                                                                                                        parse_column
    self._parse_column(line, state)
  File "C:\Users\user\AppData\Local\pypoetry\Cache\virtualenvs\alchemy-greptime-7gNpWF2k-py3.11\Lib\site-packages\sqlalchemy\dialects\mysql\reflection.py", line 284, in _parse_column
    type_instance = col_type(*type_args, **type_kw)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: NullType() takes no arguments
evenyag commented 2 months ago

Maybe we can create a PR to address this in SQLAlchemy. I found that the PostgreSQL dialect lowers the string before getting the schema. https://github.com/sqlalchemy/sqlalchemy/blob/6cf5e2a188fc5e337d22a098a5fe9a9fe10cc7e7/lib/sqlalchemy/dialects/postgresql/base.py#L3756

I'm also considering if we can provide a way to display the data type in lowercase.

atul-r commented 2 months ago

Maybe we can create a PR to address this in SQLAlchemy. I found that the PostgreSQL dialect lowers the string before getting the schema. https://github.com/sqlalchemy/sqlalchemy/blob/6cf5e2a188fc5e337d22a098a5fe9a9fe10cc7e7/lib/sqlalchemy/dialects/postgresql/base.py#L3756

I'm also considering if we can provide a way to display the data type in lowercase.

@evenyag But there are non matching data types as well. For example STRING, there is no data type of STRING in MySQL. possibly VARCHAR. This needs to come from greptime

v0y4g3r commented 2 months ago

Maybe we can create a PR to address this in SQLAlchemy. I found that the PostgreSQL dialect lowers the string before getting the schema. https://github.com/sqlalchemy/sqlalchemy/blob/6cf5e2a188fc5e337d22a098a5fe9a9fe10cc7e7/lib/sqlalchemy/dialects/postgresql/base.py#L3756 I'm also considering if we can provide a way to display the data type in lowercase.

But there are non matching data types as well. For example STRING, there is no data type of STRING in MySQL. possibly VARCHAR. This needs to come from greptime

Since PostgreSQL does not support SHOW statements, we can format the output to match MySQL

evenyag commented 2 months ago

Can we add a new method display_mysql_dialect() to the CreateTable and Column? https://github.com/GreptimeTeam/greptimedb/blob/27d9aa0f3be492e4f6e420876193b29f937f6157/src/sql/src/statements/create.rs#L235-L246

We need to map some data types to MySQL's types in Column. https://github.com/GreptimeTeam/greptimedb/blob/27d9aa0f3be492e4f6e420876193b29f937f6157/src/sql/src/statements/create.rs#L135-L148

Another way is to map the data type while creating the Column struct. https://github.com/GreptimeTeam/greptimedb/blob/27d9aa0f3be492e4f6e420876193b29f937f6157/src/query/src/sql/show_create_table.rs#L116-L119

sunng87 commented 2 months ago

@atul-r I'm not quite familiar with SQLAlchemy, I wonder if this inspect function a must-have to use SQLAlchemy with GreptimeDB? Is this error blocking you from running SQL query in SQLAlchemy?

atul-r commented 2 months ago

@atul-r I'm not quite familiar with SQLAlchemy, I wonder if this inspect function a must-have to use SQLAlchemy with GreptimeDB? Is this error blocking you from running SQL query in SQLAlchemy?

@sunng87 I am using apache superset to create charts from the data in greptime. Superset uses SQLAlchemy to connect to greptime. When connecting to any database, superset uses the inspect call to get the table details. This is where I am getting the issue. So it is a blocker

sunng87 commented 2 months ago

@atul-r Thank you for the information. That's a blocking issue for Superset.

atul-r commented 2 months ago

@atul-r Thank you for the information. That's a blocking issue for Superset.

The MySQL endpoint of greptime should send responses that is inline with MySQL. This needs to be handled in greptime. If not any MySQL client will face issues with greptime

There is another issue that I have created as well (https://github.com/GreptimeTeam/greptimedb/issues/4542)

sunng87 commented 2 months ago

Yes. That's we need to address in GreptimeDB for support of Superset.

evenyag commented 2 months ago

For #4542, most date/time functions should have an alternative in datafusion, which is compatible with PostgreSQL. https://datafusion.apache.org/user-guide/sql/scalar_functions.html#time-and-date-functions

Will it be easier to support PostgreSQL's driver than MySQL? We are working on https://github.com/GreptimeTeam/greptimedb/issues/3560

What do you think? @sunng87

sunng87 commented 2 months ago

@evenyag sounds good. I will give Superset's postgres datasource a try to see the gap.

sunng87 commented 2 months ago

I tried postgresql data source. It seems Superset doesn't offer configuration for psycopg's autocommit option so the postgres client will send statements like BEGIN which is not supported by us.

sunng87 commented 2 months ago

After a deep dive into postgres superset connector, even with #3560 and #4553 , there are still some functions like pg_catalog.pg_get_serial_sequence, json_build_object and etc. to be supported. We don't see this in near future.

Perhaps we need to build our own plugin or connector for superset, as well as metabase.

sunng87 commented 1 month ago

@atul-r I just created a greptimedb connector for sqlalchemy and superset. You can add greptimedb-sqlalchemy to docker/requirements-local.txt (if you are using docker-compose) or use pip install greptimedb-sqlalchemy if you installed superset in your system.

There will be a GreptimeDB option when you are adding database connection.

This is an early release of the greptimedb native connector. Let me know if you find any issue with it.

atul-r commented 1 month ago

@atul-r I just created a greptimedb connector for sqlalchemy and superset. You can add greptimedb-sqlalchemy to docker/requirements-local.txt (if you are using docker-compose) or use pip install greptimedb-sqlalchemy if you installed superset in your system.

There will be a GreptimeDB option when you are adding database connection.

This is an early release of the greptimedb native connector. Let me know if you find any issue with it.

@sunng87 I am getting the below error

ERROR: (builtins.NoneType) None
[SQL: Failed to plan SQL: Error during planning: Invalid function 'pg_catalog.version'.
Did you mean 'APPROX_MEDIAN'?
]
(Background on this error at: https://sqlalche.me/e/14/dbapi)

My connection string is greptimedb://root:greptime_pwd@host.docker.internal:4003/supersetexample

Created a detailed issue here

sunng87 commented 2 weeks ago

This has been resolved by our superset driver, see https://github.com/sunng87/greptimedb-sqlalchemy/issues/1 and our docs https://docs.greptime.com/nightly/user-guide/integrations/superset