dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask
https://dask-sql.readthedocs.io/
MIT License
385 stars 71 forks source link

[ENH] Support additional table metadata syntax #1057

Open randerzander opened 1 year ago

randerzander commented 1 year ago

In trying to use Apache SuperSet's DataSets tab to explore existing tables over the Dask-SQL server, I hit what appear to be unsupported metadata queries:

image

For both options, root and information_schema for the "Schema" drop down, I get There was an error fetching tables.

The dask-sql server log:

TRACE:    127.0.0.1:59344 - ASGI [1870] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8080), 'client': ('127.0.0.1', 59344), 'scheme': 'http', 'method': 'POST', 'root_path': '', 'path': '/v1/statement', 'raw_path': b'/v1/statement', 'query_string': b'', 'headers': '<...>'}TRACE:    127.0.0.1:59344 - ASGI [1870] Receive {'type': 'http.request', 'body': '<23 bytes>', 'more_body': False}
RECIEVED: SHOW TABLES FROM "root"RESPONDING: <starlette.requests.Request object at 0x7f04f6855930>TRACE:    127.0.0.1:59344 - ASGI [1870] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}INFO:     127.0.0.1:59344 - "POST /v1/statement HTTP/1.1" 200 OKTRACE:    127.0.0.1:59344 - ASGI [1870] Send {'type': 'http.response.body', 'body': '<836 bytes>'}
TRACE:    127.0.0.1:59344 - ASGI [1870] CompletedTRACE:    127.0.0.1:59344 - HTTP connection lostTRACE:    127.0.0.1:59356 - HTTP connection made
TRACE:    127.0.0.1:59356 - ASGI [1871] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8080), 'client': ('127.0.0.1', 59356), 'scheme': 'http', 'method': 'GET', 'root_path': '', 'path': '/v1/status/55e32e28-4088-46c9-a337-59eb574a99e1', 'raw_path': b'/v1/status/55e32e28-4088-46c9-a337-59eb574a99e1', 'query_string': b'', 'headers': '<...>'}TRACE:    127.0.0.1:59356 - ASGI [1871] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}INFO:     127.0.0.1:59356 - "GET /v1/status/55e32e28-4088-46c9-a337-59eb574a99e1 HTTP/1.1" 200 OKTRACE:    127.0.0.1:59356 - ASGI [1871] Send {'type': 'http.response.body', 'body': '<746 bytes>'}TRACE:    127.0.0.1:59356 - ASGI [1871] CompletedTRACE:    127.0.0.1:59356 - HTTP connection lost
TRACE:    127.0.0.1:59360 - HTTP connection madeTRACE:    127.0.0.1:59360 - ASGI [1872] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8080), 'client': ('127.0.0.1', 59360), 'scheme': 'http', 'method': 'GET', 'root_path': '', 'path': '/v1/status/55e32e28-4088-46c9-a337-59eb574a99e1', 'raw_path': b'/v1/status/55e32e28-4088-46c9-a337-59eb574a99e1', 'query_string': b'', 'headers': '<...>'}TRACE:    127.0.0.1:59360 - ASGI [1872] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}INFO:     127.0.0.1:59360 - "GET /v1/status/55e32e28-4088-46c9-a337-59eb574a99e1 HTTP/1.1" 200 OKTRACE:    127.0.0.1:59360 - ASGI [1872] Send {'type': 'http.response.body', 'body': '<838 bytes>'}TRACE:    127.0.0.1:59360 - ASGI [1872] CompletedTRACE:    127.0.0.1:59360 - HTTP connection lost
TRACE:    127.0.0.1:59366 - HTTP connection madeTRACE:    127.0.0.1:59366 - ASGI [1873] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8080), 'client': ('127.0.0.1', 59366), 'scheme': 'http', 'method': 'POST', 'root_path': '', 'path': '/v1/statement', 'raw_path': b'/v1/statement', 'query_string': b'', 'headers': '<...>'}
TRACE:    127.0.0.1:59366 - ASGI [1873] Receive {'type': 'http.request', 'body': '<100 bytes>', 'more_body': False}
RECIEVED: SELECT table_name FROM information_schema.tablesWHERE table_schema = 'root'AND table_type = 'VIEW'TRACE:    127.0.0.1:59366 - ASGI [1873] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO:     127.0.0.1:59366 - "POST /v1/statement HTTP/1.1" 200 OK

It looks like the server actually sends a response for the "root" schema, but is not formatted as SuperSet expects. For the "information_schema" schema, SuperSet has a "Table loading" spinner for a long time before timing out, which makes me suspect the server is silently dropping the query.

randerzander commented 1 year ago

cc @jdye64 who I recall doing some work w/ information_schema directly in DataFusion

jdye64 commented 1 year ago

I don't think that today we support this. There is a configuration option inside of DataFusion that allows for the generation of the "information_schema" tables. I think we need some work on our side to make sure that our table registering methods also insert that data into the "information_schema"

Does that make sense?