dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask
https://dask-sql.readthedocs.io/
MIT License
385 stars 71 forks source link

[ENH] Support "show tables" metadata query #367

Closed randerzander closed 2 years ago

randerzander commented 2 years ago

It's common to create tables programmatically using names derived from filenames or other non-static source.

It would be nice to support "SHOW TABLES" (from Hive & Spark) as a way to query Dask-SQL for the list of tables in the default database:

import pandas as pd
from dask_sql import Context

df = pd.DataFrame({"id": [0, 1]})
c.create_table('test', df)

c.sql("show tables").compute()

Result:

---------------------------------------------------------------------------
ParsingException                          Traceback (most recent call last)
/tmp/ipykernel_2098103/3923598762.py in <module>
      5 c.create_table('test', df)
      6 
----> 7 c.sql("show tables").compute()

~/conda/envs/dsql-1-13/lib/python3.7/site-packages/dask_sql/context.py in sql(self, sql, return_futures, dataframes)
    453                 self.create_table(df_name, df)
    454 
--> 455         rel, select_names, _ = self._get_ral(sql)
    456 
    457         dc = RelConverter.convert(rel, context=self)

~/conda/envs/dsql-1-13/lib/python3.7/site-packages/dask_sql/context.py in _get_ral(self, sql)
    882             # if debug is not set.
    883             # Instead, we raise a nice exception
--> 884             raise ParsingException(sql, str(e.message())) from None
    885 
    886         # Internal, temporary results of calcite are sometimes

ParsingException: Can not parse the given SQL: Encountered "<EOF>" at line 1, column 11.
Was expecting:
    "FROM" ...

The problem is probably somewhere here:

    show tables
              ^

Edit: Now I see the doc page shows a way to achieve this. It'd still be nice for Dask-SQL to "USE root" by default, and have "show tables" list the tables in that schema.

randerzander commented 2 years ago

cc @ayushdg as a possible good first Calcite issue for you to work on

charlesbluca commented 2 years ago

We currently support SHOW TABLES FROM:

import pandas as pd
from dask_sql import Context

c = Context()

df = pd.DataFrame({"id": [0, 1]})
c.create_table('test', df)

c.sql('SHOW TABLES FROM "root"').compute()
  Table
0  test

Are you looking specifically for support for SHOW TABLES alone?

randerzander commented 2 years ago

Are you looking specifically for support for SHOW TABLES alone?

Thanks for the quick response. I edited the issue description above to make clearer I was looking for "SHOW TABLES" in the context of the current database/schema.