ClickHouse / clickhouse-connect

Python driver/sqlalchemy/superset connectors
Apache License 2.0
330 stars 64 forks source link

Return column structure in case of empty resultset #257

Open zerafachris opened 1 year ago

zerafachris commented 1 year ago

Describe the bug

Return column structure in case of empty resultset

Steps to reproduce

  1. Use the following query "SELECT 1 as 'column1' LIMIT 0"

Expected behaviour

Have a new setting, which will change behaviour in case of an empty result set being returned.

In case of pandas, driver.query_df(ch_query) should return pd.DataFrame({'column1' : [] })

This was resolved in the previous driver here https://github.com/ClickHouse/ClickHouse/issues/33522

genzgd commented 1 year ago

There is no easy way to do this using the HTTP interface, as ClickHouse doesn't return the column name or types in Native Format when the result set is empty. The driver would have to use a different format (such as JSONEachRow) for the query in order to be able to build the column structure. Note that the native interface used in clickhouse-driver works differently and does return an "empty block" with the column structure even for empty queries, so the implementation is more straightforward in that case.

If you know ahead of time that the query result will be empty (for example, with your LIMIT 0 example), maybe there could be something like a "structure only" query that would use the JSONEachRow format over HTTP.

zerafachris commented 1 year ago

Hi @genzgd , Thanks for the reply. Using query_arrow is am able to return the columns, whilst using 'query_df' the column set is empty. Maybe we can use a partial implementation from pyarrow to get the column lists?

I have tagged the part of code which is returning the empty dataframe and would require an update here https://github.com/ClickHouse/clickhouse-connect/pull/119

genzgd commented 1 year ago

Eventually it may make sense to switch the Pandas code to use Arrow instead of the ClickHouse Native format, in particular for Pandas 2.x, which is now backed by Arrow, but that's a pretty significant change. In the meantime I've opened a feature request https://github.com/ClickHouse/ClickHouse/issues/55939 that would make this possible without major code changes, but I don't know if and when the ClickHouse team will get to it.