Print result type information inspired by DuckDB DESCRIBE

apstndb commented 1 month ago

DuckDB has nice features about result set type.

Type names in table header.

D SELECT (1, 'foo')::STRUCT(i INTEGER, c VARCHAR) AS str LIMIT 0;
┌──────────────────────────────┐
│             str              │
│ struct(i integer, c varchar) │
├──────────────────────────────┤
│            0 rows            │
└──────────────────────────────┘

`DESCRIBE query` prints result set shape.

https://duckdb.org/docs/guides/meta/describe.html#describing-a-query

D DESCRIBE SELECT (1, 'foo')::STRUCT(i INTEGER, c VARCHAR) AS str LIMIT 0;
┌─────────────┬──────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │         column_type          │  null   │   key   │ default │  extra  │
│   varchar   │           varchar            │ varchar │ varchar │ varchar │ varchar │
├─────────────┼──────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ str         │ STRUCT(i INTEGER, c VARCHAR) │ YES     │         │         │         │
└─────────────┴──────────────────────────────┴─────────┴─────────┴─────────┴─────────┘

Are they useful and compatible in spanner-cli?

Type names in table header

I think it is very useful because it can achieve good balance of printing type informations and keep output to simple.

I had often tried to print correct type information in spanner-cli, but it was too verbose.

https://github.com/cloudspannerecosystem/spanner-cli/issues/104 https://github.com/cloudspannerecosystem/spanner-cli/issues/174

https://github.com/olekukonko/tablewriter supports multi-line header, so it can be implemented using it.

spanner> SELECT 1 AS i, "foo" AS s;
+-------+--------+
| i     | s      |
| INT64 | STRING |
+-------+--------+
| 1     | foo    |
+-------+--------+
1 rows in set (2.35 msecs)

`DESCRIBE` command

ResultSet.metadata.rowType will be returned even if QueryMode=PLAN. It will be useful to check result set shape of queries without execution, especially query with query parameters.

DESCRIBE (and DESC) in spanner-cli is not documented synonym of EXPLAIN, so I think users will use EXPLAIN if meaning of DESCRIBE will be changed.

apstndb commented 1 month ago

I have implemented the prototype in #191

yfuruyama commented 1 month ago

Thank you for filing this. I basically agree with your proposition and have some comments.

Showing the type in each column by default looks too verbose. Maybe it's better to show it optionally, for example when --verbose option is used.
I think changing the semantic of DESCRIBE command would be safe. I guess most of users are using EXPLAIN to get the query execution plan.

apstndb commented 1 month ago

Thank you for response.

I think there are options to display column names and column types.

No column name
- Current behavior of empty result.
Only column name
- Current behavior of non-empty result.
Column name with simple type name like INT64, ARRAY<STRUCT>, ARRAY<PROTO>, ARRAY<ENUM>
Column name with verbose type name like INT64, ARRAY<STRUCT<A INT64, B STRUCT<C INT64>>>, ARRAY<google.spanner.v1.PlanNode.ChildLink>, ARRAY<google.spanner.v1.PlanNode.Kind>

I feel column name with verbose type can be too long, so it may be better to choose "column name with simple type name" even if --verbose.

There is no changes in no --verbose mode.
Column name with simple type name is printed in header when spanner-cli is --verbose mode even if empty result.
"Column name with verbose type name" is only displayed in DESCRIBE.

What do you think?

yfuruyama commented 1 month ago

There is no changes in no --verbose mode.

Column name with simple type name is printed in header when spanner-cli is --verbose mode even if empty result.

"Column name with verbose type name" is only displayed in DESCRIBE.

All sounds good. By the way, how does duckdb show the complicated column type like ARRAY<STRUCT<...>> in the result? Does it abbreviate at some level?

We also need to think about how to show the column type with vertical result mode (\G). If you have a chance, could you check how it works in duckdb?

apstndb commented 1 month ago

It seems that duckdb shows full type name in both of header and DESCRIBE, no matter how complex.

D SELECT str, str AS str_copy, str AS str_copy2 FROM (SELECT {foo: 'bar', list: [{key: 'hoge', value: 'piyo'}], arr:array_value({x: 123, y: 234, z: 345})} AS str) LIMIT 1;
┌──────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│         str          │                                              str_copy                                              │                                                     str_copy2                                                      │
│ struct(foo varchar…  │ struct(foo varchar, list struct("key" varchar, "value" varchar)[], arr struct(x integer, y integ…  │ struct(foo varchar, list struct("key" varchar, "value" varchar)[], arr struct(x integer, y integer, z integer)[1]) │
├──────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {'foo': bar, 'list…  │ {'foo': bar, 'list': [{'key': hoge, 'value': piyo}], 'arr': [{'x': 123, 'y': 234, 'z': 345}]}      │ {'foo': bar, 'list': [{'key': hoge, 'value': piyo}], 'arr': [{'x': 123, 'y': 234, 'z': 345}]}                      │
└──────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

D DESCRIBE SELECT str, str AS str_copy, str AS str_copy2 FROM (SELECT {foo: 'bar', list: [{key: 'hoge', value: 'piyo'}], arr:array_value({x: 123, y: 234, z: 345})} AS str) LIMIT 1;
┌─────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │                                                    column_type                                                     │  null   │   key   │ default │  extra  │
│   varchar   │                                                      varchar                                                       │ varchar │ varchar │ varchar │ varchar │
├─────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ str         │ STRUCT(foo VARCHAR, list STRUCT("key" VARCHAR, "value" VARCHAR)[], arr STRUCT(x INTEGER, y INTEGER, z INTEGER)[1]) │ YES     │         │         │         │
│ str_copy    │ STRUCT(foo VARCHAR, list STRUCT("key" VARCHAR, "value" VARCHAR)[], arr STRUCT(x INTEGER, y INTEGER, z INTEGER)[1]) │ YES     │         │         │         │
│ str_copy2   │ STRUCT(foo VARCHAR, list STRUCT("key" VARCHAR, "value" VARCHAR)[], arr STRUCT(x INTEGER, y INTEGER, z INTEGER)[1]) │ YES     │         │         │         │
└─────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┘

and no type name in other output mode(default is duckbox). https://duckdb.org/docs/api/cli/output_formats.html

D .mode line
D SELECT str, str AS str_copy, str AS str_copy2 FROM (SELECT {foo: 'bar', list: [{key: 'hoge', value: 'piyo'}], arr:array_value({x: 123, y: 234, z: 345})} AS str) UNION ALL SELECT NULL, NULL, NULL;
      str = {'foo': bar, 'list': [{'key': hoge, 'value': piyo}], 'arr': [{'x': 123, 'y': 234, 'z': 345}]}
 str_copy = {'foo': bar, 'list': [{'key': hoge, 'value': piyo}], 'arr': [{'x': 123, 'y': 234, 'z': 345}]}
str_copy2 = {'foo': bar, 'list': [{'key': hoge, 'value': piyo}], 'arr': [{'x': 123, 'y': 234, 'z': 345}]}

      str = 
 str_copy = 
str_copy2 =

One interesting thing is duckdb limits column length to fit $COLUMNS of terminal. It may also be useful feature in spanner-cli.

Real complex type example in SPANNER_SYS(I know we don't usually use this value directly.)

spanner> SELECT LATENCY_DISTRIBUTION FROM SPANNER_SYS.QUERY_STATS_TOP_MINUTE LIMIT 1;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LATENCY_DISTRIBUTION                                                                                                                                                   |
| ARRAY<STRUCT<COUNT INT64, MEAN FLOAT64, SUM_OF_SQUARED_DEVIATION FLOAT64, NUM_FINITE_BUCKETS INT64, GROWTH_FACTOR FLOAT64, SCALE FLOAT64, BUCKET_COUNTS ARRAY<INT64>>> |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

yfuruyama commented 1 month ago

Thank you for sharing various tests.

Regarding how we show the column type in the query result of --version option, I feel either is fine (showing full column type or reduced column type), but given that Spanner has PROTO type and it could have long name, I'm toward to showing the reduced column type.

One interesting thing is duckdb limits column length to fit $COLUMNS of terminal. It may also be useful feature in spanner-cli.

This might be another discussion, but I'm wondering how many users use non-default $COLUMNS variable. I checked now and I have been using default COLUMNS=80 variable inherited from the option of terminal app, which is too short compared to the window size of my laptop.

apstndb commented 1 month ago

This might be another discussion, but I'm wondering how many users use non-default $COLUMNS variable. I checked now and I have been using default COLUMNS=80 variable inherited from the option of terminal app, which is too short compared to the window size of my laptop.

It is strange because $COLUMNS shell variable is automatically adjuted to the width of the terminal in major environments like zsh and bash. Anyway, I want to substitute $COLUMNS to the current width of the terminal. (I think it is available in readline.GetScreenWidth())

apstndb commented 1 month ago

Regarding how we show the column type in the query result of --version option, I feel either is fine (showing full column type or reduced column type), but given that Spanner has PROTO type and it could have long name, I'm toward to showing the reduced column type.

My personal preference is to have more options for controlling the output format, but that's probably not going to be popular.

yfuruyama commented 1 month ago

Maybe we can consider adding more options once we find a popular use case.

I'll defer to you about which style to be used by default for column type: full column type, or reduced column type.

apstndb commented 1 month ago

Yeah, I want to choose "simple type name" as the default in headers, because:

Verbose type name can be longer than column value, it can be problem because there are complex types in many columns and terminal width is limited.
User can simply use DESCRIBE on the same query to see verbose type name of STRUCT, PROTO, ENUM.

yfuruyama commented 1 month ago

Sounds good!

cloudspannerecosystem / spanner-cli