cloudspannerecosystem / spanner-cli

Interactive command line tool for Cloud Spanner
Apache License 2.0
227 stars 29 forks source link

Support Protocol Buffers #174

Open apstndb opened 1 month ago

apstndb commented 1 month ago

Protocol Buffers support is already GA. We can discuss initial support of Protocol Buffers in spanner-cli.

yfuruyama commented 1 month ago

CREATE PROTO BUNDLE statement is tricky because it requires a companion file (proto file) to be sent to the Spanner API altogether.

Then, there might be two possible ways:

  1. For interactive mode: Extend CREATE PROTO BUNDLE statement in some ways. For example with a comment line, like CREATE PROTO BUNDLE ... ; # order_descriptors.pb
  2. For batch (command) mode: Create an option similar to --proto-descriptors-file of gcloud.

I think the first option is not intuitive. For second option, I don't think users will use that option if the same functionality is available in gcloud.

Overall, my impression is that we can ask users to use gcloud if they want to run CREATE PROTO BUNDLE statement.

apstndb commented 1 month ago

I agree CREATE PROTO BUNDLE support is not intuitive and not useful in current proposal.

What about PROTO and ENUM types?

Current behaviour of PROTO and ENUM typed values.

proto

Use https://github.com/googleapis/google-cloud-go/blob/spanner/v1.62.0/spanner/testdata/protos/singer.proto.

Query

WITH t AS (SELECT CAST('genre: POP, nationality: "Japan"' AS spanner.examples.music.SingerInfo) AS singer)
SELECT t.singer, t.singer.genre, CAST(t.singer AS STRING) AS singer_string, CAST(t.singer.genre AS STRING) AS genre_string
FROM t;
spanner> WITH t AS (SELECT CAST('genre: POP, nationality: "Japan"' AS spanner.examples.music.SingerInfo) AS singer)
      -> SELECT t.singer, t.singer.genre, CAST(t.singer AS STRING) AS singer_string, CAST(t.singer.genre AS STRING) AS genre_string
      -> FROM t;
+-----------------------------+------------------+---------------------------------+--------------+
| singer                      | genre            | singer_string                   | genre_string |
+-----------------------------+------------------+---------------------------------+--------------+
| string_value:"GgVKYXBhbiAA" | string_value:"0" | nationality: "Japan" genre: POP | POP          |
+-----------------------------+------------------+---------------------------------+--------------+
1 rows in set (1.07 msecs)

I think the raw string_value: ... is far from ideal output.

Proposed behavior

There are no message descriptor in ResultSet, so I think it is impossible to print as _string. Most usable information is ProtoTypeFqn. spanner-cli can print values as typed values.

spanner> WITH t AS (SELECT CAST('genre: POP, nationality: "Japan"' AS spanner.examples.music.SingerInfo) AS singer)
      -> SELECT t.singer, t.singer.genre, CAST(t.singer AS STRING) AS singer_string, CAST(t.singer.genre AS STRING) AS genre_string
      -> FROM t;
+-----------------------------------------------------+-----------------------------------+---------------------------------+--------------+
| singer                                              | genre                             | singer_string                   | genre_string |
+-----------------------------------------------------+-----------------------------------+---------------------------------+--------------+
| "GgVKYXBhbiAA" AS spanner.examples.music.SingerInfo | 0 AS spanner.examples.music.Genre | nationality: "Japan" genre: POP | POP          |
+-----------------------------------------------------+-----------------------------------+---------------------------------+--------------+
1 rows in set (1.38 msecs)
apstndb commented 1 month ago

Is this output a bit long?

yfuruyama commented 1 month ago

What about PROTO and ENUM types?

Yes, I agree that we should decode PROTO and ENUM typed values properly, but showing the proto field name looks a bit too much.

If users want to check an actual value, they might use CAST(<column> AS STRING) as your example shows, so I think just showing the raw value (base64-encoded string for PROTO and string for ENUM) might be adequate in spanner-cli. Spanner Studio (Cloud Console) also shows both types in that style.

apstndb commented 1 month ago

I agree FQN type name is too long and the same printing as Spanner Studio is a good choice. Another option is printing only the last component of the FQN: like "GgVKYXBhbiAA" AS SingerInfo, 0 AS Genre.

yfuruyama commented 1 month ago

Another option is printing only the last component of the FQN: like "GgVKYXBhbiAA" AS SingerInfo, 0 AS Genre.

This is a very edge case, but it could be confusing if a STRING value contains "AS" in the value.

spanner> SELECT '"GgVKYXBhbiAA" AS SingerInfo';
+------------------------------+
|                              |
+------------------------------+
| "GgVKYXBhbiAA" AS SingerInfo |
+------------------------------+
1 rows in set (2.13 msecs)

So I'm not sure if it's a good idea to annotate the value with a proto type.

apstndb commented 1 month ago

I often read databases whose schemas I didn't know, so I was worried about outputting raw values ​​that couldn't be read without knowledge of the schemas. spanner-cli does not display type information for all values, so let's align it with Spanner Studio for consistency.