Use spanner-cli specific request tags for queries like `SELECT 1`

takabow commented 2 years ago

spanner-cli itself may issue SQLs in addition to SQLs directly issued by users. For example, SELECT 1 issued for heartbeat purposes, etc.

Such queries appear in Cloud Spanner's query statistics and may confuse users who are unaware that spanner-cli is issuing them. To identify such queries more easily, I suggest adding spanner-cli specific request tags to the queries.

As far as I can find, spanner-cli seems to issue its own SELECT query for the following uses.

SELECT 1 - At the beginning of a Read-write transaction
SELECT 1 - To check if DB exists.
SELECT 1 - Heartbeat
SELECT ... FROM INFORMATION_SCHEMA.TABLES ... - SHOW TABLES
SELECT ... FROM INFORMATION_SCHEMA.INDEXES ... - SHOW INDEX
SELECT ... FROM INFORMATION_SCHEMA.COLUMNS ... - SHOW COLUMNS

I am not sure what specific string of text to tag, so I need your opinion.

As an example, it seems that Cloud Console uses sys_cloud_console_* to issue some queries.

spanner> SELECT SUBSTR(text,0,20) AS text, request_tag,
      -> FROM spanner_sys.query_stats_top_minute
      -> WHERE request_tag <> "";
+----------------------+---------------------------------------------+
| text                 | request_tag                                 |
+----------------------+---------------------------------------------+
| SELECT TABLE_NAME, T | sys_cloud_console_DatabaseTables            |
| SELECT VIEW_DEFINITI | sys_cloud_console_TableDetailViewDefinition |
| SELECT OPTION_NAME,  | sys_cloud_console_DatabaseOptions           |
| SELECT INDEX_NAME, I | sys_cloud_console_TableDetailIndexes        |
| SELECT TABLE_NAME, C | sys_cloud_console_TableDetailTableColumnMap |
| SELECT TABLE_NAME, C | sys_cloud_console_DatabaseTableColumnMap    |
| SELECT TABLE_NAME, C | sys_cloud_console_TableDetailColumns        |
| SELECT TABLE_NAME, T | sys_cloud_console_TableDetail               |
| SELECT TABLE_NAME, T | sys_cloud_console_DatabaseTables            |
| SELECT OPTION_NAME,  | sys_cloud_console_DatabaseOptions           |
| SELECT TABLE_NAME, C | sys_cloud_console_DatabaseTableColumnMap    |
+----------------------+---------------------------------------------+

takabow commented 2 years ago

One idea is to use spanner-cli_*, such as spanner-cli_heartbeat

takabow commented 2 years ago

like the following.

+----------------------------+---------------------------------------------+
| request_tag idea           | When is the query issued?                   |
+----------------------------+---------------------------------------------+
| spanner-cli_BeginRW        | BEGIN RW - `SELECT 1`                       |
| spanner-cli_DatabaseExists | DatabaseExists - `SELECT 1`                 |
| spanner-cli_Heartbeat      | Heartbeat - `SELECT 1`                      |
| spanner-cli_ShowTables     | SHOW TABLES - `INFORMATION_SCHEMA.TABLES`   |
| spanner-cli_ShowIndex      | SHOW INDEX - `INFORMATION_SCHEMA.INDEXES`   |
| spanner-cli_ShowColmuns    | SHOW COLUMNS - `INFORMATION_SCHEMA.COLUMNS` |
+----------------------------+---------------------------------------------+

yfuruyama commented 2 years ago

@takabow Looks good! Thanks for raising this.

One thing I want to discuss is whether we should use different tags for different use cases.

If we use the same tag like spanner-cli for all use cases, it's easier for us to maintain the code and also it would be easier/intuitive for users to exclude the spaner-cli requests: WHERE request_tag != "spanner-cli".

Do we think it's better to use different tags for each use case?

apstndb commented 2 years ago

The request tag is effectively used as the key of the query stats entry, not part of the compound key.

TEXT Statistics for multiple queries that have the same tag string are grouped in a single row with the REQUEST_TAG matching that tag string. Only the text of one of those queries is shown in this field, truncated to approximately 64KB.

TEXT_FINGERPRINT The hash of the REQUEST_TAG value if present; Otherwise, the hash of the TEXT value.

I think this behavior is confusing and it is better to use different request_tag for each query to ease troubleshooting.

it would be easier/intuitive for users to exclude the spaner-cli requests: WHERE request_tag != "spanner-cli".

I think it is not hard to use WHERE request_tag NOT LIKE "spanner-cli_%".

yfuruyama commented 2 years ago

Ah I didn't know that the multiple requests with the same tag are grouped into the single row of query statistics regardless of the actual queries. Then using the same tag doesn't make sense from semantics perspective.

takabow commented 1 year ago

As apstndb suggested, I would like to implement this with individual tags.

yfuruyama commented 1 year ago

Sounds good to me!

cloudspannerecosystem / spanner-cli

Use spanner-cli specific request tags for queries like `SELECT 1` #129