databendlabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.85k stars 750 forks source link

Feature: Add Column Count to system.tables #16449

Closed inviscid closed 2 months ago

inviscid commented 2 months ago

Summary

We frequently obtain table shape information from the system tables. Since the column count is not part of system.tables information, we must perform a sub-query count on system.columns.

This information is returned trivially fast in a Postgres/Greenplum environment but often takes 1.5 to 2 seconds to return in Databend (per table). By storing the column count in system.tables the sub-query can be eliminated and hopefully the table shape information can be returned sub-second.

This is an example of the current query used to retrieve the table shape:

SELECT t.database AS schema_name,
    t.name AS table_name,
    t.data_compressed_size as table_size,
    t.data_size as raw_table_size,
    t.index_size,
    t.num_rows
    ,(SELECT COUNT(*)
    FROM system.columns c
    WHERE c.database = 'default'
    AND c.table = 't1') AS num_cols
FROM system.tables t
where t.database = 'default'
and t.name = 't1';
dosubot[bot] commented 2 months ago

I found a similar open issue that might be relevant to your request: Tracking: Improvement for large datasets. This issue is still open and was last updated on April 15, 2024. It does not have any comments or a solution yet [1].

To address your feature request of adding a column count to the system.tables table, you can follow the process for adding a new column to a system table in Databend. Here is a summarized version of the steps involved:

  1. Create an Interpreter: Instantiate an AddTableColumnInterpreter with the context and plan.
  2. Execute the Interpreter: Implement the execute2 method to handle the addition of the column.
  3. Generate a New Snapshot: If the table is a FuseTable, generate a new snapshot with the updated schema.
  4. Analyze the Column: Before adding the column, analyze its definition to ensure it is valid.

For detailed implementation, you can refer to the code snippets provided in the source files:

This process ensures that the new column is added correctly and the table's metadata is updated accordingly.

To continue talking to Dosu, mention @dosu.