ClickHouse / clickhouse-js

Official JS client for ClickHouse DB
https://clickhouse.com
Apache License 2.0
213 stars 26 forks source link

More specific types for `meta.type` #273

Open tylersayshi opened 4 months ago

tylersayshi commented 4 months ago

Use case

Use case will be for handling formatting of the data value from a response, when meta is present on an app.

Describe the solution you'd like

https://github.com/ClickHouse/clickhouse-js/blob/main/packages/client-common/src/clickhouse_types.ts#L8

This line ^ could be something more like the following

/** @see https://clickhouse.com/docs/en/sql-reference/data-types */
const DATA_TYPES = [
    'UInt8', 'UInt16', 'UInt32', 'UInt64', 'UInt128', 'UInt256', 'Int8', 'Int16', 'Int32', 'Int64', 'Int128', 'Int256',
    'Float32', 'Float64',
    'Boolean', 
    'String', 'FixedString',
    'Date', 'Date32',
    'DateTime', 'DateTime64',
    'JSON',
    'UUID',
    'Enum', 'LowCardinality',
    'Array',
    'Map',
    'SimpleAggregateFunction', 'AggregateFunction',
    'Nested',
    'Tuple',
    'Nullable',
    'IPv4', 'IPv6',
    'Point', 'Ring', 'Polygon', 'MultiPolygon', 
    'Expression', 'Set', 'Nothing', 'Interval'
] as const;
type DataType = typeof DATA_TYPES[number];

// ...
    meta?: Array<{
        name: string;
        type: DataType;
    }>;

Describe the alternatives you've considered

This type could be an enum or string literal. I'd have no real preference.

Additional context

What would be nice for me as a consumer of @clickhouse/client is to be able to treat the library as a comprehensive source of truth for what data types I should expect to be able to format. This way, I can format them accordingly in my apps (especially if new data types are added later on or renamed, so I'd see the type change).

slvrtrn commented 4 months ago

Array, Tuple, Map, DateTime, etc, are a bit tricky in this case cause:

➜  ~ curl  --data-binary \
     "SELECT tuple(32, 'foo') AS x FORMAT JSON;" \
     http://localhost:8123
{
    "meta":
    [
        {
            "name": "x",
            "type": "Tuple(UInt8, String)"
        }
    ],

    "data":
    [
        {
            "x": [32,"foo"]
        }
    ],

    "rows": 1,

    "statistics":
    {
        "elapsed": 0.000961292,
        "rows_read": 1,
        "bytes_read": 1
    }
}

So, the proposed literal won't match the actual type value in certain cases.

Another approach is to provide a helper of sorts, possibly utilizing the code from the RowBinary branch (the branch is currently on hold, but the column types parser was ready, just missing some tests). It's the exact same scenario here — we need to parse the appropriate types (preferably with nested types for correctness).

export function parseMetaColumnTypes(
  meta: Array<{ name: string, type: string> }>
): Array<{ name: string; type: ParsedColumnType }>

CC @mshustov - I'm curious to hear your opinion as well.

mshustov commented 3 months ago

Another approach is to provide a helper of sorts, possibly utilizing the code from the RowBinary branch

@slvrtrn I'm not sure the RowBinary parser would work with the type definition. @tylerlaws0n would @slvrtrn's proposal solve your problem?

slvrtrn commented 3 months ago

I'm not sure the RowBinary parser would work with the type definition.

Just to clarify: the idea was that the meta.type TS hint would stay as is, i.e., just string, cause we can't type every variant, such as Tuple(String, String) etc.

However, we can provide a helper function (essentially implemented already in the RowBinary branch, minus a few leftover types) that will produce some (probably useful) data structures out of the meta type in runtime.

For example:

console.log('### Simple types')
console.dir(parseColumnType('UInt32'))

console.log('### Enums')
console.dir(parseColumnType("Enum8('a' = 1, 'b' = 2)"), {
  depth: 1000,
})

console.log('### Nested types')
console.dir(parseColumnType('Map(String, Array(Tuple(UInt8, String)))'), {
  depth: 1000,
})
### Simple types
{ type: 'Simple', columnType: 'UInt32', sourceType: 'UInt32' }
### Enums
{
  type: 'Enum',
  values: Map(2) { 1 => 'a', 2 => 'b' },
  intSize: 8,
  sourceType: "Enum8('a' = 1, 'b' = 2)"
}
### Nested types
{
  type: 'Map',
  key: { type: 'Simple', columnType: 'String', sourceType: 'String' },
  value: {
    type: 'Array',
    value: {
      type: 'Tuple',
      elements: [
        { type: 'Simple', columnType: 'UInt8', sourceType: 'UInt8' },
        { type: 'Simple', columnType: 'String', sourceType: 'String' }
      ],
      sourceType: 'Tuple(UInt8, String)'
    },
    dimensions: 1,
    sourceType: 'Array(Tuple(UInt8, String))'
  },
  sourceType: 'Map(String, Array(Tuple(UInt8, String)))'
}
tylersayshi commented 3 months ago

... @tylerlaws0n would @slvrtrn's proposal solve your problem?

Yes! This would work and be very helpful.

Just to clarify: the idea was that the meta.type TS hint would stay as is, i.e., just string, cause we can't type every variant, such as Tuple(String, String) etc.

These could actually be typed out via template string literals

In the case of Tuple it could be something like this:

type TupleMember = 'String' | 'Number'; // This is just for example and would clearly need many more members
type Tuple = `Tuple(${TupleMember}, ${TupleMember})`;

I think the helper proposed by @slvrtrn is actually more helpful and easier to implement/maintain than templating out the strings for every possibility. However, I think it is worth sharing this pattern still in case there is ever value to be had from generating the strings of all possible meta types.

Edit to share a screenshot of what this evaluates out to:

image
tylersayshi commented 3 months ago

If a middle ground is desirable; you could type out just the outer part of the type and use string inside:

type Tuple = `Tuple(${string}, ${string})`;

const aTuple: Tuple = 'Tuple(Foo, Bar)';

I think this could be really useful for checking the top level type, then disregarding the problem with depth for this field.

I think in most cases, just being able to check if meta.type === 'String' or meta.type.includes('Tuple(') is sufficient for the formatting I can think of with showing those values on a UI.