Open c0mm4nd opened 5 months ago
This is because the client's ResultSet text and JSON decoders assume a UTF-8 string inside that field (but it is not a UTF-8 string in this specific case).
For now, a passable workaround could be to just convert it back to bytes:
import { createClient } from '@clickhouse/client' // or '@clickhouse/client-web'
void (async () => {
const client = createClient()
const resultSet = await client.query({
query: `SELECT unhex('000000708550f340a1297eefe721a3b0631d8dc4cc5a3462abaeef1a79726f6b') AS x`,
format: 'JSONEachRow',
})
const result = await resultSet.json<{ x: string }>()
console.info('Result:', new TextEncoder().encode(result[0].x))
await client.close()
})()
Result: Uint8Array(58) [
0, 0, 0, 112, 239, 191, 189, 80, 239, 191,
189, 64, 239, 191, 189, 41, 126, 239, 191, 189,
239, 191, 189, 33, 239, 191, 189, 239, 191, 189,
99, 29, 239, 191, 189, 239, 191, 189, 239, 191,
189, 90, 52, 98, 239, 191, 189, 239, 191, 189,
239, 191, 189, 26, 121, 114, 111, 107
]
Another solution (which looks like a more correct one in this case, cause the previous result set looks a bit off) is just to ask ClickHouse to return a byte array instead so that conversion happens on the server side:
WITH '000000708550f340a1297eefe721a3b0631d8dc4cc5a3462abaeef1a79726f6b' AS str
SELECT arrayMap(x -> reinterpretAsUInt8(x), splitByString('', unhex(str))) AS bytes
Result:
┌─bytes─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ [0,0,0,112,133,80,243,64,161,41,126,239,231,33,163,176,99,29,141,196,204,90,52,98,171,174,239,26,121,114,111,107] │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
And that will be decoded with JSON formats as just an array of numbers, which can be then used to create a UInt8Array in Node.
@c0mm4nd, can you please check if it works in your scenario? Then I can add an example and close this.
In the first passable solution, the hash value length is 58 rather than 32, due to the invalid unicode 65533 has already ruined the result. I cannot convert the 65533 back to the correct byte. The second works, but it requires rewriting the SQL, and the bytes-parse work is not efficient when handling tons of hash results. So these two are not what I want.
Python client clickhouse-connect also works on HTTP port, but it doesn't have such a problem.
Another option is to return the fixed string encoded as base64, which will require rewriting the queries a bit.
For example:
import { createClient } from '@clickhouse/client'
void (async () => {
const client = createClient()
const resultSet = await client.query({
query: `SELECT base64Encode(unhex('000000708550f340a1297eefe721a3b0631d8dc4cc5a3462abaeef1a79726f6b')) AS x`,
format: 'JSONEachRow',
})
const result = await resultSet.json<{ x: string }>()
const buffer = Buffer.from(result[0].x, 'base64')
console.info('Base64:', result[0].x)
console.info('Buffer:', buffer, 'with length:', buffer.length)
await client.close()
})()
Base64: AAAAcIVQ80ChKX7v5yGjsGMdjcTMWjRiq67vGnlyb2s=
Buffer: <Buffer 00 00 00 70 85 50 f3 40 a1 29 7e ef e7 21 a3 b0 63 1d 8d c4 cc 5a 34 62 ab ae ef 1a 79 72 6f 6b> with length: 32
Python client clickhouse-connect also works on HTTP port, but it doesn't have such a problem.
I'd assume it works there because it uses RowBinary (so that custom type and column mappers can be provided), which has not yet been implemented in this client. See this issue for tracking; I had to put it on hold due to other duties, as this feature is massive.
Describe the bug
Directly select a String or FixedString(N) column which stores raw bytes rather than UTF-8 string, the result will contain
e.g.
the correct value of this should be the hash
unhex('000000708550f340a1297eefe721a3b0631d8dc4cc5a3462abaeef1a79726f6b')
Steps to reproduce
65533
Expected behaviour
return a correct string or bytes
Code example
as shown in Steps to reproduce
Error log
Configuration
Environment
ClickHouse server
CREATE TABLE
statements for tables involved: