ClickHouse / clickhouse-js

Official JS client for ClickHouse DB
https://clickhouse.com
Apache License 2.0
209 stars 25 forks source link

Sending SQL query params in the URI can hit URI limits and respond with a 414 #140

Closed alexmorleyfinch closed 1 year ago

alexmorleyfinch commented 1 year ago

Describe the bug

When sending a query over the HTTP interface (using this lib), if the query_params option is too big, we run into HTTP URI limits. 414 Request-URI Too Large

The reason our query_params is so large is because we're using a WHERE...IN clause. The obvious fix/workaround is to have smaller batches and thus make smaller queries, however this affects our throughput overhead.

I was wondering if there is any scope/ability to pass the query params in the HTTP body instead of the URL query string.

You could argue that this isn't necessary and the current workaround is sufficient, but I thought I'd ask.

Steps to reproduce

Write a query with very large query_params, like so:

const visitorIds = Array(5000).fill(undefined).map((_, i) => `visitor${i}`); // the point is to make this BIG

const resultSet = await this.client.query({
    query: `SELECT * FROM test WHERE visitorId IN splitByChar(',', {visitorIds:String}),
    query_params: {
      visitorIds: visitorIds.join(','),
    },
});

const dataset = await resultSet.json();

// etc

If you run something like the above, you will see a HTTP 414 response error because the HTTP request made by the client is too large

Expected behaviour

I'd expect to be able to send the query params over the BODY instead of being part of the URI, and I'd expect to be able to configure this via the client interface somehow. Like clickhouse_settings: {send_params_as_body: true}} or something similar.

I appreciate this change would also require ClickHouse to handle this also, which requires additional collaboration.

Code example

Proposed fix?

const resultSet = await this.client.query({
    query: `SELECT * FROM test WHERE visitorId IN splitByChar(',', {visitorIds:String}),
    query_params: {
      visitorIds: visitorIds.join(','),
    },
    clickhouse_settings: {
      some_option_to_send_params_as_body_instead: true,
    }
});

Error log

Error: <html>
<head><title>414 Request-URI Too Large</title></head>
<body>
<center><h1>414 Request-URI Too Large</h1></center>
</body>
</html>

    at parseError (/home/admin/82df98d7a6f1f8e536b97a6c49ef845a432b4869/node_modules/@clickhouse/client/dist/error/parse_error.js:35:16)
    at ClientRequest.onResponse (/home/admin/82df98d7a6f1f8e536b97a6c49ef845a432b4869/node_modules/@clickhouse/client/dist/connection/adapter/base_http_adapter.js:127:51)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Configuration

All default except:

{
    host: url + ':' + port,
    username: username,
    password: password,
    log: debug ? {LoggerClass: DefaultLogger} : undefined,

    // When using a debugger, it's helpful to extend these timeouts
    connect_timeout: __DEV__ ? 1 * 60 * 1000 : undefined, // 1 minutes for dev, or default
    request_timeout: __DEV__ ? 10 * 60 * 1000 : undefined, // 10 minutes for dev, or default
}

Environment

ClickHouse server

slvrtrn commented 1 year ago

Is it CH Cloud or on-premise deployment?

Could it be that it's an LB or proxy that has a request size limitation?

You could also try increasing http_max_uri_size (see the docs), cause it's 1 Megabyte by default:

The size of the URL is limited to 1 MiB by default, this can be changed with the http_max_uri_size setting

alexmorleyfinch commented 1 year ago

It is hosted on AWS and I think it could be a LB issue. I missed the http_max_uri_size option so I will attempt that!

Thank you very much for your response!