ClickHouse / clickhouse-js

Official JS client for ClickHouse DB
https://clickhouse.com
Apache License 2.0

Recovering from ECONNRESET on client #143

Closed vincenzon closed 1 year ago

vincenzon commented 1 year ago

We use a long-running client to push data to our CH server. That process looks like:

const send = async (rows: Array<Row>) => {
   const stream = new Readable({ objectMode: true, read: () => {}});
   const insert = this.client!.insert({
      table: this.table,
      values: stream,
      format: "JSONEachRow",
   });

   stream.push(rows);
   stream.push(null);
   await insert;
}

Occasionally, when invoking this send function, the client throws an ECONNRESET error. The error is caught and we are able to reconnect and carry on.

My question is: is there a reliable way to know how many (if any) of the rows in the array have been sent to the CH server when the connection reset occurs? I'd like to be able to resend any rows that have not been sent.

slvrtrn commented 1 year ago

@vincenzon, considering your example:

const send = async (rows: Array<Row>) => {
   const stream = new Readable({ objectMode: true, read: () => {}});
   const insert = this.client!.insert({
      table: this.table,
      values: stream,
      format: "JSONEachRow",
   });

   stream.push(rows);
   stream.push(null);
   await insert;
}

Since rows is already an Array, this could be simplified to just:

const send = async (rows: Array<Row>) => {
  return this.client!.insert({
      table: this.table,
      values: rows,
      format: "JSONEachRow",
   });
}

There is no need to use a stream here explicitly.

Regarding your question: IIRC, there is no reliable way to track how many rows were inserted into the database before we finalize the input stream, because we don't get this information from the server.

Smaller batches and the ClickHouse insert deduplication feature could be an option here.

Do you think it could work in your scenario?
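
To illustrate the "smaller batches" idea, here is a minimal sketch, not the client's own API. It assumes a hypothetical insert callback (InsertFn) that wraps something like client.insert({ table, values: batch, format: "JSONEachRow" }); the names toBatches and sendWithRetry and the default batch size and attempt count are also made up for this example. The key point is that the split is deterministic, so a retried batch carries exactly the same rows in the same order, which is the case insert deduplication is meant to absorb on tables where it is enabled.

```typescript
// Hypothetical callback type (an assumption, not part of @clickhouse/client).
// In practice it could wrap client.insert({ table, values: batch, format: "JSONEachRow" }).
type InsertFn<Row> = (batch: Row[]) => Promise<void>;

// Split rows into fixed-size batches. The split is deterministic, so a
// retried batch contains exactly the same rows in the same order.
function toBatches<Row>(rows: Row[], batchSize: number): Row[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: Row[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}

// Insert batch by batch, resending an identical batch when an attempt fails
// (for example with ECONNRESET). Returns the number of rows whose insert call
// resolved successfully; rethrows once a batch exhausts maxAttempts.
async function sendWithRetry<Row>(
  rows: Row[],
  insert: InsertFn<Row>,
  batchSize = 1000,
  maxAttempts = 3,
): Promise<number> {
  let confirmed = 0;
  for (const batch of toBatches(rows, batchSize)) {
    for (let attempt = 1; ; attempt++) {
      try {
        await insert(batch);
        break; // this batch was acknowledged, move on to the next one
      } catch (err) {
        if (attempt >= maxAttempts) throw err;
        // Otherwise loop again and resend the same batch unchanged.
      }
    }
    confirmed += batch.length;
  }
  return confirmed;
}
```

With this shape, a connection reset only leaves one batch in doubt, and resending that batch unchanged relies on server-side deduplication to drop it if it had in fact been written. Whether deduplication applies depends on the table engine and server settings, so that part should be verified against your own setup.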

vincenzon commented 1 year ago

Thanks for the reminder about the stream. You are right: I am no longer using streaming and can just use an array. Thanks also for the tip about the deduplication feature; that sounds like what I am looking for.