ClickHouse / clickhouse-java

ClickHouse Java Clients & JDBC Driver
https://clickhouse.com
Apache License 2.0
1.45k stars 534 forks source link

New clickhouse driver is using a different impl of PreparedStatement #1270

Open malikas05 opened 1 year ago

malikas05 commented 1 year ago

Hi. We noticed a change in behavior when upgraded clickhouse driver and started using com.clickhouse.jdbc.ClickHouseDriver instead of the legacy driver from ru.yandex.*. More specifically, the change is in the way the dates are written into tables. Here are the details:

The new clickhouse-jdbc driver is using a different implementation of PreparedStatement which is InputBasedPreparedStatement whereas the previous version from the legacy driver ru.yandex.* was using SqlBasedPreparedStatement implementation. They have different implementation code in terms of how the datetime values are written into the database. The way we populate the datetime column in our code is as follows: ps.setLong(i, statistic.getDate().getTime() / 1000L); So in other words, we take the epoch time, remove millisecond part and let it write to the database as is. So the final value would be stored in UTC and when we read it, it would be translated to whatever timezone is set in the server configs or connection string params(we rely on the former). In the new version of the driver, InputBasedPreparedStatement is using a different approach:

  1. It takes an epoch time which we provide.
  2. It translates it to the DateTime value such as YYYY-MM-DD HH:MM:SS in UTC timezone.
  3. When it is saved to DB, it is treated in the timezone which is set in the server OR the timezone set via connection string params.
  4. Since we have America/Toronto set in the server, the DateTime value which was converted to UTC is treated in the America/Toronto timezone.
  5. Once it is saved on the server side, it would be converted again to UTC, i.e., DateTime value + 5h.

As a result of that, we see all stats shifted by 5 hours ahead. We found that in order to remediate this issue, we have to set the following connection string params: jdbc:clickhouse://127.0.0.1:18123/ssdsp?use_server_time_zone=false&use_time_zone=UTC

Now the questions. Why does the new driver use InputBasedPreparedStatement and not SqlBasedPreparedStatement anymore? Is there a better way to handle this? Unfortunately, I wasn't able to find any comments in the release notes relating to that change. Please advise.

zhicwu commented 1 year ago

Hi @malikas05, thanks for the report with all the details.

Why does the new driver use InputBasedPreparedStatement and not SqlBasedPreparedStatement anymore?

The driver was refactored in 0.3.2 to resolve stability issue and the data format was changed from TabSeparated to RowBinary for better performance. InputBasedPreparedStatement is good in most cases because it can validate inputs and send binary data to ClickHouse, while SqlBasedPreparedStatement is nothing but a SQL query with all parameters substituted, which is less efficient(not only on client side but also server).

If you prefer SqlBasedPreparedStatement, before a new option(e.g. auto_type_inferring=false) in place, you may add custom expression like insert into mytable values(ifnull(?, ''), ?) so that the driver will give up type inferring.

Is there a better way to handle this?

Have you tried to pass LocalDateTime object instead to the driver? Alternatively, if you prefer a more direct way, you may try ClickHouseWriter instead.

malikas05 commented 1 year ago

Hi @zhicwu. Thanks so much for your detailed reply.

Have you tried to pass LocalDateTime object instead to the driver?

I don't think it would make a difference without using parameters such as use_server_time_zone=false&use_time_zone=UTC. Otherwise, could you please share an example of how to set it in the PreparedStatement? Thank you!

zhicwu commented 1 year ago

You're right, using LocalDateTime makes no different :p However, Instant or Timestamp should work - please refer to below.

// server time zone is set to America/Toronto
Properties props = new Properties();
// props.setProperty("use_server_time_zone", "false");
// props.setProperty("use_time_zone", "Etc/UTC");
try (ClickHouseConnection conn = newConnection(props); Statement s = conn.createStatement()) {
    s.execute("drop table if exists test_insert_datetime;"
            + "create table test_insert_datetime(id Int32, ts DateTime) engine=Memory");
    final long epochMilli = 1678321201123L; // 2023-03-09 00:20:01.123
    final java.sql.Timestamp ts = new Timestamp(epochMilli);
    try (PreparedStatement stmt = conn.prepareStatement(
            "insert into test_insert_datetime")) {
        stmt.setString(1, "1");
        stmt.setLong(2, ts.getTime() / 1000L); // 2023-03-09 00:20:01
        stmt.addBatch();
        stmt.setString(1, "2");
        stmt.setString(2, "2023-03-09 00:20:01"); // 2023-03-09 00:20:01
        stmt.addBatch();
        stmt.setString(1, "3");
        stmt.setObject(2, LocalDateTime.ofEpochSecond(1678321201L, 123000000, ZoneOffset.UTC)); // 2023-03-09 00:20:01
        stmt.addBatch();
        stmt.setString(1, "4");
        stmt.setObject(2, Instant.ofEpochMilli(epochMilli)); // 2023-03-08 19:20:01
        stmt.addBatch();
        stmt.setString(1, "5");
        stmt.setTimestamp(2, ts); // 2023-03-08 19:20:01
        stmt.addBatch();
        stmt.executeBatch();

        ResultSet rs = s.executeQuery("select * from test_insert_datetime order by id");
        Object[][] values = new Object[5][];
        int index = 0;
        while (rs.next()) {
            values[index++] = new Object[] { rs.getInt(1), rs.getObject(2) };
        }
    }
}

Although we have workaround here, it's indeed inconsistent with clickhouse-client, the native command line:

# start with UTC
docker run --rm -it -e TZ=Etc/UTC clickhouse/clickhouse-server bash
./entrypoint.sh >/dev/null 2>&1 &
clickhouse-client
...
select timezone(), toDateTime(1678321201)
...
┌─timezone()─┬─toDateTime(1678321201)─┐
│ Etc/UTC    │    2023-03-09 00:20:01 │
└────────────┴────────────────────────┘

# now let's try your timezone
docker run --rm -it -e TZ=America/Toronto clickhouse/clickhouse-server bash
./entrypoint.sh >/dev/null 2>&1 &
...
select timezone(), toDateTime(1678321201)
...
# use server timezone
clickhouse-client
┌─timezone()──────┬─toDateTime(1678321201)─┐
│ America/Toronto │    2023-03-08 19:20:01 │
└─────────────────┴────────────────────────┘
...
\q
# use client timezone
TZ=Etc/UTC clickhouse-client --use_client_time_zone 1
select timezone(), toDateTime(1678321201)
...
┌─timezone()──────┬─toDateTime(1678321201)─┐
│ America/Toronto │    2023-03-09 00:20:01 │
└─────────────────┴────────────────────────┘

Perhaps we should add a new option to bypass step 2 and 5. ...

  1. It translates it to the DateTime value such as YYYY-MM-DD HH:MM:SS in UTC timezone. ...
  2. Once it is saved on the server side, it would be converted again to UTC, i.e., DateTime value + 5h.