Open malikas05 opened 1 year ago
Hi @malikas05, thanks for the report with all the details.
Why does the new driver use InputBasedPreparedStatement and not SqlBasedPreparedStatement anymore?
The driver was refactored in 0.3.2 to resolve stability issue and the data format was changed from TabSeparated to RowBinary for better performance. InputBasedPreparedStatement
is good in most cases because it can validate inputs and send binary data to ClickHouse, while SqlBasedPreparedStatement
is nothing but a SQL query with all parameters substituted, which is less efficient(not only on client side but also server).
If you prefer SqlBasedPreparedStatement, before a new option(e.g. auto_type_inferring=false) in place, you may add custom expression like insert into mytable values(ifnull(?, ''), ?)
so that the driver will give up type inferring.
Is there a better way to handle this?
Have you tried to pass LocalDateTime object instead to the driver? Alternatively, if you prefer a more direct way, you may try ClickHouseWriter instead.
Hi @zhicwu. Thanks so much for your detailed reply.
Have you tried to pass LocalDateTime object instead to the driver?
I don't think it would make a difference without using parameters such as use_server_time_zone=false&use_time_zone=UTC
. Otherwise, could you please share an example of how to set it in the PreparedStatement? Thank you!
You're right, using LocalDateTime makes no different :p However, Instant or Timestamp should work - please refer to below.
// server time zone is set to America/Toronto
Properties props = new Properties();
// props.setProperty("use_server_time_zone", "false");
// props.setProperty("use_time_zone", "Etc/UTC");
try (ClickHouseConnection conn = newConnection(props); Statement s = conn.createStatement()) {
s.execute("drop table if exists test_insert_datetime;"
+ "create table test_insert_datetime(id Int32, ts DateTime) engine=Memory");
final long epochMilli = 1678321201123L; // 2023-03-09 00:20:01.123
final java.sql.Timestamp ts = new Timestamp(epochMilli);
try (PreparedStatement stmt = conn.prepareStatement(
"insert into test_insert_datetime")) {
stmt.setString(1, "1");
stmt.setLong(2, ts.getTime() / 1000L); // 2023-03-09 00:20:01
stmt.addBatch();
stmt.setString(1, "2");
stmt.setString(2, "2023-03-09 00:20:01"); // 2023-03-09 00:20:01
stmt.addBatch();
stmt.setString(1, "3");
stmt.setObject(2, LocalDateTime.ofEpochSecond(1678321201L, 123000000, ZoneOffset.UTC)); // 2023-03-09 00:20:01
stmt.addBatch();
stmt.setString(1, "4");
stmt.setObject(2, Instant.ofEpochMilli(epochMilli)); // 2023-03-08 19:20:01
stmt.addBatch();
stmt.setString(1, "5");
stmt.setTimestamp(2, ts); // 2023-03-08 19:20:01
stmt.addBatch();
stmt.executeBatch();
ResultSet rs = s.executeQuery("select * from test_insert_datetime order by id");
Object[][] values = new Object[5][];
int index = 0;
while (rs.next()) {
values[index++] = new Object[] { rs.getInt(1), rs.getObject(2) };
}
}
}
Although we have workaround here, it's indeed inconsistent with clickhouse-client, the native command line:
# start with UTC
docker run --rm -it -e TZ=Etc/UTC clickhouse/clickhouse-server bash
./entrypoint.sh >/dev/null 2>&1 &
clickhouse-client
...
select timezone(), toDateTime(1678321201)
...
┌─timezone()─┬─toDateTime(1678321201)─┐
│ Etc/UTC │ 2023-03-09 00:20:01 │
└────────────┴────────────────────────┘
# now let's try your timezone
docker run --rm -it -e TZ=America/Toronto clickhouse/clickhouse-server bash
./entrypoint.sh >/dev/null 2>&1 &
...
select timezone(), toDateTime(1678321201)
...
# use server timezone
clickhouse-client
┌─timezone()──────┬─toDateTime(1678321201)─┐
│ America/Toronto │ 2023-03-08 19:20:01 │
└─────────────────┴────────────────────────┘
...
\q
# use client timezone
TZ=Etc/UTC clickhouse-client --use_client_time_zone 1
select timezone(), toDateTime(1678321201)
...
┌─timezone()──────┬─toDateTime(1678321201)─┐
│ America/Toronto │ 2023-03-09 00:20:01 │
└─────────────────┴────────────────────────┘
Perhaps we should add a new option to bypass step 2 and 5. ...
- It translates it to the DateTime value such as YYYY-MM-DD HH:MM:SS in UTC timezone. ...
- Once it is saved on the server side, it would be converted again to UTC, i.e., DateTime value + 5h.
Hi. We noticed a change in behavior when upgraded clickhouse driver and started using
com.clickhouse.jdbc.ClickHouseDriver
instead of the legacy driver fromru.yandex.*
. More specifically, the change is in the way the dates are written into tables. Here are the details:The new clickhouse-jdbc driver is using a different implementation of PreparedStatement which is InputBasedPreparedStatement whereas the previous version from the legacy driver
ru.yandex.*
was using SqlBasedPreparedStatement implementation. They have different implementation code in terms of how the datetime values are written into the database. The way we populate the datetime column in our code is as follows:ps.setLong(i, statistic.getDate().getTime() / 1000L);
So in other words, we take the epoch time, remove millisecond part and let it write to the database as is. So the final value would be stored in UTC and when we read it, it would be translated to whatever timezone is set in the server configs or connection string params(we rely on the former). In the new version of the driver, InputBasedPreparedStatement is using a different approach:As a result of that, we see all stats shifted by 5 hours ahead. We found that in order to remediate this issue, we have to set the following connection string params: jdbc:clickhouse://127.0.0.1:18123/ssdsp?use_server_time_zone=false&use_time_zone=UTC
Now the questions. Why does the new driver use InputBasedPreparedStatement and not SqlBasedPreparedStatement anymore? Is there a better way to handle this? Unfortunately, I wasn't able to find any comments in the release notes relating to that change. Please advise.