ClickHouse / clickhouse-connect

Python driver/sqlalchemy/superset connectors
Apache License 2.0
332 stars 65 forks source link

Invalid argument error when reading datetime data #383

Closed IlluvatarEru closed 3 months ago

IlluvatarEru commented 3 months ago

Describe the bug I am reading data from a table, specifically getting the distinct timestamp. The type of that column is DateTime64(6).

I am receiving the below error:

ERROR:root:query failed: select distinct timestamp from db.table where  timestamp > '2024-07-29' 
 with exception:
[Errno 22] Invalid argument
Traceback (most recent call last):
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\driver\client.py", line 193, in query
    return self._query_with_context(query_context)
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\driver\httpclient.py", line 222, in _query_with_context
    query_result = self._transform.parse_response(byte_source, context)
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\driver\transform.py", line 68, in parse_response
    first_block = get_block()
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\driver\transform.py", line 50, in get_block
    column = col_type.read_column(source, num_rows, context)
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\datatypes\base.py", line 143, in read_column
    return self.read_column_data(source, num_rows, ctx)
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\datatypes\base.py", line 158, in read_column_data
    column = self._read_column_binary(source, num_rows, ctx)
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\datatypes\temporal.py", line 179, in _read_column_binary
    return self._read_binary_tz(column, active_tz)
  File "C:\Users\user\anaconda3\lib\site-packages\clickhouse_connect\datatypes\temporal.py", line 189, in _read_binary_tz
    dt_sec = dt_from(seconds, tz_info)
OSError: [Errno 22] Invalid argument

Note that running the exact same query from the CH client directly works fine. Note: the min timestamp in that table is: 1970-01-01 01:00:00.000000 and the max timestamp: 2299-12-31 23:43:20.000000. I have found out that deleting the rows where timestamp = max_timestamp fixed the issue.

To Reproduce I am really just creating a client and running the select query: select distinct timestamp from db.table where timestamp > '2024-07-29'

Expected behavior I am expecting the query to get the timestamps without any error.

Versions

genzgd commented 3 months ago

What timezone is your ClickHouse server using? And what is your local client timezone?

genzgd commented 3 months ago

I ask because you can only get to that code if there are timezones involved. I cannot reproduce the problem locally. Also there has been work in clickhouse-connect around timezones since the 0.7.0 release, so it might help if you upgrade to the latest version.

IlluvatarEru commented 3 months ago

What timezone is your ClickHouse server using? And what is your local client timezone?

Getting the below from the CH cli (connecting without any tz adjustments):

SELECT
    now() AS server_timestamp,
    timezone() AS server_timezone

returns 2024-08-02 07:25:59 │ Europe/Berlin

and am located in BST time. (NB: getting the same when running using the python clickhouse-connect)

genzgd commented 3 months ago

Are you running on Windows? It seems possible you're running into this bug? https://github.com/python/cpython/issues/94414#issuecomment-1478714464

IlluvatarEru commented 3 months ago

Are you running on Windows? It seems possible you're running into this bug? python/cpython#94414 (comment)

Well spotted, that must be it, was trying to run those from Windows indeed, that would also explain why it does not break in production on linux. Closing the issue.