Open matthiasgomolka opened 2 months ago
I came across your issue while researching my own timestamp[s] issue.
I suspect your issue stems from the same thing - parquet does not have a seconds timestamp type.
https://github.com/apache/arrow/issues/41382#issuecomment-2078658637
I'm not sure. I mean, other parquet readers handle the identical file just fine.
Describe the bug, including details regarding any error messages, version, and platform.
I've stumbled upon a weird issue whose underlying cause I don't understand.
I read a parquet file which contains a timestamp column. This timestamp column contains the value `9999-12-23 23:59:59`. When I read this file using `pyarrow` (or with `pandas`, using `pyarrow` as both engine and dtype_backend), the rows with `9999-12-23 23:59:59` show the value `1816-03-22 05:56:07.066277376`.

I'm pretty certain that `9999-12-23 23:59:59` is the correct value, because it is much more plausible (and that's what `duckdb` and `Impala` report as well).

When I write the respective row to parquet using `duckdb` and read that file using `pyarrow`, I get the correct value of `9999-12-23 23:59:59`.

I've already checked whether this is a problem with the parquet version, but both files are version `1.0`. What else might cause this?

Unfortunately, I can't share the parquet file in question because it contains confidential data.
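One observation that may help narrow this down: the wrong value is exactly what you get when `9999-12-23 23:59:59` is converted to nanoseconds since the Unix epoch and that number wraps around in a signed 64-bit integer (int64 nanoseconds only reach 2262-04-11). The sketch below reproduces the arithmetic with the standard library only. It is an assumption about the mechanism, not a confirmed description of what `pyarrow` does internally, and the helper names `wrap_int64` and `as_wrapped_ns_timestamp` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary integer to the signed 64-bit range (two's complement)."""
    return (value + 2**63) % 2**64 - 2**63


def as_wrapped_ns_timestamp(ts: datetime) -> str:
    """Hypothetical helper: show ts as it would look after an int64 nanosecond overflow."""
    delta = ts - EPOCH
    # Exact integer arithmetic, so no floating-point rounding is involved.
    seconds = delta.days * 86_400 + delta.seconds
    ns = wrap_int64(seconds * 1_000_000_000 + delta.microseconds * 1_000)
    # Floor division keeps the fractional part non-negative for pre-1970 values.
    secs, frac_ns = divmod(ns, 1_000_000_000)
    wrapped = (EPOCH + timedelta(seconds=secs)).replace(tzinfo=None)
    return f"{wrapped.isoformat(sep=' ')}.{frac_ns:09d}"


original = datetime(9999, 12, 23, 23, 59, 59, tzinfo=timezone.utc)
print(as_wrapped_ns_timestamp(original))  # 1816-03-22 05:56:07.066277376
```

If that is what happens here, the value stored in the file is probably fine and the corruption occurs during a conversion to nanosecond resolution on read, which would also fit the fact that other readers show the expected value.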
Component(s)
Python