Closed: JorisTruong closed this issue 1 year ago
I see, you want custom patterns to also specify a custom timezone. Really, if anything, it should use the default Spark timezone, for consistency with the rest of Spark. Can't you control the data source to specify the TZ? That would be much better. I'll comment on your PR.
Description
Currently encountering an issue when parsing with a specific timestamp format. I have XML data with timestamps in the format `yyyy/MM/dd HH:mm:ss`, and I am trying to read it using the `timestampFormat` option.

Looking at the `parseXmlTimestamp()` function in TypeCast.scala, it seems that spark-xml uses `Timestamp.from(ZonedDateTime.parse(value, format).toInstant)` to parse the timestamp. However, with a format like `yyyy/MM/dd HH:mm:ss`, there is no timezone, which causes the function to fail: it then returns null values for the whole timestamp column.

To reproduce

`time.xml` file:

PySpark code to read:
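The `time.xml` attachment and the PySpark snippet did not survive extraction. The following is a minimal sketch of a reproduction; the row structure, tag names, and values are assumptions, not the reporter's actual data — only the timestamp format matches the report.

```python
from pathlib import Path

# Hypothetical sample data; only the yyyy/MM/dd HH:mm:ss format
# matches the report.
Path("time.xml").write_text(
    "<root>\n"
    "  <row>\n"
    "    <time>2020/01/01 12:00:00</time>\n"
    "  </row>\n"
    "</root>\n"
)

# Reading it back with spark-xml (requires pyspark plus the spark-xml
# package on the classpath, so it is left as a comment here):
#
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# df = (spark.read.format("xml")
#       .option("rowTag", "row")
#       .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")
#       .load("time.xml"))
# df.show()  # the time column comes back entirely null
```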
Suggestion
We may want to add a default timezone in the `parseXmlTimestamp()` function, like this:

Any thoughts about adding a `timeZone` option?

The following will make the `parseXmlTimestamp()` function return this:
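The code blocks from the original report were lost in extraction. As a sketch of the idea only (the class and helper names below are hypothetical, not spark-xml's actual code), here is how `java.time` parsing fails when the pattern carries no zone, and succeeds once the formatter is given a default zone:

```java
import java.sql.Timestamp;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class TimestampParseDemo {

    // Hypothetical helper mirroring the parseXmlTimestamp() suggestion:
    // attach a default zone so zone-less patterns still parse.
    static Timestamp parseWithDefaultZone(String value, String pattern,
                                          ZoneId defaultZone) {
        DateTimeFormatter format =
            DateTimeFormatter.ofPattern(pattern).withZone(defaultZone);
        return Timestamp.from(ZonedDateTime.parse(value, format).toInstant());
    }

    public static void main(String[] args) {
        String value = "2020/01/01 12:00:00";

        // Without a zone in the pattern, ZonedDateTime.parse cannot build
        // a ZonedDateTime and throws DateTimeParseException -- this is
        // why the whole timestamp column comes back null.
        try {
            ZonedDateTime.parse(value,
                DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss"));
        } catch (DateTimeParseException e) {
            System.out.println("no zone -> parse fails");
        }

        // With a default zone attached to the formatter, parsing succeeds.
        Timestamp ts = parseWithDefaultZone(value, "yyyy/MM/dd HH:mm:ss",
                                            ZoneId.of("UTC"));
        System.out.println(ts.toInstant()); // 2020-01-01T12:00:00Z
    }
}
```

Whether the default should be UTC, the JVM zone, or Spark's session timezone (`spark.sql.session.timeZone`) is exactly what the proposed `timeZone` option would decide.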