vaibhawvipul closed this 2 weeks ago
@vaibhawvipul You can run `make format` to fix the linting issues that are causing some CI tests to fail.
Update: while working on this bug, I found an 8-hour difference between the Comet and Spark casts. If anyone has any pointers, please let me know.
What happens if you explicitly set the timezone to UTC in the Spark configs?
We may want to fall back to Spark for other timezones for now.
Thanks @andygrove, setting the timezone to UTC worked!
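For anyone hitting the same offset, the session timezone can be pinned up front, e.g. in `spark-defaults.conf` (the test suite does the equivalent via `withSQLConf`):

```
spark.sql.session.timeZone UTC
```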
Any idea about this error?
```scala
test("cast string to timestamp") {
  withSQLConf(
    (SQLConf.SESSION_LOCAL_TIMEZONE.key -> "UTC"),
    (CometConf.COMET_CAST_STRING_TO_TIMESTAMP.key -> "true")) {
    val values = Seq("2020-01-01T12:34:56.123456", "T2").toDF("a")
    castTest(values, DataTypes.TimestampType)
  }
}
```
Seems like the test is failing in ANSI mode; debugging that.
@andygrove
I have enabled a "string to timestamp" cast test case that passes. We now support all the timestamp formats mentioned in the issue.
| Input | Spark | Comet |
|---|---|---|
| 2020 | 2020-01-01 00:00:00.0 | 2020-01-01 00:00:00.0 |
| 2020-01 | 2020-01-01 00:00:00.0 | 2020-01-01 00:00:00.0 |
| 2020-01-01 | 2020-01-01 00:00:00.0 | 2020-01-01 00:00:00.0 |
| 2020-01-01T12 | 2020-01-01 12:00:00.0 | 2020-01-01 12:00:00.0 |
| 2020-01-01T12:34 | 2020-01-01 12:34:00.0 | 2020-01-01 12:34:00.0 |
| 2020-01-01T12:34:56 | 2020-01-01 12:34:56.0 | 2020-01-01 12:34:56.0 |
| 2020-01-01T12:34:56.123456 | 2020-01-01 12:34:56.123456 | 2020-01-01 12:34:56.123456 |
| T2 | 2024-04-25 02:00:00.0 | 2024-04-25 02:00:00.0 |
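To illustrate the defaulting behavior in the table, here is a minimal Rust sketch (assumed names and structure, not the actual Comet implementation): each field missing from the input falls back to a default (month/day to 1, time fields to 0). Fractional seconds are truncated and the date-less `T2` form is left out to keep the sketch short.

```rust
// Sketch of lenient Spark-style timestamp parsing: accept progressively
// longer prefixes of "YYYY-MM-DDTHH:MM:SS.ffffff" and default the rest.
fn parse_lenient(s: &str) -> Result<(i32, u32, u32, u32, u32, u32), String> {
    // Split into a date part and an optional time part at the 'T'.
    let (date, time) = s.split_once('T').unwrap_or((s, ""));
    let d: Vec<&str> = date.split('-').collect();
    let t: Vec<&str> = if time.is_empty() {
        Vec::new()
    } else {
        time.split(':').collect()
    };

    // Parse the i-th component if present, otherwise use the default.
    fn field(parts: &[&str], i: usize, default: u32) -> Result<u32, String> {
        match parts.get(i) {
            Some(p) => p.parse().map_err(|e| format!("bad field {i}: {e}")),
            None => Ok(default),
        }
    }

    // The year is mandatory; everything after it is optional.
    let year: i32 = d
        .first()
        .ok_or_else(|| "missing year".to_string())?
        .parse()
        .map_err(|e| format!("bad year: {e}"))?;
    let month = field(&d, 1, 1)?;
    let day = field(&d, 2, 1)?;
    let hour = field(&t, 0, 0)?;
    let minute = field(&t, 1, 0)?;
    // Truncate any fractional part of the seconds component.
    let second = match t.get(2) {
        Some(p) => p
            .split('.')
            .next()
            .unwrap_or("0")
            .parse()
            .map_err(|e| format!("bad second: {e}"))?,
        None => 0,
    };
    Ok((year, month, day, hour, minute, second))
}

fn main() {
    assert_eq!(parse_lenient("2020"), Ok((2020, 1, 1, 0, 0, 0)));
    assert_eq!(parse_lenient("2020-01-01T12:34"), Ok((2020, 1, 1, 12, 34, 0)));
    assert_eq!(
        parse_lenient("2020-01-01T12:34:56.123456"),
        Ok((2020, 1, 1, 12, 34, 56))
    );
    assert!(parse_lenient("not a timestamp").is_err());
    println!("all checks passed");
}
```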
Just a note on:

> seems like the test is failing in ANSI mode, debugging that.
There is an improved version of the ANSI testing in https://github.com/apache/datafusion-comet/pull/351
Thanks @vaibhawvipul this is looking great. I will review today.
- Spark supports some additional inputs such as "213" or "85678", i.e. 3-digit and 5-digit years. Comet does not support these in this PR.
I would be fine with handling this in a follow on PR
This is looking great @vaibhawvipul. I think this is close to being ready to merge and then have some follow on issues for remaining items.
I think the one thing I would like to see before merging is that we can run the fuzz test without causing any panics, so just replace the `unwrap`s with error handling.
@andygrove all `unwrap`s are removed; I am handling errors in all of my functions.
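For readers following along, the pattern being discussed looks roughly like this (illustrative names, not the actual Comet code): a panicking `unwrap` on malformed input is replaced by a `Result` that the caller can handle, so fuzz-test inputs surface as errors rather than panics.

```rust
// Instead of `input.parse::<i64>().unwrap()`, which panics on bad input,
// propagate a Result so the caller decides what to do with the failure.
fn to_timestamp_micros(input: &str) -> Result<i64, String> {
    input
        .parse::<i64>()
        .map_err(|e| format!("cannot cast '{input}' to timestamp: {e}"))
}

fn main() {
    // Valid input flows through; invalid input becomes an Err, not a panic.
    assert_eq!(to_timestamp_micros("1588300496000000"), Ok(1588300496000000));
    assert!(to_timestamp_micros("not a number").is_err());
    println!("ok");
}
```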
As a follow up, I would also suggest adding/verifying support for any timezone. (See this test for instance: https://github.com/apache/datafusion-comet/blob/064cb472a7f55b6b67f44c22d6a8110320b330e7/spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala#L262)
Also see - https://docs.rs/arrow-cast/50.0.0/arrow_cast/parse/fn.string_to_datetime.html
I filed https://github.com/apache/datafusion-comet/issues/376 for the follow on work
## Which issue does this PR close?

Closes #328.

## Rationale for this change

Improve compatibility with Apache Spark.

## What changes are included in this PR?

Add a custom implementation of CAST from string to timestamp rather than delegating to DataFusion.

## How are these changes tested?