NVnavkumar opened 10 months ago
Another aspect to consider: when you pass a Python datetime object with timezone information, it is converted to UTC before being sent to Spark. This can produce a `date value out of range` Python exception.
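For example (a minimal sketch using only the standard library, with an assumed +08:00 offset standing in for any timezone ahead of UTC):

```python
from datetime import datetime, timedelta, timezone

# A timezone 8 hours ahead of UTC.
tz_plus8 = timezone(timedelta(hours=8))

# The minimum Python datetime, localized to the +08:00 zone.
dt = datetime(1, 1, 1, tzinfo=tz_plus8)

# Converting to UTC would land before 0001-01-01 00:00:00 UTC,
# which Python cannot represent:
dt.astimezone(timezone.utc)  # raises OverflowError: date value out of range
```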
However, this also means the effective range for testing timestamps in timezones with a positive offset from UTC is restricted. Python datetime values can only start at 0001-01-01 00:00:00.000000 UTC, so a value of 0001-01-01 00:00:00.000000 in a local timezone ahead of UTC cannot actually be sent to Spark from Python: converting it to UTC would yield a year-0 timestamp, which is out of range. That value is still valid for ANSI purposes (the valid range is 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999), and Spark is fine with those values.
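Concretely, the earliest instant that survives the round trip in a zone ahead of UTC is the minimum UTC instant expressed in local time (same assumed +08:00 offset as above):

```python
from datetime import datetime, timedelta, timezone

tz_plus8 = timezone(timedelta(hours=8))

# The earliest UTC instant, viewed in the +08:00 zone: local wall-clock
# time 0001-01-01 08:00:00+08:00. Anything earlier in local time maps to
# a pre-year-1 UTC instant and cannot be sent to Spark from Python.
earliest = datetime(1, 1, 1, tzinfo=timezone.utc).astimezone(tz_plus8)
print(earliest)  # 0001-01-01 08:00:00+08:00
```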
I think the best course of action here is to write tests in Scala that cover multiple non-UTC timezones and dates in the invalid range (non-positive years and years > 9999), rather than trying to do this in Python, because it is difficult to change how Python handles these values in the PythonRunner on the executor. A short sketch of the Python limitation follows.
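This is a limitation of Python's datetime type itself, not of the test code; years outside 1-9999 simply cannot be constructed (minimal standard-library sketch):

```python
from datetime import MAXYEAR, MINYEAR, datetime

print(MINYEAR, MAXYEAR)  # 1 9999

# Years Spark can represent but Python cannot:
for year in (-1, 100000):
    try:
        datetime(year, 1, 1)
    except ValueError as e:
        print(e)  # "year -1 is out of range", then "year 100000 is out of range"
```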
Is your feature request related to a problem? Please describe.
#9996 allows us to test the full "valid" range of timestamps (0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999) in Spark. However, Spark can also support several invalid timestamps (negative years and 6-digit years). We should allow this full range of inputs to Spark with CPU and GPU support.