JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License

Add timedelta, timedelta64 and datetime64 plus respective conversions #509

Open hhaensel opened 1 week ago

hhaensel commented 1 week ago

This PR replaces #334 and takes into account the major refactoring of PythonCall. In particular, it fixes #293.

What's new?

Python Constructors

julia> pytimedelta(hour = 1, minute = 2)
Python: datetime.timedelta(seconds=3720)

julia> pytimedelta64(hour = 1, minute = 2)
Python: numpy.timedelta64(62,'m')

julia> pytimedelta64(year = 2, month = 3)
Python: numpy.timedelta64(27,'M')

julia> pydatetime64(year = 2024, month = 3)
Python: numpy.datetime64('2024-03-01T00:00:00')
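The outputs above show that mixed keyword parts collapse into a single NumPy unit: 1 hour + 2 minutes becomes 62 minutes, and 2 years + 3 months becomes 27 months. Below is a minimal NumPy-only sketch of that behaviour, assuming the constructor just sums the non-zero parts and lets NumPy promote to the finest unit used. The helpers `timedelta64_from_parts` and `datetime64_from_parts` and their keyword names are hypothetical illustrations, not PythonCall's implementation.

```python
import numpy as np

# Hypothetical keyword-to-NumPy-unit table (names mirror the Julia keywords above).
UNITS = {"year": "Y", "month": "M", "day": "D", "hour": "h", "minute": "m",
         "second": "s", "millisecond": "ms", "microsecond": "us",
         "nanosecond": "ns", "week": "W"}


def timedelta64_from_parts(**parts):
    """Hypothetical helper: sum the non-zero parts as numpy.timedelta64 values.

    NumPy promotes a sum to the finest unit involved, e.g. h + m -> m, Y + M -> M.
    """
    total = None
    for name, value in parts.items():
        if value:
            td = np.timedelta64(value, UNITS[name])
            total = td if total is None else total + td
    # With no non-zero parts, return a zero-length interval in seconds.
    return np.timedelta64(0, "s") if total is None else total


def datetime64_from_parts(year, month=1, day=1, hour=0, minute=0, second=0):
    """Hypothetical helper: build a second-resolution numpy.datetime64."""
    date = np.datetime64(f"{year:04d}-{month:02d}-{day:02d}", "s")
    return date + np.timedelta64(hour * 3600 + minute * 60 + second, "s")


print(repr(timedelta64_from_parts(hour=1, minute=2)))
print(repr(timedelta64_from_parts(year=2, month=3)))
print(repr(datetime64_from_parts(2024, 3)))
```

Relying on NumPy's own unit promotion keeps the sketch short; mixing year/month with day-or-smaller parts would raise a TypeError from NumPy rather than silently guessing a conversion.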

Conversion to Julia types

julia> x = pytimedelta64(year = 11)
Python: numpy.timedelta64(11,'Y')

julia> pyconvert(Any, x) |> x -> (x, typeof(x))
(11 years, Dates.CompoundPeriod)

julia> pyconvert(Period, x) |> x -> (x, typeof(x))
(Year(11), Year)
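Converting numpy.timedelta64(11,'Y') to Year(11) requires reading the value's unit and integer count. On the NumPy side, `np.datetime_data` returns the unit code and its multiplier. The sketch below assumes a concrete (non-generic) unit and a non-NaT value; the `JULIA_PERIOD` table and `period_parts` helper are hypothetical and only illustrate the idea.

```python
import numpy as np

# Hypothetical table from NumPy unit codes to Julia `Dates` period type names.
JULIA_PERIOD = {"Y": "Year", "M": "Month", "W": "Week", "D": "Day",
                "h": "Hour", "m": "Minute", "s": "Second",
                "ms": "Millisecond", "us": "Microsecond", "ns": "Nanosecond"}


def period_parts(td):
    """Split a timedelta64 into (Julia period name, integer count).

    Assumes a concrete unit (not 'generic') and a value that is not NaT.
    """
    unit, multiplier = np.datetime_data(td.dtype)    # e.g. ('Y', 1)
    count = int(td.astype(np.int64)) * multiplier    # raw count in base units
    return JULIA_PERIOD[unit], count


print(period_parts(np.timedelta64(11, "Y")))   # ('Year', 11)
```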

DataFrame handling

I've set the conversion priority of datetime, timedelta, datetime64 and timedelta64 to ARRAY, which enables automatic Table conversion. I hope that's the intended way to do it.

julia> jdf = DataFrame(x = [now() + Second(rand(1:1000)) for _ in 1:100], y = [Second(n) for n in 1:100])
100×2 DataFrame
 Row │ x                        y
     │ DateTime                 Second
─────┼──────────────────────────────────────
   1 │ 2024-06-17T00:31:31.236  1 second
   2 │ 2024-06-17T00:30:30.236  2 seconds
   3 │ 2024-06-17T00:41:22.236  3 seconds
  ⋮  │            ⋮                  ⋮
  98 │ 2024-06-17T00:36:05.236  98 seconds
  99 │ 2024-06-17T00:38:38.236  99 seconds
 100 │ 2024-06-17T00:28:21.236  100 seconds
                             94 rows omitted

julia> pdf = pytable(jdf)
Python:
                         x               y
0  2024-06-17 00:31:31.236 0 days 00:00:01
1  2024-06-17 00:30:30.236 0 days 00:00:02
2  2024-06-17 00:41:22.236 0 days 00:00:03
3  2024-06-17 00:33:52.236 0 days 00:00:04
           ... 4 more lines ...
97 2024-06-17 00:36:05.236 0 days 00:01:38
98 2024-06-17 00:38:38.236 0 days 00:01:39
99 2024-06-17 00:28:21.236 0 days 00:01:40

[100 rows x 2 columns]

julia> DataFrame(PyTable(pdf))
100×2 DataFrame
 Row │ x                        y
     │ DateTime                 Compound…
─────┼───────────────────────────────────────────────
   1 │ 2024-06-17T00:31:31.236  1 second
   2 │ 2024-06-17T00:30:30.236  2 seconds
   3 │ 2024-06-17T00:41:22.236  3 seconds
  ⋮  │            ⋮                      ⋮
  98 │ 2024-06-17T00:36:05.236  1 minute, 38 seconds
  99 │ 2024-06-17T00:38:38.236  1 minute, 39 seconds
 100 │ 2024-06-17T00:28:21.236  1 minute, 40 seconds
                                      94 rows omitted
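After the round trip, 98 seconds is displayed as "1 minute, 38 seconds". The value appears to be split into descending fixed-length units, much like Julia's `Dates.canonicalize` does for a CompoundPeriod. Below is a Python sketch of such a decomposition, assuming non-negative values with day-or-smaller units (years and months cannot be cast to nanoseconds). `split_timedelta` is a hypothetical helper and does not reproduce the PR's code.

```python
import numpy as np

# Fixed-length units from largest to smallest, each expressed in nanoseconds.
LADDER = [("week", 7 * 86_400 * 10**9), ("day", 86_400 * 10**9),
          ("hour", 3_600 * 10**9), ("minute", 60 * 10**9),
          ("second", 10**9), ("millisecond", 10**6),
          ("microsecond", 10**3), ("nanosecond", 1)]


def split_timedelta(td):
    """Decompose a non-negative timedelta64 into descending (unit, count) pairs."""
    ns = int(td.astype("timedelta64[ns]").astype(np.int64))
    parts = []
    for name, size in LADDER:
        count, ns = divmod(ns, size)   # take as many whole units as fit
        if count:
            parts.append((name, count))
    return parts


print(split_timedelta(np.timedelta64(98, "s")))  # [('minute', 1), ('second', 38)]
```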

Default Conversion

I chose Dates.CompoundPeriod as the result type of the default conversion from timedelta64, since both types support years, months and smaller period units. This is debatable: we could instead convert to Period, in which case the resulting type would depend on the input.

julia> pyconvert(Any, x) |> x -> (x, typeof(x))
(11 years, Dates.CompoundPeriod)

julia> pyconvert(Period, x) |> x -> (x, typeof(x))
(Year(11), Year)

Neither Python nor Julia converts between Year/Month and the other period types, so this choice cannot produce ill-defined intervals. The difference is that Julia allows addition and subtraction of mixed period types (yielding a CompoundPeriod), whereas NumPy throws an error.
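NumPy enforces the Year/Month boundary described above. Years and months are linearly related (1 year is 12 months), so casting between them works, but a year has no fixed length in days, so both the cast and mixed arithmetic fail. A small demonstration follows; `try_add` is a hypothetical convenience wrapper, while the casts are plain NumPy.

```python
import numpy as np


def try_add(a, b):
    """Return a + b, or None when NumPy refuses to combine the two units."""
    try:
        return a + b
    except TypeError:  # NumPy raises a TypeError subclass for incompatible units
        return None


one_year = np.timedelta64(1, "Y")

# Years and months are linearly related, so this cast is allowed.
print(np.timedelta64(one_year, "M"))

# Years and days are not: the cast and the mixed addition are both refused.
try:
    np.timedelta64(one_year, "D")
except TypeError as err:
    print("cast refused:", err)
print(try_add(one_year, np.timedelta64(1, "D")))  # None
```

In Julia, by contrast, `Year(1) + Day(1)` is accepted and produces a `Dates.CompoundPeriod`.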

Unlike the previous PR (#334), all conversions rely only on builtin Python or NumPy functions and no longer use pandas.

The argument order of pytimedelta() matches the Python version (datetime.timedelta), while the argument order of pytimedelta64() is strictly descending by unit size, except for week, which comes last.
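For reference, Python's `datetime.timedelta` takes its positional arguments in the order days, seconds, microseconds, milliseconds, minutes, hours, weeks, which is not sorted by unit size. The snippet below only demonstrates that standard-library order; it does not show pytimedelta() itself.

```python
from datetime import timedelta

# Python's positional order:
# timedelta(days, seconds, microseconds, milliseconds, minutes, hours, weeks)
positional = timedelta(0, 0, 0, 0, 2, 1)   # minutes=2, hours=1
keyword = timedelta(hours=1, minutes=2)

print(positional == keyword)       # True
print(keyword.total_seconds())     # 3720.0
```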

EDIT: added comments to the "What's new" code and added conversions.
EDIT2: removed the comment about datetime_data; I had misunderstood its meaning and have updated the code.

hhaensel commented 1 day ago

@cjdoris what do you think about this PR?