elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

cast datetime #729

Closed sehHeiden closed 6 months ago

sehHeiden commented 8 months ago

I want to cast a string series to a datetime series:

S.cast(df["date_string"], {:datetime, :millisecond})

Input is:

#Explorer.Series<
  Polars[4563]
  string ["2023-08-29T17:39:43", "2023-08-29T17:20:09", "2023-08-29T16:53:00",
   "2023-08-29T15:38:00", "2023-08-29T16:56:49", "2023-08-29T16:30:13", "2023-08-29T16:21:20",
   "2023-08-29T15:49:26", "2023-08-29T15:48:53", "2023-08-29T15:01:57", "2023-08-29T14:56:54",
   "2023-08-29T14:55:55", "2023-08-29T14:52:09", "2023-08-29T14:17:28", "2023-08-29T14:09:57",
   "2023-08-29T13:53:19", "2023-08-29T13:48:23", "2023-08-29T13:17:42", "2023-08-29T12:53:17",
   "2023-08-29T06:15:00", "2023-08-29T11:07:01", "2023-08-28T13:23:17", "2023-08-27T15:50:36",
   "2023-08-26T15:47:56", "2023-08-26T13:09:13", "2023-08-21T12:28:48", "2023-08-16T17:33:10",
   "2023-08-12T06:25:07", "2023-08-08T17:29:40", "2023-08-05T13:41:05", "2023-08-01T05:56:35",
   "2023-07-31T19:35:24", "2023-07-30T06:39:49", "2023-07-20T11:35:10", "2023-07-19T17:24:04",
   "2023-07-14T09:02:04", "2023-07-07T09:55:04", "2023-07-07T09:35:13", "2023-07-05T16:44:39",
   "2023-07-05T16:42:05", "2023-08-27T20:13:19", "2023-08-25T19:35:33", "2023-08-21T08:29:36",
   "2023-08-15T19:39:36", "2023-08-07T20:21:36", "2023-07-06T22:01:22", "2023-04-12T13:33:31",
   "2023-01-01T13:32:50", "2023-08-29T07:55:27", "2023-08-29T05:41:04", ...]
>

I get:

#Explorer.Series<
  Polars[4563]
  datetime[ms] [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil,
   nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil,
   nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, ...]
>

Converting to a list and parsing each string with Enum.map and NaiveDateTime.from_iso8601 works:

dt = S.to_list(df["date_string"])
  |> Enum.map(&elem(NaiveDateTime.from_iso8601(&1), 1))
  |> S.from_list()

toots_df = DF.put(df, "date", dt)
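
One caveat with the workaround above: `elem(..., 1)` also extracts the reason atom (for example `:invalid_format`) from an `{:error, reason}` tuple, so a single malformed string would put an atom into the list. A minimal sketch of a stricter variant using only the standard library follows; the module name `IsoParse` and the function `parse_or_nil/1` are hypothetical names chosen for illustration and are not part of Explorer.

```elixir
defmodule IsoParse do
  # Hypothetical helper, not part of Explorer.
  # Returns a NaiveDateTime for valid ISO 8601 input and nil otherwise.
  def parse_or_nil(string) when is_binary(string) do
    case NaiveDateTime.from_iso8601(string) do
      {:ok, datetime} -> datetime
      {:error, _reason} -> nil
    end
  end

  # Non-string values (including nil) are mapped to nil.
  def parse_or_nil(_other), do: nil
end

# The second element is deliberately malformed to show the nil fallback.
parsed = Enum.map(["2023-08-29T17:39:43", "not a date"], &IsoParse.parse_or_nil/1)
```

The resulting list should be usable with S.from_list in place of the mapped list in the snippet above, with unparseable entries showing up as nil.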

billylanchantin commented 8 months ago

strptime will work in this scenario if you provide the string format:

["2023-08-29T17:39:43", "2023-08-29T17:20:09"]
|> S.from_list()
|> S.strptime("%Y-%m-%dT%H:%M:%S")
# #Explorer.Series<
#   Polars[2]
#   datetime[μs] [2023-08-29 17:39:43.000000, 2023-08-29 17:20:09.000000]
# >

I'll leave this issue open for now. cast should probably not have this responsibility, since datetime string formats vary so much. However, there might be a case for an exception for ISO 8601 formatted strings.

Thoughts, Explorer team?

josevalim commented 8 months ago

Yes, I think we should make it work. The documentation says it does but I assume it stopped working at some point.

> Thoughts, Explorer team?

@cigrainger, @philss and I would like to invite you to the team, given your fantastic contributions. No strings attached. :)

billylanchantin commented 8 months ago

> Yes, I think we should make it work. The documentation says it does but I assume it stopped working at some point.

Oh nice! Where's it documented? I looked here but I didn't see it.

> @cigrainger, @philss and I would like to invite you to the team, given your fantastic contributions. No strings attached. :)

I'm honored, I accept! I look forward to being a part of this all star cast :)

josevalim commented 8 months ago

The strptime docs mention that "cast(..., :datetime)" will guess the format. I guess we should probably implement that by calling to_datetime when we know the source dtype is string (and similarly for the date and perhaps time dtypes).
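
The dispatch idea in this comment can be pictured roughly as follows. This is only a sketch under loud assumptions: plain lists stand in for series, the dtypes are bare atoms, and the module `CastSketch` is hypothetical. It is not Explorer's implementation and does not reflect what the eventual fix actually does.

```elixir
defmodule CastSketch do
  # Hypothetical illustration only; Explorer's real cast operates on
  # Polars-backed series, not on plain lists.

  # When the source dtype is :string, parse the values instead of
  # reinterpreting them, so ISO 8601 strings do not become nil.
  def cast(values, :string, target) when target in [:datetime, :date, :time] do
    Enum.map(values, &parse(&1, target))
  end

  # Every other combination is out of scope for this sketch.
  def cast(_values, source_dtype, target) do
    raise ArgumentError,
          "unsupported cast in this sketch: #{inspect(source_dtype)} -> #{inspect(target)}"
  end

  defp parse(nil, _target), do: nil
  defp parse(string, :datetime), do: ok_or_nil(NaiveDateTime.from_iso8601(string))
  defp parse(string, :date), do: ok_or_nil(Date.from_iso8601(string))
  defp parse(string, :time), do: ok_or_nil(Time.from_iso8601(string))

  # Unparseable strings become nil rather than raising.
  defp ok_or_nil({:ok, value}), do: value
  defp ok_or_nil({:error, _reason}), do: nil
end

datetimes = CastSketch.cast(["2023-08-29T17:39:43", "oops"], :string, :datetime)
dates = CastSketch.cast(["2023-08-29"], :string, :date)
```

The point of the sketch is only the branching on the source dtype: string input takes a parsing path, while other inputs would keep whatever cast behaviour they already have.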

billylanchantin commented 8 months ago

Haha oh golly it was in the function that I linked (you sure you want me on this team? 😉). Yes I think that means this was a regression of some sort.

> I guess we should probably implement it by calling to_datetime if we know the source is dtype=string. (similar for date and perhaps time dtypes).

Agreed.

billylanchantin commented 6 months ago

Closed by https://github.com/elixir-explorer/explorer/pull/795