elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

add support for map as expression #855

Closed: lkarthee closed this pull request 9 months ago.

lkarthee commented 9 months ago

Add a struct equivalent: a map literal used as an expression now builds a struct column.

```elixir
iex(2)> df = DF.new(%{a: [1, nil, 3], b: ["a", "b", nil]})
#Explorer.DataFrame<
  Polars[3 x 2]
  a s64 [1, nil, 3]
  b string ["a", "b", nil]
>
iex(3)> DF.mutate(df, c: %{a: a, b: b, lit: 1, null: is_nil(a)})
#Explorer.DataFrame<
  Polars[3 x 3]
  a s64 [1, nil, 3]
  b string ["a", "b", nil]
  c struct[4] [
    %{"a" => 1, "b" => "a", "lit" => 1, "null" => false},
    %{"a" => nil, "b" => "b", "lit" => 1, ...},
    %{"a" => 3, "b" => nil, ...}
  ]
>
```
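To make the row layout of `c` concrete, here is a plain-Elixir sketch (no Explorer involved) that rebuilds the same rows from the input lists. `StructColumnSketch.rows/2` is a hypothetical helper written only for illustration; it is not how Explorer or Polars builds struct columns. It mirrors two things visible in the printed output: the scalar `lit: 1` is repeated on every row, and field names appear as string keys.

```elixir
defmodule StructColumnSketch do
  # Hypothetical helper (not part of Explorer): zips equal-length column
  # lists into one map per row. Non-list values (scalars) are repeated
  # on every row, and atom field names become string keys.
  def rows(fields, row_count) do
    for i <- 0..(row_count - 1)//1 do
      Map.new(fields, fn {name, value} ->
        cell = if is_list(value), do: Enum.at(value, i), else: value
        {to_string(name), cell}
      end)
    end
  end
end

a = [1, nil, 3]
b = ["a", "b", nil]

# Mirrors `c: %{a: a, b: b, lit: 1, null: is_nil(a)}` from the session above,
# with is_nil applied element by element on a plain list.
rows =
  StructColumnSketch.rows(
    %{a: a, b: b, lit: 1, null: Enum.map(a, fn value -> is_nil(value) end)},
    length(a)
  )

IO.inspect(rows)
```

Unlike the IEx output, which truncates the later rows with `...`, this prints every field of every row.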

Note: the original text is hidden because it is stale, as per https://github.com/elixir-explorer/explorer/pull/855#issuecomment-1937523811

Original text:

Add `struct` expression.

```elixir
df = DF.new(%{a: [1, 2, 3], b: ["a", "b", "c"]})
#Explorer.DataFrame<
  Polars[3 x 2]
  a s64 [1, 2, 3]
  b string ["a", "b", "c"]
>

DF.mutate(df, c: struct([a: a, b: b]))
#Explorer.DataFrame<
  Polars[3 x 3]
  a s64 [1, 2, 3]
  b string ["a", "b", "c"]
  c struct[2] [
    %{"a" => 1, "b" => "a"},
    %{"a" => 2, "b" => "b"},
    %{"a" => 3, "b" => "c"}
  ]
>

Explorer.Series.struct(a: df["a"], b: df["b"])
#Explorer.Series<
  Polars[3]
  struct[2] [
    %{"a" => 1, "b" => "a"},
    %{"a" => 2, "b" => "b"},
    %{"a" => 3, "b" => "c"}
  ]
>
```
josevalim commented 9 months ago

Could we implement this without adding a struct function? Could we automatically convert maps to structs instead?

lkarthee commented 9 months ago

@josevalim Currently, `%{}` works in `mutate` as a top-level expression, but it fails when passed as input to a series function.

```elixir
DF.mutate(df, c: %{a: a, b: b}) # works
DF.mutate(df, c: %{a: is_nil(a), b: is_nil(b)}) # works
DF.mutate(df, c: is_nil(%{a: a, b: b})) # fails
```

```
** (ArgumentError) expected a series as argument for is_nil, got: %{a: #Explorer.Series<
    LazySeries[???]
    s64 (column("a"))
  >, b: #Explorer.Series<
    LazySeries[???]
    s64 (column("b"))
  >}
    (explorer 0.9.0-dev) lib/explorer/series.ex:6127: Explorer.Series.apply_series/3
```

How should we tackle this?

josevalim commented 9 months ago

I am on my phone, but somewhere in the lazy series code we handle all literals; we should probably add map handling there. The code will likely be similar to what you added for data frames, so we should also find a way to share it.

josevalim commented 9 months ago

I took a quick look and I was wrong. We only allow casting in specific operations in `series.ex`. For example, we could begin supporting maps in the comparison operators, if comparison is supported between structs. Outside of that, we most likely won't support passing maps. There may be an argument for allowing literals (such as integers and maps) in `is_nil`, but that's probably not the case today.

lkarthee commented 9 months ago

There are some convenient use cases for struct columns: https://docs.pola.rs/user-guide/expressions/structs/#practical-use-cases-of-struct-columns

Should we support passing structs to series functions? This would add value when mutating, filtering without mutating, and so on.

josevalim commented 9 months ago

The question is: which operations should we support them on? For example, it doesn't make sense to support them on `add` or `multiply`. So I'd go operation by operation, at least initially.

lkarthee commented 9 months ago

OK, let me explore this question further and come back later.

I think this PR is complete for now.

josevalim commented 9 months ago

💚 💙 💜 💛 ❤️