elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

Add `keep: :none` argument to mutate functions #966

Closed brendon9x closed 3 months ago

brendon9x commented 3 months ago

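This PR adds a new `keep` option to the `mutate` and `mutate_with` functions in the `DataFrame` module. The option controls which columns are retained in the output DataFrame after a mutation: `:all` keeps every column, while `:none` keeps only the newly created columns (plus the grouping columns when the DataFrame is grouped). Closes #965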

Examples of usage

Regular DataFrame:

# mutate/3 is a macro, so the module must be required first
require Explorer.DataFrame
df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
Explorer.DataFrame.mutate(df, [c: b + 1], keep: :none)
#Explorer.DataFrame<
  Polars[3 x 1]
  c s64 [2, 3, 4]
>

Grouped DataFrame:

# mutate/3 is a macro, so the module must be required first
require Explorer.DataFrame
df = Explorer.Datasets.iris()
grouped = Explorer.DataFrame.group_by(df, "species")
Explorer.DataFrame.mutate(grouped, [petal_length_avg: mean(petal_length)], keep: :none)
#Explorer.DataFrame<
  Polars[150 x 2]
  Groups: ["species"]
  species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
  petal_length_avg f64 [1.464, 1.464, 1.464, ...]
>
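
The description also mentions `mutate_with`, which the examples above do not cover. The following is a minimal sketch of what that call might look like, assuming `mutate_with` accepts the same `keep: :none` option as a third argument and that Explorer is already available in the session (a Mix project or Livebook) at a version that includes this change. The `DF` alias and the variable names are only for illustration.

```elixir
# Assumption: Explorer is installed at a version that includes the `keep` option.
alias Explorer.DataFrame, as: DF

df = DF.new(a: ["a", "b", "c"], b: [1, 2, 3])

# mutate_with/3 takes a callback that receives a lazy DataFrame and returns
# a keyword list of new columns, so no `require` is needed here.
result =
  DF.mutate_with(
    df,
    fn ldf -> [c: Explorer.Series.add(ldf["b"], 1)] end,
    keep: :none
  )

# Only the newly created column should remain.
IO.inspect(DF.names(result))
```

If the option behaves as in the `mutate` example, the result should contain only column `c`.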

Considerations

  1. I decided to use an atom value (:all or :none) for the option instead of a boolean, for maximum forward compatibility: more values can be added later without changing the option's type.
  2. I was going to add unit tests, but the doctests seemed sufficient to cover the new functionality and its interactions with existing features.
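
To illustrate the first consideration, here is a small sketch of how atom values leave room for later additions. `KeepSketch` and `retained_columns/4` are hypothetical names used only for this illustration and are not Explorer's implementation; the column selection rules are inferred from the examples above.

```elixir
defmodule KeepSketch do
  # Hypothetical helper: decides which column names survive a mutation.
  # `existing` are the input columns, `groups` the grouping columns,
  # and `new` the columns produced by the mutation.
  def retained_columns(existing, groups, new, keep) do
    case keep do
      # Keep everything, appending columns that are new.
      :all -> Enum.uniq(existing ++ new)
      # Keep only grouping columns and new columns.
      :none -> Enum.uniq(groups ++ new)
      # A future value (for example one keeping only the columns used in the
      # mutation) could be added as another clause without breaking callers.
      other -> raise ArgumentError, "expected :all or :none, got: #{inspect(other)}"
    end
  end
end

IO.inspect(KeepSketch.retained_columns(["a", "b"], [], ["c"], :none))
IO.inspect(KeepSketch.retained_columns(["a", "b"], [], ["c"], :all))
```

With a boolean, adding a third behavior would require a new option or a breaking change; with atoms it is one more `case` clause.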

josevalim commented 3 months ago

:green_heart: :blue_heart: :purple_heart: :yellow_heart: :heart:

brendon9x commented 3 months ago

@cigrainger @josevalim – this repo was a joy to work in. The contributor docs were great and the mix tasks worked flawlessly (and tests are super fast). I wouldn't have been able to push this over the line if any of that wasn't true. Thanks for the awesome project.