elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License
1.12k stars 123 forks source link

`Explorer.DataFrame.filter_with` doesn't work with empty list as result of the callback #1021

Closed LostKobrakai closed 1 day ago

LostKobrakai commented 1 day ago
** (ArgumentError) expecting the function to return a single or a list of boolean LazySeries, but instead it contains:
[]
    (explorer 0.10.0) lib/explorer/data_frame.ex:2656: Explorer.DataFrame.filter_without_boolean_series_error/1

It would be great to have this support an empty list as well (no filtering), but if that's not easily done than at least the typespec for the callback should be adjusted from

(Explorer.Backend.QueryFrame.t() -> Explorer.Series.lazy_t() | [Explorer.Series.lazy_t()])

to

(Explorer.Backend.QueryFrame.t() -> Explorer.Series.lazy_t() | [Explorer.Series.lazy_t(), ...])
billylanchantin commented 1 day ago

@LostKobrakai Thanks for the report!

I think we should accept an empty list of filters. I see no reason not to and it should be a super easy patch.

But first: are the semantics what I think they are? I'm thinking of situations like this where logic dictates certain outcomes for consistency:

Enum.any?([], fn _ -> true end)
#=> false

Enum.all?([], fn _ -> false end)
#=> true

Here, I believe this should be true in all cases:

df1 = DataFrame.new(...)
df2 = DataFrame.filter_with(df1, [])
assert df1 == df2

Is that right?

josevalim commented 1 day ago

That's right. A filter expects all conditions to be true and then the row is retained. So it is equivalent to all?.

LostKobrakai commented 1 day ago

Yeah, that would be my understanding. Not providing a filter means no condition to filter by means nothing is filtered away.

billylanchantin commented 1 day ago

@LostKobrakai Should be fixed on main now. Not sure when it'll be released but hopefully that unblocks you.

LostKobrakai commented 1 day ago

Thanks for the quick turnaround.

I worked around it for now appending a condition, which is always true for all records. So I might wait for the next release.