Closed mlineen closed 9 months ago
Hi @mlineen,
Thanks for the report!
> Not sure if this is a bug or a feature

This looks like a bug to me. I didn't see any notes in the Polars release about changing the behavior of `count`. And AFAICT, the behavior should match that of `Explorer.Series.count/1`. It does not since:
import Explorer.Series
[1, nil, 3] |> from_list() |> count() #=> 3
I'll try to dig more into it tonight to make sure.
Interesting! So this isn't a "bug" per se. As of Polars 0.36.2, this is the default behavior. From the release:
💥 Breaking changes
- Update Expr.count to ignore null values by default (https://github.com/pola-rs/polars/pull/12934)
We now need to use a different function under the hood if we want the old behavior.
WDYT @philss, @josevalim, @cigrainger?
@billylanchantin this is tricky. `count` in SQL does not include nulls indeed. And there is also a chance the behaviour of `Series.count` outside of a lazy query does not handle nils. So probably what we need to do is:

* Make `Series.count` always discard nils, both inside and outside lazy queries
* Make `Series.size` return the whole size. Inside groups, it should return the size of each group

WDYT?
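To make the proposed semantics concrete, here is a minimal sketch in plain Elixir that models them on ordinary lists. It is not Explorer's or Polars' implementation: the `CountVsSize` module and its function names are hypothetical and exist only to illustrate the difference between "count non-nil values" and "count all elements", both for a whole series and per group.

```elixir
defmodule CountVsSize do
  # Hypothetical module: models the proposal on plain lists,
  # it does not call Explorer or Polars.

  # Proposed `count`: number of non-nil values.
  def count(values), do: Enum.count(values, &(not is_nil(&1)))

  # Proposed `size`: number of elements, nils included.
  def size(values), do: length(values)

  # Per-group version. `rows` is a list of {group_key, value} tuples.
  def summarise(rows) do
    rows
    |> Enum.group_by(fn {key, _value} -> key end, fn {_key, value} -> value end)
    |> Map.new(fn {key, values} ->
      {key, %{count: count(values), size: size(values)}}
    end)
  end
end

CountVsSize.count([1, nil, 3]) #=> 2
CountVsSize.size([1, nil, 3])  #=> 3

CountVsSize.summarise([{"a", 1}, {"a", nil}, {"b", 3}])
#=> %{"a" => %{count: 1, size: 2}, "b" => %{count: 1, size: 1}}
```

Under this model, `count` and `size` only disagree when a nil is present, which is the case the original report ran into.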
I think that's a good plan. We'll want to call out that `count` (now) works like it does in SQL in the docs.

Will we also need to tackle `Series.size` not being available in lazy series? We may be able to make that work with `len`:
> @billylanchantin this is tricky. `count` in SQL does not include nulls indeed. And there is also a chance the behaviour of `Series.count` outside of a lazy query does not handle nils. So probably what we need to do is:
>
> * Make `Series.count` always discard nils, both inside and outside lazy queries
> * Make `Series.size` return the whole size. Inside groups, it should return the size of each group
>
> WDYT?
@josevalim is this an accurate summary of your proposal?
* Explorer `Series.count` = Polars Series `count`
* Explorer `Series.size` ≈ Polars Series `len`

(≈ because I'm not sure what Polars behavior is regarding groups)
(I work with @mlineen)
Yes!!
Yep I agree with that proposal!
Not sure if this is a bug or a feature, but when a `nil` is present in the series, `summarise` with `count` skips the `nil` (this behavior changed from 0.7.2 to 0.8.0):