Open pdeffebach opened 5 years ago
One issue here is that filter
is eager and, generally speaking, typeof(filter(f, x)) == typeof(x)
. That isn't really possible to implement in a general case, but I think it would be okay to define filter
for SkipMissing
to materialize and filter.
So I guess in that proposal filter(::SkipMissing{<:AbstractArray})
would work, but not with other wrapped iterators?
That would make sense to me. SkipMissing
is an odd enough case itself that I think special treatment of it would be fine. We could define it as
filter(f, s::SkipMissing) = filter(x->!ismissing(x) && f(x), s.x)
then let filter
fail if the underlying iterator doesn't have filter
defined on it.
An alternative is that people could just change their filter commands to have ismissing
, which is a bit more explicit anyways.
This could just be a documentation / tutorial problem.
I was looking at implementing what I said above verbatim, but I realized that that definition leaves you with a container without any missing values but with an element type that allows missing. That seems suboptimal and is inconsistent with the result of filter(f, collect(s))
.
This is difficult to do generically, since ideally you'd want to be able to use filter
on any SkipMissing{T}
where T
supports filter
and get a T
back, but to provide useful and consistent behavior, I'm not sure how we'd swing it for anything but arrays...
Supporting filter
only for AbstractArray
sounds OK to me.
Implemented in #31235.
One possible rationale for supporting filter(::SkipMissing{<:AbstractArray})
and not other methods is that SkipMissing{<:AbstractArray}
is indexable. If an Indexable
trait was added as discussed at https://github.com/JuliaLang/julia/pull/31020#discussion_r255292414, it would make sense for filter
to support all types with that trait. That would include in particular Broadcasted
objects.
We can't support filter
for arbitary iterables and we now have filter
for SkipMissing
-wrapped arrays, so is there anything actionable left or can this be closed now that #31235 is merged?
As I noted it could make sense to also support Broadcasted
. Not sure whether that warrants keeping this open.
Ah right, sorry. It might make sense to have that as a separate issue, since it's very specific (and a difficult situation to find yourself in, since presumably a broadcast will materialize before filter
is called).
Broadcasted would materialize an array while lift
ing missing elements? Would be great!
Yes that's in the context of https://github.com/JuliaLang/julia/pull/31088.
I just ran into this trying to filter a zip
, which also doesn't work.
filter(x->x[2] > 0.5, zip(rand(10), rand(10)))
Is the main objection to having filter
work for arbitrary iterators that we can't then guarantee that typeof(filter(f, x)) == typeof(x)
? It doesn't seem that map
makes that guarantee:
map(sum, zip(rand(10), rand(10)))
This returns a Vector{Float64}
. It seems like returning a Vector{eltype(x)}
in the case that x
is an iterator seems like a pretty reasonable behavior.
It doesn't seem that
map
makes that guarantee
Sure, but that's not the same operation. map
is explicitly a transformation of the data into something different, while filter
gives you back a subset of the existing data.
It seems like returning a
Vector{eltype(x)}
in the case thatx
is an iterator seems like a pretty reasonable behavior
The problem is defining what an "iterator" is. For example, one can iterate a tuple, but I'd be surprised if e.g. filter(isodd, (1,2,3))
gave me back an array.
The problem is defining what an "iterator" is. For example, one can iterate a tuple, but I'd be surprised if e.g.
filter(isodd, (1,2,3))
gave me back an array.
But why would it return an array? Just like map
returns a tuple for tuples, filter
could fall back to an array for general iterators, but return another type for some iterable types.
Sure, but that's not the same operation. map is explicitly a transformation of the data into something different, while filter gives you back a subset of the existing data.
map
transforms the individual elements, so I'd expect that the eltype
would change, but that seems separate from whether if affects the container type.
This issue is motivated by this discussion on discourse.
My example of when this is a problem is when you are working with a subset of data that, by chance, does not contain missing values. When you work with the full data, you now have missing values and have to use
skipmissing
at various places in your code. Now all yourfilter
commands are broken.A heavy and breaking change would be to make
Base.filter
work on any iterator, and not just arrays. Other alternatives include pointing more people towardsIterators.filter
for general use, or special-casingskipmissing
so acollect
call is included somewhere.