Open CameronBieganek opened 6 months ago
Is this really a bug? This issue can happen whenever you have a generic function that temporarily causes its arguments to be in an invalid state - because it's always a possibility that the later operations that restore the correct state to the argument throws an error. So there are only two possibilities, really:
The first option is quite extreme and has far-reaching implications.
I don't know if there is a generally accepted meaning for filtering a sparse vector/array, but the two options that seem reasonable are 1) zero out the entries for which the predicate returns false, or 2) return a dense array with the entries for which the predicate returns true. It turns out that even filter
does not do either of the two reasonable things:
julia> x = sparsevec(1:2:7, 1:4, 8)
8-element SparseVector{Int64, Int64} with 4 stored entries:
[1] = 1
[3] = 2
[5] = 3
[7] = 4
julia> filter(iseven, x)
6-element SparseVector{Int64, Int64} with 2 stored entries:
[2] = 2
[5] = 4
Note that the indices get messed up and the sparse vector changes from length 8 to length 6.
generic function
The root issue here is the underspecification of interfaces in Julia. The filter
and filter!
functions are not as generic as they claim to be:
julia> x = sparsevec(1:2:7, 1:4, 8);
julia> @which filter(iseven, x)
filter(f, a::AbstractArray)
@ Base array.jl:2617
julia> @which filter!(iseven, x)
filter!(f, a::AbstractVector)
@ Base array.jl:2651
They do not work for arbitrary instances of AbstractVector
. (This is how I came across this issue---I've been exploring interface questions.)
Perhaps it would make sense to add ::SparseVector
methods for filter
and filter!
? That doesn't solve the interface issue, but at least it covers one more AbstractVector
case.
Is this really a bug? This issue can happen whenever you have a generic function that temporarily causes its arguments to be in an invalid state - because it's always a possibility that the later operations that restore the correct state to the argument throws an error.
To more specifically respond to your question, I would expect the following to work, but it does not:
julia> x = sparsevec(1:2:7, 1:4, 8)
8-element SparseVector{Int64, Int64} with 4 stored entries:
[1] = 1
[3] = 2
[5] = 3
[7] = 4
julia> x[3]
2
julia> try
filter!(iseven, x)
catch
end
julia> x[3]
0
But perhaps that expectation is unreasonable. 🤷♂️ Maybe I just have to accept the fact that Julia is a procedural language with lots of side effects.
Regarding filter(f, ::SparseVector)
, I don't understand why you would expect it to give you a dense vector, or zero out some entries. That's not how filter
works, and it's not how it's documented to work. It's documented to return a copy of the collection without elements for which f
returns false
, and that's exactly what it does. From what I can tell, filter
behaves correctly.
Regarding the try-catch thing, that's not possible to have that work in general. For that to work, there would need to be some enforcement that instances of some abstract type implements some given methods. There has been some discussion about this, but it's not part of Julia yet.
But perhaps that expectation is unreasonable. 🤷♂️ Maybe I just have to accept the fact that Julia is a procedural language with lots of side effects.
I mean
isn't that the whole point of distinguishing filter
from filter!
? you can get the behavior you want by just
x2 = copy(x)
filter!(iseven, x2)
x = x2
It's documented to return a copy of the collection without elements for which
f
returnsfalse
, and that's exactly what it does.
No, it mangles the keys:
julia> x = sparsevec(1:2:7, 1:4, 8)
8-element SparseVector{Int64, Int64} with 4 stored entries:
[1] = 1
[3] = 2
[5] = 3
[7] = 4
julia> filter(iseven, x)
6-element SparseVector{Int64, Int64} with 2 stored entries:
[2] = 2
[5] = 4
Compare that to filter
on a Dict
, which preserves the keys:
julia> filter(p -> iseven(p[2]), Dict(:a => 1, :b => 2))
Dict{Symbol, Int64} with 1 entry:
:b => 2
I know it's a little different because dictionaries iterate pairs and sparse vectors just iterate elements, but in my opinion a sane filter
on a sparse vector should preserve the keys.
The filter
docstring says
Return a copy of collection
a
, removing elements for whichf
isfalse
.
But that's a fundamentally undefined sentence in Julia, because there is no definition of collection
. The object returned by filter(iseven, x)
is a pretty useless "copy" of the sparse vector x
, since the indices of the values have changed.
I thought I was an experienced Julia programmer, but apparently I have not been admitted to the enlightened circle of Julia developers who understand the true meaning of Julia's vague docstrings.
you can get the behavior you want by just
x2 = copy(x) filter!(iseven, x2) x = x2
That obviously doesn't work:
julia> x = sparsevec(1:2:7, 1:4, 8)
8-element SparseVector{Int64, Int64} with 4 stored entries:
[1] = 1
[3] = 2
[5] = 3
[7] = 4
julia> x2 = copy(x); filter!(iseven, x2)
ERROR: MethodError: no method matching deleteat!(::SparseVector{Int64, Int64}, ::UnitRange{Int64})
Closest candidates are:
deleteat!(::Vector, ::AbstractUnitRange{<:Integer})
@ Base array.jl:1525
deleteat!(::Vector, ::AbstractVector)
@ Base array.jl:1567
deleteat!(::Vector, ::Any)
@ Base array.jl:1566
...
Stacktrace:
[1] filter!(f::typeof(iseven), a::SparseVector{Int64, Int64})
@ Base .\array.jl:2662
[2] top-level scope
@ REPL[29]:1
julia> x = x2; x
8-element SparseVector{Int64, Int64} with 6 stored entries:
[1] = 0
[2] = 2
[3] = 0
[4] = 0
[5] = 4
[7] = 4
The indices of the values change, because the removed elements are removed, not set to zero. If you remove e.g. the element at index 2 from a vector, then what was previously at element 3 will move down to be in position 2. This is the same as an ordinary Vector. A sparse vector is not like a Dict - it's rather an alternative implementation of something like Vector. If the indices were preserved, then no elements would have been removed, which would have been wrong according to the docstring. In other words, the elements not printed when you display the sparse vector are not missing elements. They exist, but are zero. This is different from a Dict.
That obviously doesn't work:
I stand corrected -- I should have tried it before posting
but apparently I have not been admitted to the enlightened circle of Julia developers who understand the true meaning of Julia's vague docstrings.
I apologize for having implied any such thing
w.r.t. to the example at hand
julia> x = sparsevec(1:2:7, 1:4, 8)
8-element SparseVector{Int64, Int64} with 4 stored entries:
[1] = 1
[3] = 2
[5] = 3
[7] = 4
julia> filter(iseven, x)
6-element SparseVector{Int64, Int64} with 2 stored entries:
[2] = 2
[5] = 4
I agree this is pretty confusing, and I had to do a double take. although after staring at it a bit it does seem correct. what was tripping me up was forgetting that of course in this case the iseven
predicate includes the zeros. The behavior seems a little bit more clear when I make the predicate exclude zeros
julia> filter(v -> iseven(v) && !iszero(v), x)
2-element SparseVector{Int64, Int64} with 2 stored entries:
[1] = 2
[2] = 4
I agree this is a bug — it's not just sparse, this method will corrupt all vectors that are mutable but not mutably-sized. And SparseVectors do not have a mutable length, so they cannot implement this even if they wanted to (they're struct
s).
There are three classes of functions:
deleteat!
, insert!
, push!
, pushfirst
, pop!
, popat!
, popfirst!
, empty!
, splice!
AbstractVector
but would immediately fail (because they effectively resize!
first): append!
, prepend!
filter!
, intersect!
, union!
, symdiff!
(but I think the latter three are all built atop filter!)I may have missed a few, but this isn't an abstract "yeah, anything can fail in the middle" sort of thing. We know that many AbstractVectors
are mutable but not mutable-length. I think all these methods should check for applicability of what they expect to use to resize!
before doing any preparatory work.
The indices of the values change, because the removed elements are removed, not set to zero. If you remove e.g. the element at index 2 from a vector, then what was previously at element 3 will move down to be in position 2. This is the same as an ordinary Vector.
Ok, I can agree with this logic, which explains the filter
behavior, but I still think that the filter!
issue is due to the fact that the AbstractVector
interface is too coarse to represent the kinds of vectors for which the current implementation actually works. (What I mean is not that the AbstractVector
interface needs more mandatory methods added to it, but that in an ideal world we would have a sub-interface, with additional methods in it, that describes the expected interface for vector-like arguments to filter!
.)
I apologize for having implied any such thing
Sorry, I was being a bit hyperbolic, and I was not implicating you. 😉
If you try to use
filter!
on aSparseVector
, it fails with an internal method error fordeleteat!
. And then your sparse vector is corrupted: