sum!, prod!, any!, and all! may silently return incorrect results

yurivish commented 3 years ago

I noticed that sum! and prod! do not check for aliasing, leading to incorrect results when the same array is passed in both arguments. The incorrect behavior under aliasing is not documented and the result is a silent unexpected wrong result.

julia> a = [1 2 3];

julia> sum!(copy(a), a)
1×3 Matrix{Int64}:
 1  2  3

julia> sum!(a, a)
1×3 Matrix{Int64}:
 0  0  0

julia> a = [1 2 3];

julia> prod!(copy(a), a)
1×3 Matrix{Int64}:
 1  2  3

julia> prod!(a, a)
1×3 Matrix{Int64}:
 1  1  1

Not all mutating functions have this problem and cumsum! can be used this way without error:

julia> a = [1, 2, 3];

julia> cumsum!(a, a)
3-element Vector{Int64}:
 1
 3
 6

Other functions correctly flag the aliasing in the case of === arguments and throw an error:

julia> using LinearAlgebra

julia> a = rand(2, 2);

julia> mul!(a, a, a)
ERROR: ArgumentError: output matrix must not be aliased with input matrix

The examples above were generated on Julia 1.6.0-beta1 and I see the same behavior in version 1.5.3.

Edit: I noticed this issue also applies to any! and all!:

julia> let a = [true, false]
           any!(a, a)
       end
2-element Array{Bool,1}:
 0
 0

julia> let a = [true, false]
           any!(copy(a), a)
       end
2-element Array{Bool,1}:
 1
 0

julia> let a = [true, false]
           all!(a, a)
       end
2-element Array{Bool,1}:
 1
 1

julia> let a = [true, false]
           all!(copy(a), a)
       end
2-element Array{Bool,1}:
 1
 0

barucden commented 3 years ago

So what is the correct approach here? Add an alias check similar to that in LinearAlgebra.mul!? If so, it might be enough to add the check here https://github.com/JuliaLang/julia/blob/master/base/reducedim.jl#L908.

mcabbott commented 3 years ago

Maybe worth noting that

julia> sum!(a, a; init=false)
1×3 Matrix{Int64}:
 2  4  6

This is Base.mapreducedim!(identity, +, Base.initarray!(a, Base.add_sum, false, a), a), and the difference is whether initarray! does anything. (prod! and all! etc. come to the same place.)

StefanKarpinski commented 3 years ago

Can @mbauman's Base.mightalias help here?

mbauman commented 3 years ago

I mean, in all these cases it needs to collapse to a no-op. Can we avoid worrying about aliasing at all with a simple size(out) == size(in) && copyto!(out, in)?

mcabbott commented 3 years ago

I'm not sure what should happen, but a tricker example is:

julia> mat = [100 2 3; 4 5 6];

julia> sum!(view(mat,:,1), mat);

julia> mat
2×3 Matrix{Int64}:
  5  2  3
 11  5  6

The perhaps-comparable @views mul!(mat[:,[1,3]], mat[:,2:3], mat[:,1:2]) runs without complaint.

nalimilan commented 2 years ago

@mbauman You're right that calling size(out) == size(in) && copyto!(out, in) is probably a good idea anyway, but as @mcabbott showed this is not enough to catch all problematic cases. Do you think we should call Base.mightalias then? And should we error or make a copy of the input if it aliases the output?

At any rate, docstrings should mention this.

adienes commented 1 year ago

should this bug be release blocking at any point? I feel that it is

surprising
easy to encounter
uses only "basic" functions and objects

oscardssmith commented 1 year ago

at the very least, we could document that this is a case where weirdness can occur

ufechner7 commented 1 year ago

Please, add this to the 1.10 milestone... This is a correctness issue, not so difficult to fix, and correctness should have priority for any function in Base or the standard library.

nlw0 commented 1 year ago

should this bug be release blocking at any point? I feel that it is
1. surprising

2. easy to encounter

3. uses only "basic" functions and objects

I beg to differ. Mutable functions are prone to weird issues like that, and that's one reason to avoid them or be careful with their use. Good thing we name them with a scary warning ! character. Behavior when calling sum!(a,a) should be undefined, in my opinion. Furthermore, anyone using such a function is probably more concerned with speed than avoiding a result like that. Adding checks go against ultimate speed.

Completely agree it should be documented. But in my opinion this is a bit like expecting a specific defined behavior from reduce(-, 1:10).

KristofferC commented 1 year ago

Why is this on the milestone? The milestone is not something you just slap on issues and they get magically fixed. Is this a regression vs 1.9?

ufechner7 commented 1 year ago

Well, there is even a pull request already that might fix this issue... Not good for the reputation of Julia to delay this any further... See: https://discourse.julialang.org/t/did-julia-community-do-something-to-improve-its-correctness/102515/21

adienes commented 1 year ago

forgive the naivete, but what even causes this? why does aliasing matter in the output array? that is, what makes

sum!(a, a)

so different from

a .= sum.(a)

nlw0 commented 1 year ago

forgive the naivete, but what even causes this? why does aliasing matter in the output array? that is, what makes
sum!(a, a)
so different from
a .= sum.(a)

Any reducing function such as sum, in a mutable version, should feel free to use the output parameter as "scratch space" in my opinion. If you provide the pre-allocated output/scratch space as one of the inputs, you should bear the consequences. Should be undefined behavior. It's a very different situation from a mapping functor where the values are going to be consumed and immediately replaced by the output.

In this example, the non-mutating call to sum there will be a (stack allocated?) accumulator, and the broadcasting will allocate new memory for the output, and then the data will be copied from this new anonymous array into a. When using immutable functions, we assume new memory will be allocated as necessary. Mutable functions could have different guarantees about what happens to the parameters, of course. I personally think in the case of sum! it should be fair game for the implementation to alter the values during processing, with no guarantees that it only writes there in the last step. That's my opinion for this case at least.

ufechner7 commented 1 year ago

Any reducing function such as sum, in a mutable version, should feel free to use the output parameter as "scratch space" in my opinion. If you provide the pre-allocated output/scratch space as one of the inputs, you should bear the consequences. Should be undefined behavior. It's a very different situation from a mapping functor where the values are going to be consumed and immediately replaced by the output.

In this example, the non-mutating call to sum there will be a (stack allocated?) accumulator, and the broadcasting will allocate new memory for the output, and then the data will be copied from this new anonymous array into a. When using immutable functions, we assume new memory will be allocated as necessary. Mutable functions could have different guarantees about what happens to the parameters, of course. I personally think in the case of sum! it should be fair game for the implementation to alter the values during processing, with no guarantees that it only writes there in the last step. That's my opinion for this case at least.

I do not agree. Functions in base should be as safe and correct as possible. If there would be a large performance penalty to add a check I would accept your argument, but if the average performance penalty is less than 10% we should add a check. Not all users of Julia have a PhD. Better safe than sorry. Julia should try to be as fast and safe as Rust and not as fast and unsafe as C. Unsafe high performance versions of standard functions can live in packages, or an enabling them via a macro might also be an option.

mikmoore commented 1 year ago

I don't see this as remotely easy to "fix" in code. Note that our mul! also "stumbles" on this:

julia> using LinearAlgebra

julia> x = ones(1,1); mul!(x,x,x)
ERROR: ArgumentError: output matrix must not be aliased with input matrix
Stacktrace:
 [1] gemm_wrapper!(C::Matrix{Float64}, tA::Char, tB::Char, A::Matrix{Float64}, B::Matrix{Float64}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra ~/julia-1.9.1/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:650

julia> mul!(x,x,x') # silent failure
1×1 Matrix{Float64}:
 0.0

BLAS is clearly not checking for aliasing internally. We have a courtesy check (that threw the initial error, based on checking for C === A || B === C), but as the last line shows we don't check every simple possibility, much less anything more elaborate.

I say mul! "stumbles" instead of "fails" because the docstring indicates (emphasis on the final sentence)

help?> mul!
search: mul! rmul! lmul! accumulate! muladd widemul accumulate module Module __module__ 'mutable struct' @__MODULE__

  mul!(Y, A, B) -> Y

  Calculates the matrix-matrix or matrix-vector product AB and stores the result in Y, overwriting the existing
  value of Y. Note that Y must not be aliased with either A or B.

Note that fixing these problems, in general, requires that we either throw an error or silently copy inputs (but some functions also need to observably mutate "inputs", so that solution could not work everywhere). Neither of those is what the user wanted, although I expect they would prefer the error to unspecified behavior. Even when it didn't cause deeper correctness problems, I can't promise the user would be happy we started copying stuff for them in a function they specifically called to avoid allocations.

We can do people the favor of courtesy checks, but it seems that these will inevitably throw on legal uses and/or miss illegal ones. We can try to catch "easy and obvious" violations (like the mul! example above did), but those should not be be relied upon and (IMO) should not come at any non-trivial run-time cost.

To my point on reliability, I would say Base.mightalias might be part of these courtesy checks except that it is documented to be conservative so could cause unnecessary and difficult-to-sidestep errors on totally acceptable uses. For example, it currently triggers on

julia> y = randn(4,4); y1 = view(y,[1,3],:); y2 = view(y,[2,4],:); Base.mightalias(y1,y2)
true

Documenting the no-alias assumption appears to be the only complete solution. It would be worth auditing all of our ! functions for undocumented no-alias requirements. This fix just requires adding similar notes to mul!'s Note that Y must not be aliased with either A or B.

ufechner7 commented 1 year ago

Documenting the no-alias assumption appears to be the only complete solution.

Well, documenting is always good, but adding an incomplete solution is better than implementing no solution. If an incomplete solution makes it less likely that a newcomer hits this issue we should do it.

KristofferC commented 1 year ago

Not good for the reputation of Julia to delay this any further...

Again, the milestone is not for getting random things fixed faster. I'll remove it since it doesn't seem to be a release blocker.

ufechner7 commented 1 year ago

This is bad and shows the attitude of leading team members to ignore correctness issues...

gdalle commented 1 year ago

Started PR #50824 to amend the docs because it doesn't cost much, even if we later decide to check for some forms of aliasing.

Personally, I agree with @mikmoore that some edge cases will necessarily fall through the cracks. And I also don't think that comments like

This is bad and shows the attitude of leading team members to ignore correctness issues...

will lead anywhere. If anything, the fact that we're here discussing this shows that we all take it to heart, although opinions may diverge.

gbaraldi commented 1 year ago

I agree, and potentially we could add those checks and even make them smarter (we potentially could and maybe should check for derived arrays like ranges or transposes because we can look for the parent) but the user is always able to get around those so the docs should reflect that.

nlw0 commented 1 year ago

I do not agree. Functions in base should be as safe and correct as possible. If there would be a large performance penalty to add a check I would accept your argument, but if the average performance penalty is less than 10% we should add a check. Not all users of Julia have a PhD. Better safe than sorry. Julia should try to be as fast and safe as Rust and not as fast and unsafe as C. Unsafe high performance versions of standard functions can live in packages, or an enabling them via a macro might also be an option.

sum! is not really a function, it's a procedure, it's mutable. I love functional programming, the strictness and rigueur of Haskell and Scala. This strict FP view should be directed to functions such as sum. There's a well-known design principle behind procedures such as sum!: they are supposed to enable pre-allocation of arrays, and the priority is performing reductions of an array B and store the result in a. If the implementation happens to challenge some corner case, that's interesting and maybe we should think about the implications. But by no means does that immediately imply the procedure is "incorrect". Maybe this mutable function just happens to have a reduced domain compared to a pure function, and it's still useful and does exactly what it's intended to do. In my opinion, any calls to procedures that can mutate a parameter should be considered undefined if the same parameter (or view or whatever) is present in the other inputs. And that's just reasonable if we are talking about having a strict mathematical understanding of what operation is being implemented: strict mathematical understandings precludes mutation.

Maybe it's the case that these mutable reductions could be implemented in a way that is efficient and guarantee to behave like the identity function when the inputs have the same shape, even in the unreasonable case where the same array is the input and the output. In fact, the assignment to 0 in the summation sounds a little inefficient. I'll take that extra feature if I can have it. I'm not convinced this should be a general constraint when deciding what these procedures should do, though, and I believe it's likely to introduce complications in other instances.

I find it pretty upsetting that so much attention is being given to this, while issue #45000, that I still consider very serious and relevant, was dismissed out of hand.

mkitti commented 1 year ago

I find it pretty upsetting that so much attention is being given to this, while issue #45000, that I still consider very serious and relevant, was dismissed out of hand.

There you were directed to mapfoldl which is also referred to in the documentation. Could you explain in #45000 why that is not a resolution to that issue?

nlw0 commented 1 year ago

I find it pretty upsetting that so much attention is being given to this, while issue #45000, that I still consider very serious and relevant, was dismissed out of hand.

There you were directed to mapfoldl which is also referred to in the documentation. Could you explain in #45000 why that is not a resolution to that issue?

mapfoldl should be the default behavior for mapreduce and reduce. Just give me the same as I we would get with a simple for-loop, and get out of the way of important compiler optimizations tuned to this pattern. There should be no attempt to be "smart" about accuracy. My experiment demonstrates that accuracy can actually be better with the "naive" mapfoldl. The current implementation results in generally worse machine code generation, and it's not even the best for the intended specific purpose of better summation accuracy with floating-point. I can sympathize with the original intention, but it's just not a good idea. Especially if I'm potentially performing integer operations, for instance, and then the forced splitting of the data just doesn't make sense, while interfering in compiler optimizations.

StefanKarpinski commented 1 year ago

I don't really think that issue is particularly relevant to this one, but linking it here did call my attention to it.

PallHaraldsson commented 1 year ago

At least sum! isn't used by Julia itself so ideally should not/never have been part of Julia Base, but at least can then be "fixed" without slowing Julia down.

Any reducing function such as sum, in a mutable version, should feel free to use the output parameter as "scratch space" in my opinion.

Maybe, but all of those functions could be duplicated in an external package as is (right now, they could even be deprecated here), and people could do using aliasing_unsafe_sum, if a check is actually too costly to use such function from Base after modification:

I do not agree. Functions in base should be as safe and correct as possible. [..] Not all users of Julia have a PhD. Better safe than sorry.

I'm sympathetic to that, so should a check be added to at least the most important functions ASAP?

mikmoore commented 4 days ago

Closed by #50824? It was just a documentation fix but (as discussed up-thread) there isn't a clear path to a reliable code fix. I would say this has been addressed.

adienes commented 2 days ago

the documentation addition is an improvement to the status quo, but I would not say this issue is not yet fully addressed. I don't really have an opinion on closing or not, except that if closed it should be closed as "not planned" rather than as "completed"

mikmoore commented 2 days ago

if closed it should be closed as "not planned" rather than as "completed"

The issue is about silently returning wrong results. Now that there's an explicit warning in the docstring (for these functions and many more that weren't even part of this issue), I wouldn't classify it as "silently incorrect" any more than @fastmath (whose warnings are much more muted and whose failure modes are not clearly explained) or any unsafe_XYZ function.

Closing this as "not planned" doesn't seem right, especially since this is tagged "correctness bug." I don't think it's any issue that a function misbehaves when explicitly documented argument requirements are violated. If we want an issue for accurate and fast alias detection (which would be necessary to "100% close" this issue), that sounds like a separate feature request (and maybe a pipe dream).

mbauman commented 2 days ago

In the very specific case of in-place dimensional reductions, this is somewhat fixable. It's hard, but it's fixable. In fact, my WIP pull request that refactors dimensional reductions fixes the rather trivial examples above. I ran out of time implementing that, however. It's hard. I may be able to get back to it someday, but in the meantime, help is wanted.

https://github.com/JuliaLang/julia/pull/55318

It doesn't address the more general possibilities of aliasing with views and such in novel ways, of course. Being more principled about reduction initialization is definitely a good thing (see all the other issues that PR fixes), but it's only half of the problem. You also need to be able to promise which element will be read first — and that's something that sum! and friends explicitly don't promise — you need foldl! for that! Specifically, we obviously cannot support both of these without wrecking the reason-for-being of sum!:

A = rand(3,4)
@views sum!(A[:, 1], A) # store outputs in the first column — you gotta read the first column before the first store
@views sum!(A[:, 2], A) # store outputs in the second column — how could you know which index to read first!?

Interestingly, I think #55318's current implementation would allow you to reduce into the first index along any dimension(s), because that first index happens to be read first. It's also probably the only case where folks might reasonably expect this to work... and it's a pretty useful functionality. The huge elephant glaring at my screen over my shoulder here is, of course, asking if this should be a promise or if his name is simply Hyrum Wright.

mbauman commented 2 days ago

Perhaps more succinctly, at a nuts-and-bolts level, there are two causes for this problem:

The current implementation (often) writes an initializer value to the output before reading anything. Therefore, it will never work if the output aliases the input. Being more careful about initialization is a very good thing, though, and this should be changed. That's #55318. If and when that changes, however...
The family of reduce!, sum!, and friends don't promise which element will be read first. Therefore, in all but the trivial 1-element reductions that are flagged in the original post here (and likely 2-element reductions, too), you can't intentionally pick a "safe" output location that's promised to be read from first. But perhaps that puts us into an even more dangerous situation wherein some outputs are safe to alias and others are not. Even further, most reduction implementations avoid doing intermediate stores altogether if the reduction shape is cache-friendly; in those cases even more categories of views would be safe.

JuliaLang / julia

sum!, prod!, any!, and all! may silently return incorrect results #39385