Open Suavesito-Olimpiada opened 2 years ago
deprecating @inbounds
would be massively breaking. Unless you are willing to go through and make the change to a few thousand libraries, I think this is a non-starter.
I don't know the complete lexical used here, but by deprecation I meant it being usable yet, but just with a deprecation warning. :sweat_smile:
I would be ready to go through as many packages as I could changing this, as my summer break is really near. :stuck_out_tongue:
Is it possible to introduce @unsafe_inbounds
, without deprecating @inbounds
, but changing the manual and the docstring of @inbounds
to say
@inbounds
will be removed in a future version. Please use@unsafe_inbounds
instead.
Matlab does this sort of thing, at least.
The docstring for @inbounds
already warns as follows, and personally I don't think that changing the name to @unsafe_inbounds
will dissuade anyone from a risky usage more than this warning already does.
│ Warning
│
│ Using @inbounds may return incorrect results/crashes/corruption for out-of-bounds indices. The user is
│ responsible for checking it manually. Only use @inbounds when it is certain from the information locally
│ available that all accesses are in bounds.
That is true, but one person on Discourse said that didn't know about it being unsafe.
A nice design that Base
follows is being explicit on a lot of things, one of them is marking functions as unsafe when they are
I think that design should be followed here. Mostly because of consistency, but also because it could be educational in occasions where people don't know about this being unsafe. It is not so much about dissuading people from using it.
TL;DR disagree with "KristofferC added merge me and removed merge me" (the former part).
a) I'm not sure about changing this, then you would have two versions out there, @unsafe_inbounds
(clearly unsafe, as in C/C++, just explicit about it) and @inbounds
still unsafe, meaning the same but implying to some soething different.
b) And you also have ccall
(and @ccall
) which keep being unsafe. I think people will just have to learn the these idioms of the language. The user can turn off a) globally.
If some way of tackling OffsetArrays.jl warrants a new macro with different semantics, then I like to hear it (likely my proposal was flawed).
I think the best way forward here is to dramatically reduce our dependence on @inbounds
- it really has gotten out of hands. We need to improve the compiler to make sure that "obvious" patterns like:
for i = 1:length(A)
A[i]
end
don't emit bounds checks anymore. LLVM has most of the info that's needed for this - somebody needs to look at why that's not firing.
Once we've done that, it'll be much easier to come up with safer versions of @inbounds
, because the use across the ecosystem won't be as sprawling.
the best way forward here is to dramatically reduce our dependence on @inbounds
I think you're right, @inbounds
really feels like a hacky way to tell the compiler something that it should already know, at least for some types (Array
).
somebody needs to look at why that's not firing
Lamentably, I don't speak compiler. :'V
safer versions of
@inbounds
What would a safer version of inbounds looks like?
TL;DR disagree with "KristofferC added merge me and removed merge me" (the former part).
Huh? I fat fingered adding the label and then I removed it... It is also an issue, not a PR.
@Keno the compiler to make sure that "obvious" patterns like
Why would you assume it is an "obvious" pattern? It implicitly assumes that indexing starts with 1
, but no one tells that A
has valid index 1
. The whole point of the posted article is that the code you posted is unsafe with removed bounds checking. Can any compiler figure that out automatically?
One option would be to make the current @inbounds
a no-op, thereby disabling its many uses throughout the ecosystem, and introduce @unsafe_inbounds
that does what the current macro does. That would not break anything since ignoring inbounds is always valid and anything that errors when ignoring it was already broken. It would allow the ecosystem to incrementally review uses of the macro and either:
@unsafe_inbounds
, explicitly saying they’re still necessary.However, in order for this not to cause a widespread performance hit, it would be necessary to make sure the compiler can eliminate bounds checks on its own most of the time.
I don't think @inbounds
is ever going to go away completely, at the very least not while it's allowed to use any arbitrarily computed integers as indices. Consider this (admittedly contrived) example:
julia> using Base.Iterators
julia> function foo(A, n)
s = 0
idxs = 1:2
checkbounds(Bool, A, idxs) || error("not inbounds")
for idx in Iterators.filter(==(first(idxs)), flatten(first(repeated(idxs), length(idxs)*n)))
s += A[idx] == 1
end
return s
end
foo (generic function with 1 method)
All information required for this is right there, but if the compiler can ever figure out that this will always be inbounds, I'd be extremely impressed. At the very least, it would require formalizing the bounds checking interface further and have @inbounds
require that (I'd be very much in favor of this!).
As such, renaming or otherwise deprecating the current version feels like trying to cure a symptom instead of getting to the root of the problem: users not respecting the warnings & caveats described in its documentation. A name won't change that, if the macro is otherwise useful to them. Heck, Base.@pure
isn't even exported and explicitly says that its for internal use only. Still, people used it and complained when their code was broken due to misuse.
Once we've done that, it'll be much easier to come up with safer versions of @inbounds, because the use across the ecosystem won't be as sprawling.
Well unless people go back through old code and replace the @inbounds
with that safer version, I think we're stuck with it. :shrug: We could of course also go through the ecosystem and fix erroneous uses today..
@bvdmitri it's safe for the compiler to add the inbounds for types where it is valid (separate code gets compiled for different types).
A name won't change that
I think it might. It would for me, at least. If a function has unsafe
in its name, it makes me much more paranoid and careful about using it.
@StefanKarpinski:
One option would be to make the current @inbounds a no-op
My less radical proposal was closed (also at the package when I moved it there as followup):
https://github.com/JuliaLang/julia/issues/45353
I request you reopen it, for discussion, if you agree it should be done (or just discussed). I can also go with the pros and cons discussed here. I think we could go with less radical, at least first, then maybe later full no-op and changed name,
@Keno the compiler to make sure that "obvious" patterns like
I meant for A::Array
of course. Mostly people put inbounds on things to make sure that Array
works well, so if @inbounds
was not required for the ::Array
case, I think people would put it in generic code less.
A name won't change that
I think it might. It would for me, at least. If a function has
unsafe
in its name, it makes me much more paranoid and careful about using it.
I agree, adopting the prefix unsafe
for operations that, if misused, could corrupt data or crash Julia has been a useful one.
However, in order for this not to cause a widespread performance hit, it would be necessary to make sure the compiler can eliminate bounds checks on its own most of the time.
On it's own, that wouldn't be possible for code like:
for j in axes(A,2), i in axes(A,1)
foo(A[i,j], B[i,j])
end
but we could encourage people to write
axes(A) == axes(B) || throw("Axis mismatch!")
for j in axes(A,2), i in axes(A,1)
foo(A[i,j], B[i,j])
end
As this would give all the information the compiler needs to prove bounds checks can be removed from the loop. We'd also of course have to make sure the compiler does successfully prove it. This simple example does seem good enough:
function cartesiandot(A,B)
axes(A) == axes(B) || throw("Axis mismatch!")
s = zero(Base.promote_eltype(A,B))
for j = axes(A,2), i = axes(A,1)
@fastmath s += A[i,j] * B[i,j]
end
s
end
A = rand(128,2); B = rand(128,2);
@code_llvm debuginfo=:none cartesiandot(A,B)
This could also be made more convenient by a function like ArrayInterface.indices
:
julia> using ArrayInterface: indices
julia> A = rand(3,4,5); B = rand(5,4,3);
julia> indices((A,B), 2)
Base.Slice(1:1:4)
julia> indices((A,B), (1,3))
Base.Slice(1:1:3)
julia> indices((A,B), (3,1))
Base.Slice(1:1:5)
julia> indices((A,B), (2,2))
Base.Slice(1:1:4)
julia> indices((A,B), (2,3))
ERROR: AssertionError: Unequal Indices: x == 4 != 3 == y
Stacktrace:
[1] unequal_error(x::Int64, y::Int64)
@ ArrayInterface ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:307
[2] check_equal
@ ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:308 [inlined]
[3] _try_static
@ ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:319 [inlined]
[4] _pick_range
@ ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:419 [inlined]
[5] macro expansion
@ ~/.julia/packages/Static/x0MGi/src/tuples.jl:185 [inlined]
[6] reduce_tup
@ ~/.julia/packages/Static/x0MGi/src/tuples.jl:185 [inlined]
[7] indices(x::Tuple{Array{Float64, 3}, Array{Float64, 3}}, dim::Tuple{Int64, Int64})
@ ArrayInterface ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:473
[8] top-level scope
@ REPL[12]:1
Might be worth encouraging people to take this approach instead of @inbounds
for SIMD.
but we could encourage people to write
axes(A) == axes(B) || throw("Axis mismatch!") for j in axes(A,2), i in axes(A,1) foo(A[i,j], B[i,j]) end
As this would give all the information the compiler needs to prove bounds checks can be removed from the loop.
I can get the idea, but it's not really obvious for users like me to know that compiler can magically understand the axes check and eliminate the unnecessary bounds check inside the loop -- in some places, the compiler is always smarter than I think. Thus we might need to introduce a tool such as
@code_warn_boundscheck f(x) # something like @code_warntype
to visually find out what bounds check is not optimized away. Reading @code_llvm
output is such a high requirement for common Julia users; without this, adding an explicit @inbounds
annotation seems the most obvious and reliable way to tweak the performance.
How about tooling-based approaches (i.e., in linters or such)? I think it's too breaking to take away or rename @inbounds
currently, but that doesn't mean one couldn't add "are you sure you know what you're doing" warnings to editors / linters
Sorry if this is inappropriate, but just wanted to say that current name and documentation seem perfectly clear to me. I just started using julia coming form C++ and python. The implications from the documentation all made perfect sense to me. To me, it doesn't warrant a name change.
coming form C++
While not trying not be dismissive, a background on C++ seems to set that you know about memory corruption and bad access to memory. It's every day bread there. But people coming from Python, Matlab, Mathematica, etc., may have not encountered that problem, bound checks happen always.
And if you don't read documentation or help thoroughly, seeing post in discourse, slack, Zulip or other places, make it look like an easy way to make your code faster in some instances. Here I don't blame on anyone, just that I think this scenario happens.
My original point is that Base
already makes a distinction on unsafe functions, so my rationale is that this should not be treated differently, even if it can be turned off.
But I do get that it is a big change, and if it is not done, I get the why. :slightly_smiling_face:
axes(A) == axes(B) || throw("Axis mismatch!") for j in axes(A,2), i in axes(A,1) foo(A[i,j], B[i,j]) end
Isn't that exactly the same pattern as
for idx in eachindex(A, B)
foo(A[idx], B[idx])
end
would do? eachindex
already throws a DimensionMismatch
if the axes
don't agree. Of course this doesn't work if the iteration order is not the same or not all indices would be equal.
We could also make more use of checkbounds
in user code, instead of only in getindex
. The pattern
idxs_j = ..
idxs_i = ..
if idxs_j isa AbstractRange
checkbounds(Bool, A, idxs_j) || throw(..)
checkbounds(Bool, C, idxs_j) || throw(..)
end
if idxs_i isa AbstractRange
checkbounds(Bool, B, idxs_i) || throw(..)
end
@inbounds for j in idxs_j, i in idxs_i
# index with j into A, C..
# index with i into B
end
seems to me like something an improved macro could expand to. If the "traditional" @inbounds
was already correct, the checkbounds
should be negligible while still allowing the compiler to elide boundscheck in the hotloops itself. It would of course be limited to things checkbounds
actually already understands, like ranges. The current nesting behavior of @propagate_inbounds
would probably require some nuance here as well, so that these are not generated when we're already in a nested @safe_inbounds
block.
would do?
eachindex
already throws aDimensionMismatch
if theaxes
don't agree. Of course this doesn't work if the iteration order is not the same or not all indices would be equal.
Yes. I should've gave an example where we aren't lining up the axis, or have a reason to do cartesian indexing.
I agree a @checked_inbbounds
macro that expands to the above would be a good idea.
Of course, it needs a better name -- it should have a nicer/more convenient name than @inbounds
to indicate it is preferred.
Is there a good way to test how unneeded @inbounds
is? You can force checks nowhere done, but I just want to disable the macro globally and get whatever the compiler comes up with in each case. Not same as any of:
--check-bounds={yes|no|auto}
It can't be as simple as roughly:
julia> import Base.@inbounds
julia> macro inbounds() end
@inbounds (macro with 2 methods)
Where is it defined, and how would I find the definition (@which
didn't work for me, nor search on Github)?
Where is it defined, and how would I find the definition (@which didn't work for me, nor search on Github)?
It's in base/essentials.jl:646
, though you'll see less than you probably want to see. It does some compiler expression magic to signal to the compiler not to emit bounds checks, if I'm not mistaken. You'll probably have to build julia from source to meaningfully change that.
Isn't this issue (kinda) a duplicate of #40360?
This seems like a much more focused and specific discussion but related, of course.
Even if the compiler could figure out what is inbounds given any for-loop on its own, we still want to have this feature. Sometimes it's easier to have multiple functions converge on an internal unsafe_foo
that manually uses @inbounds
. Therefore, I don't see how decreasing the need for manually declaring something as "inbounds" eliminates the need to use it ever. So deprecation seems like a really bad idea to me.
In terms of changing the syntax, I understand the motivation to make syntax reflect safety but I don't think this is like the other methods with the "unsafe" prefix. One might assume that errors would be thrown for accessing unassigned memory without the prefix for unsafe_load
and unsafe_store!
. What do people think @inbounds
does if not declare that something is "in bounds"? If it's not clear what "in bounds" means, how does adding the prefix "unsafe" help?
In terms of changing the syntax, I understand the motivation to make syntax reflect safety but I don't think this is like the other methods with the "unsafe" prefix. One might assume that errors would be thrown for accessing unassigned memory without the prefix for
unsafe_load
andunsafe_store!
. What do people think@inbounds
does if not declare that something is "in bounds"? If it's not clear what "in bounds" means, how does adding the prefix "unsafe" help?
1) Conforms to an established convention for functions that allow arbitrary memory access 2) It is very common for people to ignore risks in order to achieve desired benefits, so refocusing attention on the potential consequences is a useful property. 3) How would it hurt?
If someone finds a path forward for changing syntax here without disrupting developers, then its no biggy if this change happens.
I'm just saying that it might not be entirely accurate or helpful to put this in the same box as other unsafe methods. Macros typically don't have the same precedent for being documented as unsafe. Macros change how code parses so it seems odd to me that people are using this or any macro without having a reasonable idea of what it does. What do people think this means without reading documentation?
It just seems like a lot of trouble to give pause to a handful of people that aren't reading docs.
One option would be to make the current @inbounds a no-op, thereby disabling its many uses throughout the ecosystem, and introduce @unsafe_inbounds that does what the current macro does.
I liked that idea from Karpinski, since it's not breaking, and may even be faster..., but what about with a twist? @unsafe_inbounds
would also be a no-op in packages (because of OffsetArrays.jl), unless array proven to be 1-based? I believe in a package you can take a 1-based view of OffsetArray
s.
Possibly a flag for packages (or global flag) to enable @inbounds
with the same semantics (or original semantics, at least for non-package code?)? I.e. after they've been reviews to be save could be enabled. But if you do, then you might as well rename in the code?
Reading Discussion on "Why I no longer recommend Julia", I came to the notion that
@inbounds
have the ability to corrupt memory or access unassign memory. As with the rest ofBase
I think it should be changed to@unsafe_inbounds
.To avoid breakage, a deprecation warning would be thrown, suggesting changing it to the new name. I can make the PR, but wanted to hear, what do you think?