JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.93k stars 5.49k forks source link

deprecate `@inbounds` and change it to `@unsafe_inbounds` #45342

Open Suavesito-Olimpiada opened 2 years ago

Suavesito-Olimpiada commented 2 years ago

Reading Discussion on "Why I no longer recommend Julia", I came to the notion that @inbounds have the ability to corrupt memory or access unassign memory. As with the rest of Base I think it should be changed to @unsafe_inbounds.

To avoid breakage, a deprecation warning would be thrown, suggesting changing it to the new name. I can make the PR, but wanted to hear, what do you think?

oscardssmith commented 2 years ago

deprecating @inbounds would be massively breaking. Unless you are willing to go through and make the change to a few thousand libraries, I think this is a non-starter.

Suavesito-Olimpiada commented 2 years ago

I don't know the complete lexical used here, but by deprecation I meant it being usable yet, but just with a deprecation warning. :sweat_smile:

I would be ready to go through as many packages as I could changing this, as my summer break is really near. :stuck_out_tongue:

DNF2 commented 2 years ago

Is it possible to introduce @unsafe_inbounds, without deprecating @inbounds, but changing the manual and the docstring of @inbounds to say

@inbounds will be removed in a future version. Please use @unsafe_inbounds instead.

Matlab does this sort of thing, at least.

ianatol commented 2 years ago

The docstring for @inbounds already warns as follows, and personally I don't think that changing the name to @unsafe_inbounds will dissuade anyone from a risky usage more than this warning already does.

  │ Warning
  │
  │  Using @inbounds may return incorrect results/crashes/corruption for out-of-bounds indices. The user is
  │  responsible for checking it manually. Only use @inbounds when it is certain from the information locally
  │  available that all accesses are in bounds.
Suavesito-Olimpiada commented 2 years ago

That is true, but one person on Discourse said that didn't know about it being unsafe.

A nice design that Base follows is being explicit on a lot of things, one of them is marking functions as unsafe when they are

I think that design should be followed here. Mostly because of consistency, but also because it could be educational in occasions where people don't know about this being unsafe. It is not so much about dissuading people from using it.

PallHaraldsson commented 2 years ago

TL;DR disagree with "KristofferC added merge me and removed merge me" (the former part).

a) I'm not sure about changing this, then you would have two versions out there, @unsafe_inbounds (clearly unsafe, as in C/C++, just explicit about it) and @inbounds still unsafe, meaning the same but implying to some soething different.

b) And you also have ccall (and @ccall) which keep being unsafe. I think people will just have to learn the these idioms of the language. The user can turn off a) globally.

If some way of tackling OffsetArrays.jl warrants a new macro with different semantics, then I like to hear it (likely my proposal was flawed).

Keno commented 2 years ago

I think the best way forward here is to dramatically reduce our dependence on @inbounds - it really has gotten out of hands. We need to improve the compiler to make sure that "obvious" patterns like:

for i = 1:length(A)
A[i]
end

don't emit bounds checks anymore. LLVM has most of the info that's needed for this - somebody needs to look at why that's not firing.

Once we've done that, it'll be much easier to come up with safer versions of @inbounds, because the use across the ecosystem won't be as sprawling.

Suavesito-Olimpiada commented 2 years ago

the best way forward here is to dramatically reduce our dependence on @inbounds

I think you're right, @inbounds really feels like a hacky way to tell the compiler something that it should already know, at least for some types (Array).

somebody needs to look at why that's not firing

Lamentably, I don't speak compiler. :'V

safer versions of @inbounds

What would a safer version of inbounds looks like?

KristofferC commented 2 years ago

TL;DR disagree with "KristofferC added merge me and removed merge me" (the former part).

Huh? I fat fingered adding the label and then I removed it... It is also an issue, not a PR.

bvdmitri commented 2 years ago

@Keno the compiler to make sure that "obvious" patterns like

Why would you assume it is an "obvious" pattern? It implicitly assumes that indexing starts with 1, but no one tells that A has valid index 1. The whole point of the posted article is that the code you posted is unsafe with removed bounds checking. Can any compiler figure that out automatically?

StefanKarpinski commented 2 years ago

One option would be to make the current @inbounds a no-op, thereby disabling its many uses throughout the ecosystem, and introduce @unsafe_inbounds that does what the current macro does. That would not break anything since ignoring inbounds is always valid and anything that errors when ignoring it was already broken. It would allow the ecosystem to incrementally review uses of the macro and either:

  1. leave them, implicitly saying they are no longer necessary,
  2. delete them, explicitly saying they’re no longer necessary, or
  3. change them to @unsafe_inbounds, explicitly saying they’re still necessary.

However, in order for this not to cause a widespread performance hit, it would be necessary to make sure the compiler can eliminate bounds checks on its own most of the time.

Seelengrab commented 2 years ago

I don't think @inbounds is ever going to go away completely, at the very least not while it's allowed to use any arbitrarily computed integers as indices. Consider this (admittedly contrived) example:

julia> using Base.Iterators

julia> function foo(A, n)
          s = 0
          idxs = 1:2
          checkbounds(Bool, A, idxs) || error("not inbounds")
          for idx in Iterators.filter(==(first(idxs)), flatten(first(repeated(idxs), length(idxs)*n)))
            s += A[idx] == 1
          end
          return s
       end
foo (generic function with 1 method)

All information required for this is right there, but if the compiler can ever figure out that this will always be inbounds, I'd be extremely impressed. At the very least, it would require formalizing the bounds checking interface further and have @inbounds require that (I'd be very much in favor of this!).

As such, renaming or otherwise deprecating the current version feels like trying to cure a symptom instead of getting to the root of the problem: users not respecting the warnings & caveats described in its documentation. A name won't change that, if the macro is otherwise useful to them. Heck, Base.@pure isn't even exported and explicitly says that its for internal use only. Still, people used it and complained when their code was broken due to misuse.

Once we've done that, it'll be much easier to come up with safer versions of @inbounds, because the use across the ecosystem won't be as sprawling.

Well unless people go back through old code and replace the @inbounds with that safer version, I think we're stuck with it. :shrug: We could of course also go through the ecosystem and fix erroneous uses today..

oscardssmith commented 2 years ago

@bvdmitri it's safe for the compiler to add the inbounds for types where it is valid (separate code gets compiled for different types).

DNF2 commented 2 years ago

A name won't change that

I think it might. It would for me, at least. If a function has unsafe in its name, it makes me much more paranoid and careful about using it.

PallHaraldsson commented 2 years ago

@StefanKarpinski:

One option would be to make the current @inbounds a no-op

My less radical proposal was closed (also at the package when I moved it there as followup):

https://github.com/JuliaLang/julia/issues/45353

I request you reopen it, for discussion, if you agree it should be done (or just discussed). I can also go with the pros and cons discussed here. I think we could go with less radical, at least first, then maybe later full no-op and changed name,

Keno commented 2 years ago

@Keno the compiler to make sure that "obvious" patterns like

I meant for A::Array of course. Mostly people put inbounds on things to make sure that Array works well, so if @inbounds was not required for the ::Array case, I think people would put it in generic code less.

BioTurboNick commented 2 years ago

A name won't change that

I think it might. It would for me, at least. If a function has unsafe in its name, it makes me much more paranoid and careful about using it.

I agree, adopting the prefix unsafe for operations that, if misused, could corrupt data or crash Julia has been a useful one.

chriselrod commented 2 years ago

However, in order for this not to cause a widespread performance hit, it would be necessary to make sure the compiler can eliminate bounds checks on its own most of the time.

On it's own, that wouldn't be possible for code like:

for j in axes(A,2), i in axes(A,1)
    foo(A[i,j], B[i,j])
end

but we could encourage people to write

axes(A) == axes(B) || throw("Axis mismatch!")
for j in axes(A,2), i in axes(A,1)
    foo(A[i,j], B[i,j])
end

As this would give all the information the compiler needs to prove bounds checks can be removed from the loop. We'd also of course have to make sure the compiler does successfully prove it. This simple example does seem good enough:

function cartesiandot(A,B)
    axes(A) == axes(B) || throw("Axis mismatch!")
    s = zero(Base.promote_eltype(A,B))
    for j = axes(A,2), i = axes(A,1)
        @fastmath s += A[i,j] * B[i,j]
    end
    s
end
A = rand(128,2); B = rand(128,2);
@code_llvm debuginfo=:none cartesiandot(A,B)

This could also be made more convenient by a function like ArrayInterface.indices:

julia> using ArrayInterface: indices

julia> A = rand(3,4,5); B = rand(5,4,3);

julia> indices((A,B), 2)
Base.Slice(1:1:4)

julia> indices((A,B), (1,3))
Base.Slice(1:1:3)

julia> indices((A,B), (3,1))
Base.Slice(1:1:5)

julia> indices((A,B), (2,2))
Base.Slice(1:1:4)

julia> indices((A,B), (2,3))
ERROR: AssertionError: Unequal Indices: x == 4 != 3 == y
Stacktrace:
 [1] unequal_error(x::Int64, y::Int64)
   @ ArrayInterface ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:307
 [2] check_equal
   @ ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:308 [inlined]
 [3] _try_static
   @ ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:319 [inlined]
 [4] _pick_range
   @ ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:419 [inlined]
 [5] macro expansion
   @ ~/.julia/packages/Static/x0MGi/src/tuples.jl:185 [inlined]
 [6] reduce_tup
   @ ~/.julia/packages/Static/x0MGi/src/tuples.jl:185 [inlined]
 [7] indices(x::Tuple{Array{Float64, 3}, Array{Float64, 3}}, dim::Tuple{Int64, Int64})
   @ ArrayInterface ~/.julia/packages/ArrayInterface/R0AhD/src/ranges.jl:473
 [8] top-level scope
   @ REPL[12]:1

Might be worth encouraging people to take this approach instead of @inbounds for SIMD.

johnnychen94 commented 2 years ago

but we could encourage people to write

axes(A) == axes(B) || throw("Axis mismatch!")
for j in axes(A,2), i in axes(A,1)
    foo(A[i,j], B[i,j])
end

As this would give all the information the compiler needs to prove bounds checks can be removed from the loop.

I can get the idea, but it's not really obvious for users like me to know that compiler can magically understand the axes check and eliminate the unnecessary bounds check inside the loop -- in some places, the compiler is always smarter than I think. Thus we might need to introduce a tool such as

@code_warn_boundscheck f(x) # something like @code_warntype

to visually find out what bounds check is not optimized away. Reading @code_llvm output is such a high requirement for common Julia users; without this, adding an explicit @inbounds annotation seems the most obvious and reliable way to tweak the performance.

brenhinkeller commented 2 years ago

How about tooling-based approaches (i.e., in linters or such)? I think it's too breaking to take away or rename @inbounds currently, but that doesn't mean one couldn't add "are you sure you know what you're doing" warnings to editors / linters

tkuraku commented 2 years ago

Sorry if this is inappropriate, but just wanted to say that current name and documentation seem perfectly clear to me. I just started using julia coming form C++ and python. The implications from the documentation all made perfect sense to me. To me, it doesn't warrant a name change.

Suavesito-Olimpiada commented 2 years ago

coming form C++

While not trying not be dismissive, a background on C++ seems to set that you know about memory corruption and bad access to memory. It's every day bread there. But people coming from Python, Matlab, Mathematica, etc., may have not encountered that problem, bound checks happen always.

And if you don't read documentation or help thoroughly, seeing post in discourse, slack, Zulip or other places, make it look like an easy way to make your code faster in some instances. Here I don't blame on anyone, just that I think this scenario happens.

My original point is that Base already makes a distinction on unsafe functions, so my rationale is that this should not be treated differently, even if it can be turned off.

But I do get that it is a big change, and if it is not done, I get the why. :slightly_smiling_face:

Seelengrab commented 2 years ago
axes(A) == axes(B) || throw("Axis mismatch!")
for j in axes(A,2), i in axes(A,1)
    foo(A[i,j], B[i,j])
end

Isn't that exactly the same pattern as

for idx in eachindex(A, B)
    foo(A[idx], B[idx])
end

would do? eachindex already throws a DimensionMismatch if the axes don't agree. Of course this doesn't work if the iteration order is not the same or not all indices would be equal.

We could also make more use of checkbounds in user code, instead of only in getindex. The pattern

idxs_j = ..
idxs_i = ..
if idxs_j isa AbstractRange
    checkbounds(Bool, A, idxs_j) || throw(..)
    checkbounds(Bool, C, idxs_j) || throw(..)
end
if idxs_i isa AbstractRange 
    checkbounds(Bool, B, idxs_i) || throw(..)
end
@inbounds for j in idxs_j, i in idxs_i
     # index with j into A, C..
     # index with i into B
end

seems to me like something an improved macro could expand to. If the "traditional" @inbounds was already correct, the checkbounds should be negligible while still allowing the compiler to elide boundscheck in the hotloops itself. It would of course be limited to things checkbounds actually already understands, like ranges. The current nesting behavior of @propagate_inbounds would probably require some nuance here as well, so that these are not generated when we're already in a nested @safe_inbounds block.

chriselrod commented 2 years ago

would do? eachindex already throws a DimensionMismatch if the axes don't agree. Of course this doesn't work if the iteration order is not the same or not all indices would be equal.

Yes. I should've gave an example where we aren't lining up the axis, or have a reason to do cartesian indexing.

I agree a @checked_inbbounds macro that expands to the above would be a good idea. Of course, it needs a better name -- it should have a nicer/more convenient name than @inbounds to indicate it is preferred.

PallHaraldsson commented 2 years ago

Is there a good way to test how unneeded @inbounds is? You can force checks nowhere done, but I just want to disable the macro globally and get whatever the compiler comes up with in each case. Not same as any of:

--check-bounds={yes|no|auto}

It can't be as simple as roughly:

julia> import Base.@inbounds

julia> macro inbounds() end
@inbounds (macro with 2 methods)

Where is it defined, and how would I find the definition (@which didn't work for me, nor search on Github)?

Seelengrab commented 2 years ago

Where is it defined, and how would I find the definition (@which didn't work for me, nor search on Github)?

It's in base/essentials.jl:646, though you'll see less than you probably want to see. It does some compiler expression magic to signal to the compiler not to emit bounds checks, if I'm not mistaken. You'll probably have to build julia from source to meaningfully change that.

giordano commented 2 years ago

Isn't this issue (kinda) a duplicate of #40360?

StefanKarpinski commented 2 years ago

This seems like a much more focused and specific discussion but related, of course.

Tokazama commented 2 years ago

Even if the compiler could figure out what is inbounds given any for-loop on its own, we still want to have this feature. Sometimes it's easier to have multiple functions converge on an internal unsafe_foo that manually uses @inbounds. Therefore, I don't see how decreasing the need for manually declaring something as "inbounds" eliminates the need to use it ever. So deprecation seems like a really bad idea to me.

In terms of changing the syntax, I understand the motivation to make syntax reflect safety but I don't think this is like the other methods with the "unsafe" prefix. One might assume that errors would be thrown for accessing unassigned memory without the prefix for unsafe_load and unsafe_store!. What do people think @inbounds does if not declare that something is "in bounds"? If it's not clear what "in bounds" means, how does adding the prefix "unsafe" help?

BioTurboNick commented 2 years ago

In terms of changing the syntax, I understand the motivation to make syntax reflect safety but I don't think this is like the other methods with the "unsafe" prefix. One might assume that errors would be thrown for accessing unassigned memory without the prefix for unsafe_load and unsafe_store!. What do people think @inbounds does if not declare that something is "in bounds"? If it's not clear what "in bounds" means, how does adding the prefix "unsafe" help?

1) Conforms to an established convention for functions that allow arbitrary memory access 2) It is very common for people to ignore risks in order to achieve desired benefits, so refocusing attention on the potential consequences is a useful property. 3) How would it hurt?

Tokazama commented 2 years ago

If someone finds a path forward for changing syntax here without disrupting developers, then its no biggy if this change happens.

I'm just saying that it might not be entirely accurate or helpful to put this in the same box as other unsafe methods. Macros typically don't have the same precedent for being documented as unsafe. Macros change how code parses so it seems odd to me that people are using this or any macro without having a reasonable idea of what it does. What do people think this means without reading documentation?

It just seems like a lot of trouble to give pause to a handful of people that aren't reading docs.

PallHaraldsson commented 1 year ago

One option would be to make the current @inbounds a no-op, thereby disabling its many uses throughout the ecosystem, and introduce @unsafe_inbounds that does what the current macro does.

I liked that idea from Karpinski, since it's not breaking, and may even be faster..., but what about with a twist? @unsafe_inbounds would also be a no-op in packages (because of OffsetArrays.jl), unless array proven to be 1-based? I believe in a package you can take a 1-based view of OffsetArrays.

Possibly a flag for packages (or global flag) to enable @inbounds with the same semantics (or original semantics, at least for non-package code?)? I.e. after they've been reviews to be save could be enabled. But if you do, then you might as well rename in the code?