Closed ivarne closed 8 years ago
Yep, for me this is a priority with 0.4. My presumption is that the right way forward is to add the Expr(:withmeta, block, metadata)
functionality https://github.com/JuliaLang/julia/pull/3796#issuecomment-21437111, which would provide a general mechanism for annotating blocks of code for the compiler.
I'd recommend we rename the current Expr(:boundscheck, false)
to Expr(:checkbounds, false)
, because it's an imperative statement. Then we can use
@boundscheck 1 <= i <= length(a) || throw(BoundsError())
as a declarative statement.
I'm wondering if having both of those combined defined might be a little confusing...?
Whoops! Combined => defined
Users won't see both. Expr(:boundscheck, false)
is what's generated by @inbounds
now. I'm only thinking in terms of internal cleanliness.
Okay, thanks for the clarification. Cheers!
Another strategy would be:
unsafe_getindex
and unsafe_setindex!
to call the arrayref
/arrayset
intrinsics without a bounds check@inbounds
macro just rewrite getindex
/setindex!
to unsafe_getindex
/unsafe_setindex!
, rather than hooking into the codegen as it does nowThis would be extensible, and also stop @inbounds
from affecting inlined expressions, which would fix the segfault case in https://github.com/JuliaLang/julia/issues/7427#issuecomment-47274984. There's probably some reason we're not already doing this, though.
That seems like an interesting strategy.
That does sound like a good approach. The existing design comes from (1) not wanting to add names, (2) considering the behavior with inlining a feature. The idea was to automatically catch getindex
calls that just wrap other getindex
calls.
To really break this into orthogonal components, you'd define unsafe_getindex
, and checkbounds
, and have a single getindex
that calls both. unsafe_getindex
would generally be implemented by making other unsafe_getindex
calls. Technically I think this design is good, but I also don't like the idea of lots of ugly library code that has to call unsafe_getindex
everywhere instead of using []
, and that you can't define getindex
directly for new types.
Yeah, I too worry that it's a bit ugly.
You also may want to create a type that can't have its bounds-checks disabled. For example, there are times where you really don't want to check bounds upon construction (see examples in #4044). In such cases, you might want to implement a getindex
for the type that is never unsafe. I guess you could still call it unsafe_getindex
even if it's not true. But having a getindex
that doesn't change its behavior under @inbounds
seems cleaner.
My original thought was that we'd define a unsafe_getindex(x, i...) = getindex(x, i...)
fallback, so you could define getindex
directly for new types if you don't care about bounds check removal, but to get @inbounds
to do anything you'd have to opt into bounds check removal by defining unsafe_getindex
. This may be a good or bad thing compared to the current @inbounds
design depending on whether you privilege DRY or being able to use @inbounds
in generic code without concern of segfaulting if someone passes you a type with a bad getindex
method.
The orthogonal design is clearly less verbose. Ideally if you define getindex
, it would be called for both x[i]
and @inbounds x[i]
, or if you define unsafe_getindex
, x[i]
would call checkbounds(x, i); unsafe_getindex(x, i)
and @inbounds x[i]
would call unsafe_getindex(x, i)
. But that seems to imply fallbacks for both methods that call the other, which would lead to a stack overflow if neither are defined.
As far as ugliness of library code, if @inbounds
is implemented as described, you could write:
unsafe_getindex(x::MyType, i) = @inbounds x.a[i]
which doesn't seem so ugly.
Yep, fair enough. I think this could work.
Perhaps the best argument for the other approach is that it's part of a whole new suite of capabilities that stem from being able to talk directly to the compiler. @inline
, @please_run_LLVM's_SLPvectorizer_on_this_block
, etc. But I think your approach is more selective in how it works with inlining, so it warrants serious consideration. As you say, I guess the fundamental issue is what we want to have happen: should @inbounds
effectively propagate across layers of inlining? I'm not convinced it should.
Returning to this issue, I think there are cases where we don't want @inbounds
to propagate across inlining, and the current behavior is a bit unsafe. For example, in reduce.jl
, we have:
@inbounds fx1 = evaluate(f, A[ifirst])
@inbounds fx2 = evaluate(f, A[ifirst+=1])
This is only okay because at present we can't inline function-valued arguments, so we can never inline evaluate(f, x)
. If evaluate(f, x)
gets inlined and contains a getindex
operation, we don't want to elide the bounds checks, because the user didn't ask for that. This code should really be written as:
@inbounds v1 = A[ifirst]
fx1 = evaluate(f, v1)
@inbounds v2 = A[ifirst+=1]
fx2 = evaluate(f, v2)
which would be safe, but is awkward. In general the effects of @inbounds
seem difficult to reason about since they depend on what the compiler chooses to inline. There is also a tendency to put @inbounds
around blocks of code in library functions. This is similarly unsafe unless the code only calls functions that we know don't involve any unsafe getindex
operations, which can never really be the case for functions that accept user-defined types. With my proposal, @inbounds
would really just claim that all indexing in the block is within the bounds of the respective arrays, and the only way adding @inbounds
can cause code to segfault is if there is a bug in the code wrapped in @inbounds
or a bug in unsafe_getindex
. If I've convinced you, I may take a stab at implementing this soon :smile:.
I've had the same thoughts. It would be a bit of a pity to not have a way to @inbounds
"all the way down," but I agree that we probably don't want that behavior casually, and the current hybrid is unfortunate. I have wondered whether a reasonable solution might be to cause @inbounds
to propagate all the way down, but for types like in #4044 define getindex
with bounds-checking hardwired (no way to disable).
Another syntax that this should ideally be able to support is iteration loops like:
@inbounds for elt in A
…
end
This is a tough case because we've generally talked about @inbounds
not propagating through methods, but here we want it to affect the getindex call within next
. Ideally, I think next
could be written in such a way to opt-in to @inbounds
propagation:
function next(itr, state)
@boundscheck checkbounds(itr,state)
@inbounds x = itr[state]
return (x, state+1)
end
This is effectively https://github.com/JuliaLang/julia/issues/10614#issuecomment-85983252 with per-method compilation settings… and unfortunately #8227 would not be able to support this sort of thing.
That's a good point. I think the Unsafe/Unchecked module proposal from #8227 could do this:
Base.next(itr, state) = (itr[state], state+1)
Base.Unchecked.next(itr, state) = @inbounds (itr[state], state+1)
We could have a @propagateinbounds
macro that automatically defines the Unsafe version.
If @inbounds
only has an effect for types that opt in, it seems we could safely lower for elt in A
to call @inbounds next(...)
instead of next(...)
and eliminate the need for @inbounds
in the code above. If next
and done
are correct, for elt in A
can't go out of bounds.
I'm still very hesitant about having separate functions (be they unsafe_*
or Unchecked.*
). Would it be possible for users to add their own functions to their own Unchecked module? Otherwise, I don't like that some Base functions are deemed special… and that list keeps getting longer: getindex
, setindex!
, next
, sub2ind
, and more (maybe even arithmetic?).
It also just seems like having two functions is asking for trouble. It's the kind of thing that would drive me batty during debugging if one happened to have an error while the other didn't. I know the typical way to write these functions will be f(…) = (checkbounds(…); Unchecked.f(…))
, but it feels backwards to me: I think the re-direction should occur around checkbounds
and not the code for the method itself.
I think you are right that two generic functions is not really a viable solution if both getindex
and its unchecked cousin can be defined for abstract types. We'd end up with cases where we'd call a less specialized unchecked getindex
where we should be calling the more specialized ordinary getindex
.
The problem is that, at the codegen level, either we make @inbounds
only apply to inlined functions (which is fine if the code is correct, but makes things less predictable), or we need to compile and choose between two functions at the native code level, which would potentially introduce additional complexity.
I'm left wondering how hard it would be to implement range analysis and automatic bounds check elimination to the point where we can get good performance in the majority of the cases where it matters, and sidestep this whole issue.
On Wed, Apr 8, 2015 at 6:00 PM, Simon Kornblith notifications@github.com wrote:
I'm left wondering how hard it would be to implement range analysis and automatic bounds check elimination to the point where we can get good performance in the majority of the cases where it matters, and sidestep this whole issue.
You'd be surprised at how quick the analysis would have to give up. Given that arrays are mutable, as soon as a reference to the array escapes anywhere we don't have any guarantees on its size anymore without resorting to heavier tools (interprocedural, alias, ...). An easier project IMO would be to hoist the bounds check out of loops, then you only have to ensure that the array's dimension don't get mutated inside the loop (or only increasing). I don't think that eliminating the checks is really useful once they are out of the hot path since those are quite cheap.
Hoisting bounds checks already works fairly well… just so long as everything inlines properly ( (Not a bounds check hoist; see subsequent comments)checkbounds
doesn't inline its checks currently, whereas I believe that arrayref
does, cf. 700a69ba357bf138502bf105a46c771bcb1b70a8).
Do you mean that LLVM should do it ? Even in this simple example with constant bounds and no side effects it doesn't seem to (should be trunk LLVM) :
julia> f(v::Vector{Int}) = (for i=1:100; v[i]; end)
f (generic function with 1 method)
julia> code_llvm(f, (Vector{Int},))
define void @julia_f_44024(%jl_value_t*) {
top:
%1 = getelementptr inbounds %jl_value_t, %jl_value_t* %0, i64 1
%2 = bitcast %jl_value_t* %1 to i64*
%3 = load i64, i64* %2, align 8
br label %L
L: ; preds = %idxend, %top
%"#s1.0" = phi i64 [ 1, %top ], [ %7, %idxend ]
%4 = add nsw i64 %"#s1.0", -1
%5 = icmp ult i64 %4, %3
br i1 %5, label %idxend, label %oob
oob: ; preds = %L
%6 = alloca i64, align 8
store i64 %"#s1.0", i64* %6, align 8
call void @jl_bounds_error_ints(%jl_value_t* %0, i64* %6, i64 1)
unreachable
idxend: ; preds = %L
%7 = add nuw nsw i64 %"#s1.0", 1
%8 = icmp eq i64 %7, 101
br i1 %8, label %L3, label %L
L3: ; preds = %idxend
ret void
}
On Wed, Apr 8, 2015 at 6:21 PM, Matt Bauman notifications@github.com wrote:
Hoisting bounds checks already works fairly well… just so long as everything inlines properly (checkbounds doesn't inline its checks currently).
— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/7799#issuecomment-90964872.
Yes, that was my assumption. I'm still using 3.3, but maybe I wasn't measuring what I thought I was measuring after making that change. Edit: ah, yes, I think I was simply seeing the time for the extra function call (to checkbounds
), not a loop hoist.
LLVM evidently has an experimental inductive range check elimination pass now, which might do what we want.
Also, I'm not sure we really need to worry about handling loops where arrays escape. Bounds checks should be pretty cheap by comparison to function calls. The biggest pain points are cases where they prevent vectorization. Naively hoisting bounds checks out of loops doesn't seem legal in cases where memory is modified inside the loop. To get the right state when a bounds check fails, we'd have to emit two code paths, one for the case where the hoisted bounds check is satisfied and one for the case where it isn't. But inductive range check elimination sounds quite promising.
Alright, I'm convinced. And it seems like we're converging upon a consensus locally. Here's the proposal:
getindex(x::Any, I...) = (checkbounds(x, I...); unsafe_getindex(x, I...)
. Fancy array indexing fallbacks all happen on the unsafe_getindex
methods.unsafe_getindex
. They can also define getindex
if they want something different than the above, but then they're on their own to ensure that their indexing methods are consistent and safe. This will have a nice home in the indexing section of the new manual chapter at #11794.AbstractArray
and other indexable concrete types in base.Base.@inbounds
macro unhygienically replaces getindex
with Base's unsafe_getindex
function.All the above also applies to setindex!
.
+1 to that proposal
Yep, :+1:
This is great, but I'm concerned about one question: Which are the failure modes where the user unwittingly creates a type with broken bounds checking, such that a normal getindex
call can corrupt memory instead of giving a bounds error?
If the expected way to provide indexing for your type is to overload unsafe_getindex
, it seems to me that a lot more cases will be eligible for this kind of bugs, not just ones where the author is explicitly concerned about indexing performance.
I think we should be a bit paranoid about these things, to make sure that Julia is a safe language by default.
Also, do we have a general definition of what @inbounds
asserts? For Array
it is obvious, but if it should be used in generic code? For an AbstractArray
, is the assertion that each index is between 0 and the size of the corresponding dimension?
If the assertion is that the user has checked the bounds already (and likely not using checkbounds
, then I'm not sure there's a point to being able to overload checkbounds
at all.
The advantage of this proposal is that it's composable: checking bounds is a conceptually separate step from retrieving/setting data, and consequently it makes sense to separate the two pieces. The default getindex
will catch 95% of cases. Likewise, if the implementation's access of memory is via B[indx]
for some B
internal to the type, then julia --checkbounds=yes
will catch problems even if the author sprinkles @inbounds
everywhere. Since one can use direct pointer arithmetic in Julia, it will always be possible to write code that can segfault, but I don't think that's a problem---anyone who uses such shenanigans is taking matters into their own hands.
And yes, checkbounds should be checking that the index along dimension d
is between 1 and size(A,d)
. But there are cases where this may not be sufficient. An example might be if you want to construct something like a SubArray that does not use bounds-checking upon construction, but is careful to check whenever indexes are used---you'd want to check both the index and the index-to-the-index.
@toivoh At least the way I'm thinking about this, there's nothing special about unsafe_getindex
and any array indexing within it is checked by default, so a bug in your code will not make Julia segfault unless you yourself used @inbounds
. In terms of safety, I think this is a strict improvement on the current behavior.
Another case where you might want to overload checkbounds
is the case where you have a subtype of AbstractArray that implements the AbstractArray interface and then something else, e.g. AxisArrays.
I would expect SubArray
to check bounds on construction (unless they are constructed in a @inbounds
block). That way it is obvious that an @inbounds
declaration on indexing of a SubArray
only checks bounds on the "outer" array view.
I have trouble seeing when you would want to override checkbounds
. Usually I see @inbounds
used as an assertion that i1 in size(A,1) && i2 in size(A,2), ...
. Is that wrong insufficient sometims? Wouldn't we need a different macro to dissable other sanity checks?
We seriously need to make Traits part of the language (#5). They are starting to appear everywhere on base, and this would make them much easier to work with.
@ivarne, SubArrays do (now) check on construction. But users may want to construct a similar type that doesn't check upon construction (see #4044), and that's what I was using as an example motivating the need to override checkbounds
.
@timholy I'd say such objects doesn't follow (what I expect to be) the @inbounds
interface contract (i in 1:size(A, 1)
), so you'd need a different approach to dissable the safety checks for such objects.
When I apply @inbounds
to some piece of code, I see that as a promise that I will only access valid indexes. Previously I have assumed that meant i in 1:size(A, 1)
, and if we make that logic more complicated, @inbounds
will be much harder to use safely.
I don't understand your concern; with this proposal, you can have your cake and eat it too. The point is that users who want to do fancy things in their own code trees can add extra safety checks to checkbounds
, depending on the specific AbstractArray
type. But none of the types currently in Base will need such shenanigans.
Exactly, the whole point behind this idea is that you almost always want to check bounds before doing indexing. And checking bounds is almost always asserting that each i
or each element of the array I
is in 1:size(A, d)
(or trailingsize
). So let's just do that all by default.
The bonuses to this proposal are that it breaks these things down into orthogonal components:
getindex
if you want to do something completely different, but you must be aware that library code may transform an indexing expression into unsafe_getindex
.checkbounds
if you simply want to alter how bounds are checked when indexing into your custom array type. You're right - Julia has long held the assumption that indices are Int
, start at 1
, and are listed in order. But I think it'd be really powerful (and not too much extra work) to allow custom array types to break those conventions. E.g., Fortran allows specifying the start index, or AxisArrays.jl
will allow you to specify named dimensions in any order.in_bounds
if you have a custom array type you'd like to support indexing with and have a different or more efficient way of checking bounds for itAs I was thinking about this last night, I realized that we'll have to make this scheme support Cartesian indices, which may be tough to do without duplicating functionality (and work). Edit: actually, it's not really any worse than the status quo, I think — it just requires a little extra indirection. Mixed cartesian/integer indexing breaks the assumption that each index corresponds to exactly one dimension… which means that the end
keyword lowers incorrectly. But that's a whole 'nother can of worms.
The trouble is that if you have an array type that break the i in 1:size(A,1)
, most (or all) the existing code that uses @inbounds
is wrong, and will cause segfaults (or other undefined behaviour).
Can you give an example on a correct piece of generic code that uses inbounds and works on both Julia (1 based) and C (0 based) arrays (without resorting to eachindex
)?
Right now it would cause segfaults. With the proposed scheme, it would only cause a segfault if unsafe_getindex
itself uses @inbounds
. Otherwise it will just throw BoundsErrors even if used from code that uses @inbounds
.
The easiest solution is to say that an array with 0-based indexing should not subtype AbstractArray. This seems pretty reasonable to me, since I suspect there's not much that accepts AbstractArrays but not ordinary iterators that will actually work with 0-based indexing.
Ok, but this seems like a discussion for another day. We certainly don't need to document the fact that checkbounds
happens to be written in an orthogonal manner that would allow specialization without tons of ambiguity warnings or encourage its overloading. Perhaps I named #11895 wrong: the point there is more about giving in_bounds
a better name and allowing it to be overloaded, which is very necessary for efficient indexing by nullable arrays. The checkbounds
changes simply fall out of that.
Ah, right, and I totally forgot - we do need to encourage the overloading of checkbounds
for non-abstractarray indexables.
I still have some serious doubts about splitting getindex into two functions. It's quite ugly to have to tell people to define unsafe_getindex
.
Here's a possible alternative. First, we mark code that performs bounds checks with @boundscheck( ... )
. All code marked this way is subject to removal by @inbounds
. At that point, we have all the needed metadata in the code and the rest is up to compiler optimizations, where this belongs.
+1 that was something like what I was thinking of. What's a little weird though is that you seem to need to have machinery to compile two different versions of functions that have @boundscheck
in them – it's sort of implicitly two different methods.
Yes, please! That was definitely my first choice (https://github.com/JuliaLang/julia/pull/10525#issuecomment-82407089), but as I said there, "that just punts the complexity up the chain to a system I don't know and only a few could implement."
This proposal was my compromise that was implementable by peons. :)
As a first cut, we might try having @inbounds
only affect code up to 1 level of inlining. Eventually we might indeed need to have the compiler generate different versions of methods. But that's way better than shoving this in people's faces. All the unsafe_
stuff can be handled internally.
I haven't read all of the discussion on this, but is it thought to be important to remove bounds checks even if getindex
isn't inlined?
Also, this needn't tax method tables or dispatch. A benefit of leaving this to the compiler is that it can do any dirty trick it wants. We will eventually have some kind of __secret__
submodule of every module, where the compiler can stash things it generates. It could handle this by renaming, basically just inserting the unsafe_
prefix itself, and putting those definitions in __secret__
(a good name for this is sought).
The consensus seems to be in strong favor of decoupling @inbounds
from inlining. The issue isn't just if it doesn't inline, but also if too much inlines. That means that a "safe" base method effectively sticks @inbounds
annotations all throughout user code (which may have been expecting bounds checking). None of us like that behavior.
The big thing I care about, for correctness sake, is that @inbounds doesn't act like a dynamic scope, as it apparently does now. If you can also do some magic at some point in the future to cut down the explicit @inbounds I've had to put in for performance sake, that would be delightful!
To be safer, @inbounds
and @boundscheck
could specify an array, and then we'd have to match them up. The check is only removed if the compiler can prove the two macros are talking about the same array.
As I said, we could also limit it to 1 level of inlining. That's a bit arbitrary, but seems pretty safe to me.
For user written short functions that have a huge overhead from bounds checking, it would be nice if it was possible to hook into the
boundscheck
system with@inbounds
in order to let the user be able to disable manual bounds checking inside the function.This might be beneficial for
SubString
,SubArray
, and many wrapping types that have to check bounds before forwarding to the internal storage.One suggestion is to have a
@bounds
macro that you can wrap the optional checking code in, but that seems somewhat brittle.