Closed jakebolewski closed 9 years ago
In more detail, we call to_index
in operators.jl on "non obvious" index types, and it does things like convert to Int.
I've never really liked this. This is a case where the types can accurately reflect what is actually supported (namely, integers). I wonder how useful it is in practice to be able to index with integer-valued floats, and get an InexactError for non-integer values, which is what we do now.
cc ApproxFun.jl authors @dlfivefifty @ajt60gaibb any thoughts?
I can see how this is useful functionality for AbstractArray
subtypes, in that floating point indexing is useful when you want a user defined type that is represented as a function but is "array like". This is used in Interpolations.jl
for instance. If this is useful then we should define a generic AbstractArray
fallback.
I don't think this is useful for general types, such as indexing into a tuple.
However, in the continuous case size
cannot be defined so in that sense these types cannot be thought of as AbstractArrays
Has the ability to overload call
changed the need for non-integer indexing?
I like that you can index textures with floats in GLSL and get interpolated values back. But it's probably not justified to put something like this in Base, as it should rather live in an interpolation package. Which is actually a case against explicitly allowing for floats, as an interpolation package couldn't overwrite it that easily than. I'd think, that in general something went wrong if you index an array with a float. There are a lot of use cases, where float intermediate representations are useful, but shouldn't the caller be forced, to use a function like floor,ceil, round, etc, to convert to an Integer before indexing?! Throwing an inexact error gives me shivers, as it can lead to subtle bugs if the float is just sometimes a little bit off.
In the case of ApproxFun this should be sufficient, as we have moved away from the “functions as vectors” in many other respects.
On 11 Feb 2015, at 7:03 am, Andreas Noack notifications@github.com wrote:
Has the ability to overload call changed the need for non-integer indexing?
— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/10154#issuecomment-73772830.
I agree it's typically better to handle a float index by specifying floor, ceil, etc. at the call site. Much clearer what's going on. Types that do interpolation are clearly a separate case. It's totally fine if some AbstractArray subtypes support float indexing and others don't. #10064 is about specifying the minimal core interface.
Is there support for deprecating this behavior for AbstractArray
's and other datatypes (ex. tuples) in base?
@jakebolewski, we could always just deprecate getindex(a, i::FloatingPoint...)
without specifying the type of a
, no?
That seems to be a reasonable approach.
If I may: I think this is a bad idea. In many situations, it is very convenient to not worry about types. It is one of the reasons why switching to Julia was easy for me. I started a discussion at Google groups
Can you give good examples, where this actually hurts?
I cried a little, when convert(Int, FloatingPoint)
was starting to throw InexactError()
errors.
But it actually made my code quite a bit safer and wasn't that hard to ship around.
I had the same feeling as @cortner, but I failed to find good examples of where it's useful. I'm sure there are valid cases. (And this feature is much less dangerous than automatic rounding conversion from float to int, there is no loss of information here.)
This change revealed several type in-safeties in my code, where what started as an int became a float: k/2 instead of div(k,2). So I'm in favour.
The deprecating of int() at the same time is a bit annoying, but given that it's ambiguous whether it means integer part or round, I'm ok with that.
Sent from my iPad
On 16 Mar 2015, at 4:00 am, Milan Bouchet-Valat notifications@github.com wrote:
I had the same feeling as @cortner, but I failed to find good examples of where it's useful. I'm sure there are valid cases. (And this feature is much less dangerous than automatic rounding conversion from float to int, there is no loss of information here.)
— Reply to this email directly or view it on GitHub.
Only time I've used this is to select percentiles lazily, e.g.
n = 927
x = rand(n)
sort!(x)
@show x[n/4]
@show x[3n/4]
but the "cure" is div
. I think the real problem will be people not knowing about div
.
Well if folks like my proposal to refactor getindex on AbstractArrays (#10525) then one of my reasons for liking this change will go away. As Tim noted, Real
's placement in the type hierarchy makes extending getindex particularly tricky and annoying (https://github.com/JuliaLang/julia/pull/10458#issuecomment-77957672). If we only have one getindex method defined for AbstractArrays (without specializing on the indices), then that concern is no longer relevant.
I still think this is a sensible restriction to make, but I'd be open to reconsidering it now.
As I had posted on julia-users, an edge case shows up when indexing with floats in regions where epsilon > 1:
julia> (1:10^17)[1e16:(1e16+5)] #Not the same as indexing with (10^16:10^16+5)
5-element Array{Int64,1}:
10000000000000000
10000000000000000
10000000000000002
10000000000000004
10000000000000004
I think it should be easy enough to throw an exception when a conversion is not "safe".
Re use cases: it occurs e.g. when I compute an index using round
, but more generally I've stumbled over it a few times. Always easy to fix, but annoying.
@dlfivefifty : I am not convinced about the type instability. When there is a performance issue then a programmer who actually cares will be able to find it. Julia provides very nice tools to find type instability.
Probably wasn't much performance impact, but it did bother me that something I called k was becoming a float...it's very unlikely I would have found this otherwise. (More unlikely that it would ever make a difference however..)
My feeling is 99% of cases where get index is called with float are unintentional.
(I do kinda think down the line there should be a "relaxed" mode where Julia is not so picky though..)
Sent from my iPad
On 16 Mar 2015, at 10:46 pm, Christoph Ortner notifications@github.com wrote:
@dlfivefifty : I am not convinced about the type instability. When there is a performance issue then a programmer who actually cares will be able to find it. Julia provides very nice tools to find type instability.
— Reply to this email directly or view it on GitHub.
I think it should be easy enough to throw an exception when a conversion is not "safe".
@cortner If the index doesn't round to itself, sure. However, I don't think it is so easy to tell in the example I gave whether or not that command was intentional or a mistake. Maybe someone forgot to round the entries of the index set. But maybe someone genuinely intended to do precisely that (since you can index into an array with an arbitrary index set). Trying to second-guess a user's intent is a dangerous rabbit hole to go down.
AFAIK only Matlab/Octave and Mathematica support indexing into an array with a float. Maple, C, Python, Fortran do not.
For dense arrays, this is not such a big concern since indexing into a Float64
vector with a length of maxintfloat()
requires 64 petabytes of data. For sparse data structures, of course, this can actually happen, although I suspect it's still rare to have integral dimensions that are that large.
Once we have COO arrays, I think you should expect to see a lot of integral dimensions outside of the 2^52 range.
But I still argue that one can simply throw an exception when the conversion is unsafe.
@cortner and my point is that you cannot tell in my example whether it is "unsafe".
@jiahao , I think your example shows that it is unsafe to use float-indexing for numbers > 1/eps.
Also, this is not just an academic counterexample when it comes to Float32
s. maxintfloat(Float32)
is 2^24, and there are already quite a few data sets of this size. (See the Network Repository for some examples.)
For Float32, float-indexing is unsafe for indices > 1/eps(Float32)
@cortner Julia allows indexing with an arbitrary index set. x[ [10^16, 10^16+1, 10^16+2] ]
is a valid indexing operation. It is tempting but incorrect to think that x[ [1e16:(1e16+2.0)] ]
might do the same thing. Instead, it does the same thing as x[ [10^16, 10^16, 10^16+2] ]
in the default round-to-even rounding mode. But this is also a valid indexing operation. To claim that the second indexing operation is "unsafe" is to make a subjective judgment call on user intent which uses valid and unambiguous (if not necessarily intuitive) semantics.
Similarly, consider x[ [10^16, 10^16+1] ] != x[ [1e16:(1e16+1.0)] ] == x[ [1e16] ]
.
I understand all the arguments (to some extent anyways), all I keep saying is that you are taking the point of view of a computer scientist and not that of a numerical analyst / user. One of the nice things about Julia is that, if I don't want to, then I don't need to worry about types. But suddenly now I do.
In all honesty, it is a relatively minor issue, but many such minor issues taken together will make Julia a less attractive language for teaching, for casual programmers, and maybe even for programmers in general. I plan to collect a list of examples along the same lines and will post them at some point.
If we lived in a world where all floating point numbers had epsilon <= 1 I would agree that floating point indexing is much less problematic conceptually. However, we don't live in that world, so we have the problem of balancing the "convenience" of writing A[1.0]
against the "problem" of indexing with large floats. All I'm cautioning against is any concessions to convenience that come at the expense of overall semantic consistency.
@cortner you're slowly convincing me, as it now reminds me of [a => b]
becoming Dict(a=>b)
, which also made me quite sad...
Maybe move these kind of checks into Lint.jl, or something comparable?
So someone who wants safety, can still get it, and someone who just wants to play around doesn't have to think much about types and conversions.
This doesn't address @jiahao's concern of semantic inconsistency though....
@jiahao If floating point indexing is forbidden, then it's obviously not considered as "valid". Thus, allowing a subset of those operations gives more freedom to users, so I don't see the problem about it being a "judgement call".
FWIW, also add R to the list of software that allow indexing with floats.
I think it all depends on how you look at it. There have been several people who have posted about how they've found bugs/type issues in their code due to this change and are grateful for this change. In a teaching situation, we have right there some great examples to help explain why this certain restriction was chosen; more so as an overall design decision than a "correctness" decision and here's why.
I also don't think it's too terrible to define your own float getindex methods if you have your own use case for them. There may be legitimate use cases and if it really makes sense for a package, they can define their own and use them or as a user, put it in your .juliarc file.
On Mon, Mar 16, 2015 at 9:40 AM, Christoph Ortner notifications@github.com wrote:
I understand all the arguments (to some extent anyways), all I keep saying is that you are taking the point of view of a computer scientist and not that of a numerical analyst / user. One of the nice things about Julia is that, if I don't want to, then I don't need to worry about types. But suddenly now I do.
In all honesty, it is a relatively minor issue, but many such minor issues taken together will make Julia a less attractive language for teaching, for casual programmers, and maybe even for programmers in general. I plan to collect a list of examples along the same lines and will post them at some point.
— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/10154#issuecomment-81747535.
@nalimilan I think you may be conflating "forbidden" with "not defined". If we define this method, it should do something reasonable for all valid inputs of that given type. I'm not opposed to something like
if 0 <= floatidx < maxintfloat(typeof(floatidx))
idx = round(Int, floatidx)
return idx==floatidx ? A[idx] : throw(InexactError())
end
The unresolved question in my mind is what should happen for other values. One possibility is to assert that users should never do this and raise an error. Another possibility is to assume that users know what they are doing past this point and allow the indexing. If it is not clear what this method should do then the most conservative choice is to not define it in Base
, since the base library does not have the luxury of ignoring corner cases that user code doesn't care about.
Another good candidate definition for the MatlabCompat.jl package?
@jiahao I think the proposal is to raise an error when indexing above maxintfloat
. People who know what they are doing can still convert manually to an integer beforehand (as currently).
@tkelman But also into RCompat.jl then. I don't really like the idea of encouraging alternative definitions for core functions like getindex(::AbstractArray, ...)
in supplementary packages. This means []
won't have the same meaning depending on the background of the person who wrote the code, and contrary to a convenience function it's not easy to find out where it comes from.
@tkelman That is a possibility, but to be honest I would prefer to not have to rely on a package, which may or may not be updated on time with new versions of the language.
@nalimilan yes, very good point.
@cortner Is your main objection that you are already calling round
(or floor/trunc/ceil/etc) and don't like the verbosity and type-fiddliness of round(Int, …)
?
@mbauman the short answer is yes; it seems small when said like this, but it is a matter of principle. I want to have the choice to not worry about types. And it is not so much myself I am worried about, as I will get used to it, but I think these sort of design decisions will play against Matlab and R users converting to Julia.
I want to have the choice to not worry about types.
I don't really think that's a viable choice, even though it's occasionally been claimed that you don't have to worry about types. Right now, the type inference system really only lets you ignore types if you exclusively work with Int
, Float64
and Array{Float64}
. Even with that caveat, conflating Int
and Float64
poses serious problems for both performance and correctness. (It's not an accident that both Matlab and R made themselves more "user-friendly" by discouraging users from ever using any integers.)
As soon as you work with strings (let alone a custom type), you discover that you need to know a lot about the type system before you can write efficient code in Julia. You need to know which types are abstract, so that you can make sure that you never include those types as fields in a type; you need to know which visually similar values (e.g. 0
vs 0.0
/ "foo" vs "fóo") have distinct types, so that you can avoid writing code that has variables with performance-damaging type uncertainty.
As such, I think it would be unwise for the Julia community to claim that you have a choice to not worry about types. We could only achieve that if we adopted a radically more limited type system and removed integers completely from the language.
Since that won't likely be happening, I think it's a virtue of the debated indexing change that it encourages users to make sure that they never think of integers and floating-point numbers as interchangeable, since they manifestly are not interchangeable in Julia's current state.
I can imagine a new user doing something like
A=[1,2] A[1]=2.4
where you definitely do have to worry about types.
Sent from my iPhone
On 17 Mar 2015, at 5:51 am, John Myles White notifications@github.com wrote:
I want to have the choice to not worry about types.
I don't really think that's a viable choice, even though it's occasionally been claimed that you don't have to worry about types. Right now, the type inference system really only lets you ignore types if you exclusively work with Int, Float64 and Array{Float64}. Even with that caveat, conflating Int and Float64 poses serious problems for both performance and correctness. (It's not an accident that both Matlab and R made themselves more "user-friendly" by discouraging users from ever using any integers.)
As soon as you work with strings (let alone a custom type), you discover that you need to know a lot about the type system before you can write efficient code in Julia. You need to know which types are abstract, so that you can make sure that you never include those types as fields in a type; you need to know which visually similar values (e.g. 0 vs 0.0 / "foo" vs "fóo") have distinct types, so that you can avoid writing code that has variables with performance-damaging type uncertainty.
As such, I think it would be unwise for the Julia community to claim that you have a choice to not worry about types. We could only achieve that if we adopted a radically more limited type system and removed integers completely from the language.
Since that won't likely be happening, I think it's a virtue of the debated indexing change that it encourages users to make sure that they never think of integers and floating-point numbers as interchangeable, since they manifestly are not interchangeable in Julia's current state.
— Reply to this email directly or view it on GitHub.
Yeah, I don't really buy the "you don't have to worry about types" thing – because that will last all of about a day – but I still think that we should in general do conversions automatically when they're safe and obvious. I'm not really convinced that the fact that you may have previously lost precision when computing a floating-point value is the concern of the indexing operation.
@cortner, @StefanKarpinski why should indexing for arrays be special cased here? Do you think that most users define indexing for all subtypes of Real
s for user defined types? Almost no one does. To index consistently across all user defined types you have to explicitly convert to an Int
anyways. Reasoning about Int
s in the system is inescapable.
In fact, indexing by floating point values is not even consistently defined in Base
. For example, the dense case is defined:
julia> A = randn(3,3)
3x3 Array{Float64,2}:
-1.44595 1.07011 2.06128
-0.46673 0.406243 3.70655
-0.564156 0.494642 -0.936865
julia> A[1.0,1.0]
-1.4459545197847976
but for sparse arrays, and much of the AbstractArray
subtypes it is not
julia> SA = sprandn(3,3,1.0)
3x3 sparse matrix with 9 Float64 entries:
[1, 1] = -0.294094
[2, 1] = 0.00830028
[3, 1] = -1.11293
[1, 2] = 0.304284
[2, 2] = -0.786085
[3, 2] = -0.591092
[1, 3] = 0.648532
[2, 3] = 1.72333
[3, 3] = -0.0314884
julia> SA[1.0,1.0]
ERROR: MethodError: `getindex` has no method matching getindex(::Base.SparseMatrix.SparseMatrixCSC{Float64,Int64}, ::Float64, ::Float64)
Closest candidates are:
getindex(::AbstractArray{T,2}, ::Any, ::Any, ::Any, ::Any...)
getindex(::AbstractArray{T,N}, ::Real)
getindex(::Base.SparseMatrix.SparseMatrixCSC{Tv,Ti<:Integer}, ::Integer)
...
julia> Diagonal(A)[1.0,1.0]
ERROR: MethodError: `getindex` has no method matching getindex(::Base.LinAlg.Diagonal{Float64}, ::Float64, ::Float64)
Closest candidates are:
getindex(::AbstractArray{T,2}, ::Any, ::Any, ::Any, ::Any...)
getindex(::AbstractArray{T,N}, ::Real)
getindex(::Base.LinAlg.Diagonal{T}, ::Integer, ::Integer)
...
@mbauman work would help here for subtypes of AbstractArray
, but that still does not address the behavior for user defined types. We could obviously export the conversion and promote it as the way to do indexing but that would be a big change to the Julia code which is currently out in the wild.
To be clear, I'm kind of ambivalent about this. It's annoying how inconsistent this is across all of Base and various packages. I don't think that indexing is an exceptional case – I want a more consistent policy. But to have such a policy, I think we need to have a reason why we draw the line where we do. Currently, it's a very wiggly line.
For me, the bright line is just that indexing shouldn't use floating-point numbers in addition to allowing integers. The rationale I see is that:
For me, floating point indexing is almost entirely lose-lose. It's kind of cute that it works, but it's not cute enough to justify its existence in the absence of some other compelling rule that would imply it ought to exist.
I just want some consistency in the interface as well, but I agree with @johnmyleswhite's points which is why I proposed the change.
Handling
(get/set)index(::AType, ::FloatingPoint....)
is inconsistently implemented forAbstractArray
subtypes in base and in user defined types for many packages. Should we expect this case be handled when defining(get/set)index
behavior for user defined types? This needs to be documented (or a generic fallback defined), related to #10064.