JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.89k stars 5.49k forks source link

Should we expect floating point indexing to be implemented? #10154

Closed jakebolewski closed 9 years ago

jakebolewski commented 9 years ago

Handling (get/set)index(::AType, ::FloatingPoint....) is inconsistently implemented for AbstractArray subtypes in base and in user defined types for many packages. Should we expect this case be handled when defining (get/set)index behavior for user defined types? This needs to be documented (or a generic fallback defined), related to #10064.

JeffBezanson commented 9 years ago

In more detail, we call to_index in operators.jl on "non obvious" index types, and it does things like convert to Int.

I've never really liked this. This is a case where the types can accurately reflect what is actually supported (namely, integers). I wonder how useful it is in practice to be able to index with integer-valued floats, and get an InexactError for non-integer values, which is what we do now.

jiahao commented 9 years ago

cc ApproxFun.jl authors @dlfivefifty @ajt60gaibb any thoughts?

jakebolewski commented 9 years ago

I can see how this is useful functionality for AbstractArray subtypes, in that floating point indexing is useful when you want a user defined type that is represented as a function but is "array like". This is used in Interpolations.jl for instance. If this is useful then we should define a generic AbstractArray fallback.

I don't think this is useful for general types, such as indexing into a tuple.

jakebolewski commented 9 years ago

However, in the continuous case size cannot be defined so in that sense these types cannot be thought of as AbstractArrays

andreasnoack commented 9 years ago

Has the ability to overload call changed the need for non-integer indexing?

SimonDanisch commented 9 years ago

I like that you can index textures with floats in GLSL and get interpolated values back. But it's probably not justified to put something like this in Base, as it should rather live in an interpolation package. Which is actually a case against explicitly allowing for floats, as an interpolation package couldn't overwrite it that easily than. I'd think, that in general something went wrong if you index an array with a float. There are a lot of use cases, where float intermediate representations are useful, but shouldn't the caller be forced, to use a function like floor,ceil, round, etc, to convert to an Integer before indexing?! Throwing an inexact error gives me shivers, as it can lead to subtle bugs if the float is just sometimes a little bit off.

dlfivefifty commented 9 years ago
In the case of ApproxFun this should be sufficient, as we have moved away from the “functions as vectors” in many other respects.

On 11 Feb 2015, at 7:03 am, Andreas Noack notifications@github.com wrote:

Has the ability to overload call changed the need for non-integer indexing?

— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/10154#issuecomment-73772830.

JeffBezanson commented 9 years ago

I agree it's typically better to handle a float index by specifying floor, ceil, etc. at the call site. Much clearer what's going on. Types that do interpolation are clearly a separate case. It's totally fine if some AbstractArray subtypes support float indexing and others don't. #10064 is about specifying the minimal core interface.

jakebolewski commented 9 years ago

Is there support for deprecating this behavior for AbstractArray's and other datatypes (ex. tuples) in base?

stevengj commented 9 years ago

@jakebolewski, we could always just deprecate getindex(a, i::FloatingPoint...) without specifying the type of a, no?

ViralBShah commented 9 years ago

That seems to be a reasonable approach.

cortner commented 9 years ago

If I may: I think this is a bad idea. In many situations, it is very convenient to not worry about types. It is one of the reasons why switching to Julia was easy for me. I started a discussion at Google groups

SimonDanisch commented 9 years ago

Can you give good examples, where this actually hurts? I cried a little, when convert(Int, FloatingPoint) was starting to throw InexactError() errors. But it actually made my code quite a bit safer and wasn't that hard to ship around.

nalimilan commented 9 years ago

I had the same feeling as @cortner, but I failed to find good examples of where it's useful. I'm sure there are valid cases. (And this feature is much less dangerous than automatic rounding conversion from float to int, there is no loss of information here.)

dlfivefifty commented 9 years ago

This change revealed several type in-safeties in my code, where what started as an int became a float: k/2 instead of div(k,2). So I'm in favour.

The deprecating of int() at the same time is a bit annoying, but given that it's ambiguous whether it means integer part or round, I'm ok with that.

Sent from my iPad

On 16 Mar 2015, at 4:00 am, Milan Bouchet-Valat notifications@github.com wrote:

I had the same feeling as @cortner, but I failed to find good examples of where it's useful. I'm sure there are valid cases. (And this feature is much less dangerous than automatic rounding conversion from float to int, there is no loss of information here.)

— Reply to this email directly or view it on GitHub.

IainNZ commented 9 years ago

Only time I've used this is to select percentiles lazily, e.g.

n = 927
x = rand(n)
sort!(x)
@show x[n/4]
@show x[3n/4]

but the "cure" is div. I think the real problem will be people not knowing about div.

mbauman commented 9 years ago

Well if folks like my proposal to refactor getindex on AbstractArrays (#10525) then one of my reasons for liking this change will go away. As Tim noted, Real's placement in the type hierarchy makes extending getindex particularly tricky and annoying (https://github.com/JuliaLang/julia/pull/10458#issuecomment-77957672). If we only have one getindex method defined for AbstractArrays (without specializing on the indices), then that concern is no longer relevant.

I still think this is a sensible restriction to make, but I'd be open to reconsidering it now.

jiahao commented 9 years ago

As I had posted on julia-users, an edge case shows up when indexing with floats in regions where epsilon > 1:

julia> (1:10^17)[1e16:(1e16+5)] #Not the same as indexing with (10^16:10^16+5) 
5-element Array{Int64,1}: 
 10000000000000000 
 10000000000000000 
 10000000000000002 
 10000000000000004 
 10000000000000004
cortner commented 9 years ago

I think it should be easy enough to throw an exception when a conversion is not "safe".

cortner commented 9 years ago

Re use cases: it occurs e.g. when I compute an index using round, but more generally I've stumbled over it a few times. Always easy to fix, but annoying.

cortner commented 9 years ago

@dlfivefifty : I am not convinced about the type instability. When there is a performance issue then a programmer who actually cares will be able to find it. Julia provides very nice tools to find type instability.

dlfivefifty commented 9 years ago

Probably wasn't much performance impact, but it did bother me that something I called k was becoming a float...it's very unlikely I would have found this otherwise. (More unlikely that it would ever make a difference however..)

My feeling is 99% of cases where get index is called with float are unintentional.

(I do kinda think down the line there should be a "relaxed" mode where Julia is not so picky though..)

Sent from my iPad

On 16 Mar 2015, at 10:46 pm, Christoph Ortner notifications@github.com wrote:

@dlfivefifty : I am not convinced about the type instability. When there is a performance issue then a programmer who actually cares will be able to find it. Julia provides very nice tools to find type instability.

— Reply to this email directly or view it on GitHub.

jiahao commented 9 years ago

I think it should be easy enough to throw an exception when a conversion is not "safe".

@cortner If the index doesn't round to itself, sure. However, I don't think it is so easy to tell in the example I gave whether or not that command was intentional or a mistake. Maybe someone forgot to round the entries of the index set. But maybe someone genuinely intended to do precisely that (since you can index into an array with an arbitrary index set). Trying to second-guess a user's intent is a dangerous rabbit hole to go down.

jiahao commented 9 years ago

AFAIK only Matlab/Octave and Mathematica support indexing into an array with a float. Maple, C, Python, Fortran do not.

StefanKarpinski commented 9 years ago

For dense arrays, this is not such a big concern since indexing into a Float64 vector with a length of maxintfloat() requires 64 petabytes of data. For sparse data structures, of course, this can actually happen, although I suspect it's still rare to have integral dimensions that are that large.

johnmyleswhite commented 9 years ago

Once we have COO arrays, I think you should expect to see a lot of integral dimensions outside of the 2^52 range.

cortner commented 9 years ago

But I still argue that one can simply throw an exception when the conversion is unsafe.

jiahao commented 9 years ago

@cortner and my point is that you cannot tell in my example whether it is "unsafe".

cortner commented 9 years ago

@jiahao , I think your example shows that it is unsafe to use float-indexing for numbers > 1/eps.

jiahao commented 9 years ago

Also, this is not just an academic counterexample when it comes to Float32s. maxintfloat(Float32) is 2^24, and there are already quite a few data sets of this size. (See the Network Repository for some examples.)

cortner commented 9 years ago

For Float32, float-indexing is unsafe for indices > 1/eps(Float32)

jiahao commented 9 years ago

@cortner Julia allows indexing with an arbitrary index set. x[ [10^16, 10^16+1, 10^16+2] ] is a valid indexing operation. It is tempting but incorrect to think that x[ [1e16:(1e16+2.0)] ] might do the same thing. Instead, it does the same thing as x[ [10^16, 10^16, 10^16+2] ] in the default round-to-even rounding mode. But this is also a valid indexing operation. To claim that the second indexing operation is "unsafe" is to make a subjective judgment call on user intent which uses valid and unambiguous (if not necessarily intuitive) semantics.

Similarly, consider x[ [10^16, 10^16+1] ] != x[ [1e16:(1e16+1.0)] ] == x[ [1e16] ].

cortner commented 9 years ago

I understand all the arguments (to some extent anyways), all I keep saying is that you are taking the point of view of a computer scientist and not that of a numerical analyst / user. One of the nice things about Julia is that, if I don't want to, then I don't need to worry about types. But suddenly now I do.

In all honesty, it is a relatively minor issue, but many such minor issues taken together will make Julia a less attractive language for teaching, for casual programmers, and maybe even for programmers in general. I plan to collect a list of examples along the same lines and will post them at some point.

jiahao commented 9 years ago

If we lived in a world where all floating point numbers had epsilon <= 1 I would agree that floating point indexing is much less problematic conceptually. However, we don't live in that world, so we have the problem of balancing the "convenience" of writing A[1.0] against the "problem" of indexing with large floats. All I'm cautioning against is any concessions to convenience that come at the expense of overall semantic consistency.

SimonDanisch commented 9 years ago

@cortner you're slowly convincing me, as it now reminds me of [a => b] becoming Dict(a=>b), which also made me quite sad... Maybe move these kind of checks into Lint.jl, or something comparable? So someone who wants safety, can still get it, and someone who just wants to play around doesn't have to think much about types and conversions. This doesn't address @jiahao's concern of semantic inconsistency though....

nalimilan commented 9 years ago

@jiahao If floating point indexing is forbidden, then it's obviously not considered as "valid". Thus, allowing a subset of those operations gives more freedom to users, so I don't see the problem about it being a "judgement call".

FWIW, also add R to the list of software that allow indexing with floats.

quinnj commented 9 years ago

I think it all depends on how you look at it. There have been several people who have posted about how they've found bugs/type issues in their code due to this change and are grateful for this change. In a teaching situation, we have right there some great examples to help explain why this certain restriction was chosen; more so as an overall design decision than a "correctness" decision and here's why.

I also don't think it's too terrible to define your own float getindex methods if you have your own use case for them. There may be legitimate use cases and if it really makes sense for a package, they can define their own and use them or as a user, put it in your .juliarc file.

On Mon, Mar 16, 2015 at 9:40 AM, Christoph Ortner notifications@github.com wrote:

I understand all the arguments (to some extent anyways), all I keep saying is that you are taking the point of view of a computer scientist and not that of a numerical analyst / user. One of the nice things about Julia is that, if I don't want to, then I don't need to worry about types. But suddenly now I do.

In all honesty, it is a relatively minor issue, but many such minor issues taken together will make Julia a less attractive language for teaching, for casual programmers, and maybe even for programmers in general. I plan to collect a list of examples along the same lines and will post them at some point.

— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/10154#issuecomment-81747535.

jiahao commented 9 years ago

@nalimilan I think you may be conflating "forbidden" with "not defined". If we define this method, it should do something reasonable for all valid inputs of that given type. I'm not opposed to something like

if 0 <= floatidx < maxintfloat(typeof(floatidx))
    idx = round(Int, floatidx)
    return idx==floatidx ? A[idx] : throw(InexactError())
end

The unresolved question in my mind is what should happen for other values. One possibility is to assert that users should never do this and raise an error. Another possibility is to assume that users know what they are doing past this point and allow the indexing. If it is not clear what this method should do then the most conservative choice is to not define it in Base, since the base library does not have the luxury of ignoring corner cases that user code doesn't care about.

tkelman commented 9 years ago

Another good candidate definition for the MatlabCompat.jl package?

nalimilan commented 9 years ago

@jiahao I think the proposal is to raise an error when indexing above maxintfloat. People who know what they are doing can still convert manually to an integer beforehand (as currently).

@tkelman But also into RCompat.jl then. I don't really like the idea of encouraging alternative definitions for core functions like getindex(::AbstractArray, ...) in supplementary packages. This means [] won't have the same meaning depending on the background of the person who wrote the code, and contrary to a convenience function it's not easy to find out where it comes from.

cortner commented 9 years ago

@tkelman That is a possibility, but to be honest I would prefer to not have to rely on a package, which may or may not be updated on time with new versions of the language.

@nalimilan yes, very good point.

mbauman commented 9 years ago

@cortner Is your main objection that you are already calling round (or floor/trunc/ceil/etc) and don't like the verbosity and type-fiddliness of round(Int, …)?

cortner commented 9 years ago

@mbauman the short answer is yes; it seems small when said like this, but it is a matter of principle. I want to have the choice to not worry about types. And it is not so much myself I am worried about, as I will get used to it, but I think these sort of design decisions will play against Matlab and R users converting to Julia.

johnmyleswhite commented 9 years ago

I want to have the choice to not worry about types.

I don't really think that's a viable choice, even though it's occasionally been claimed that you don't have to worry about types. Right now, the type inference system really only lets you ignore types if you exclusively work with Int, Float64 and Array{Float64}. Even with that caveat, conflating Int and Float64 poses serious problems for both performance and correctness. (It's not an accident that both Matlab and R made themselves more "user-friendly" by discouraging users from ever using any integers.)

As soon as you work with strings (let alone a custom type), you discover that you need to know a lot about the type system before you can write efficient code in Julia. You need to know which types are abstract, so that you can make sure that you never include those types as fields in a type; you need to know which visually similar values (e.g. 0 vs 0.0 / "foo" vs "fóo") have distinct types, so that you can avoid writing code that has variables with performance-damaging type uncertainty.

As such, I think it would be unwise for the Julia community to claim that you have a choice to not worry about types. We could only achieve that if we adopted a radically more limited type system and removed integers completely from the language.

Since that won't likely be happening, I think it's a virtue of the debated indexing change that it encourages users to make sure that they never think of integers and floating-point numbers as interchangeable, since they manifestly are not interchangeable in Julia's current state.

dlfivefifty commented 9 years ago

I can imagine a new user doing something like

A=[1,2] A[1]=2.4

where you definitely do have to worry about types.

Sent from my iPhone

On 17 Mar 2015, at 5:51 am, John Myles White notifications@github.com wrote:

I want to have the choice to not worry about types.

I don't really think that's a viable choice, even though it's occasionally been claimed that you don't have to worry about types. Right now, the type inference system really only lets you ignore types if you exclusively work with Int, Float64 and Array{Float64}. Even with that caveat, conflating Int and Float64 poses serious problems for both performance and correctness. (It's not an accident that both Matlab and R made themselves more "user-friendly" by discouraging users from ever using any integers.)

As soon as you work with strings (let alone a custom type), you discover that you need to know a lot about the type system before you can write efficient code in Julia. You need to know which types are abstract, so that you can make sure that you never include those types as fields in a type; you need to know which visually similar values (e.g. 0 vs 0.0 / "foo" vs "fóo") have distinct types, so that you can avoid writing code that has variables with performance-damaging type uncertainty.

As such, I think it would be unwise for the Julia community to claim that you have a choice to not worry about types. We could only achieve that if we adopted a radically more limited type system and removed integers completely from the language.

Since that won't likely be happening, I think it's a virtue of the debated indexing change that it encourages users to make sure that they never think of integers and floating-point numbers as interchangeable, since they manifestly are not interchangeable in Julia's current state.

— Reply to this email directly or view it on GitHub.

StefanKarpinski commented 9 years ago

Yeah, I don't really buy the "you don't have to worry about types" thing – because that will last all of about a day – but I still think that we should in general do conversions automatically when they're safe and obvious. I'm not really convinced that the fact that you may have previously lost precision when computing a floating-point value is the concern of the indexing operation.

jakebolewski commented 9 years ago

@cortner, @StefanKarpinski why should indexing for arrays be special cased here? Do you think that most users define indexing for all subtypes of Reals for user defined types? Almost no one does. To index consistently across all user defined types you have to explicitly convert to an Int anyways. Reasoning about Ints in the system is inescapable.

In fact, indexing by floating point values is not even consistently defined in Base. For example, the dense case is defined:

julia> A = randn(3,3)
3x3 Array{Float64,2}:
 -1.44595   1.07011    2.06128
 -0.46673   0.406243   3.70655
 -0.564156  0.494642  -0.936865

julia> A[1.0,1.0]
-1.4459545197847976

but for sparse arrays, and much of the AbstractArray subtypes it is not

julia> SA = sprandn(3,3,1.0)
3x3 sparse matrix with 9 Float64 entries:
    [1, 1]  =  -0.294094
    [2, 1]  =  0.00830028
    [3, 1]  =  -1.11293
    [1, 2]  =  0.304284
    [2, 2]  =  -0.786085
    [3, 2]  =  -0.591092
    [1, 3]  =  0.648532
    [2, 3]  =  1.72333
    [3, 3]  =  -0.0314884

julia> SA[1.0,1.0]
ERROR: MethodError: `getindex` has no method matching getindex(::Base.SparseMatrix.SparseMatrixCSC{Float64,Int64}, ::Float64, ::Float64)
Closest candidates are:
  getindex(::AbstractArray{T,2}, ::Any, ::Any, ::Any, ::Any...)
  getindex(::AbstractArray{T,N}, ::Real)
  getindex(::Base.SparseMatrix.SparseMatrixCSC{Tv,Ti<:Integer}, ::Integer)
  ...

julia> Diagonal(A)[1.0,1.0]
ERROR: MethodError: `getindex` has no method matching getindex(::Base.LinAlg.Diagonal{Float64}, ::Float64, ::Float64)
Closest candidates are:
  getindex(::AbstractArray{T,2}, ::Any, ::Any, ::Any, ::Any...)
  getindex(::AbstractArray{T,N}, ::Real)
  getindex(::Base.LinAlg.Diagonal{T}, ::Integer, ::Integer)
  ...

@mbauman work would help here for subtypes of AbstractArray, but that still does not address the behavior for user defined types. We could obviously export the conversion and promote it as the way to do indexing but that would be a big change to the Julia code which is currently out in the wild.

StefanKarpinski commented 9 years ago

To be clear, I'm kind of ambivalent about this. It's annoying how inconsistent this is across all of Base and various packages. I don't think that indexing is an exceptional case – I want a more consistent policy. But to have such a policy, I think we need to have a reason why we draw the line where we do. Currently, it's a very wiggly line.

johnmyleswhite commented 9 years ago

For me, the bright line is just that indexing shouldn't use floating-point numbers in addition to allowing integers. The rationale I see is that:

For me, floating point indexing is almost entirely lose-lose. It's kind of cute that it works, but it's not cute enough to justify its existence in the absence of some other compelling rule that would imply it ought to exist.

jakebolewski commented 9 years ago

I just want some consistency in the interface as well, but I agree with @johnmyleswhite's points which is why I proposed the change.