JuliaArrays / OffsetArrays.jl

Fortran-like arrays with arbitrary, zero or negative starting indices.
Other
195 stars 40 forks source link

Completely stop pirating Base methods #306

Open rafaqz opened 2 years ago

rafaqz commented 2 years ago

In a session where OffsetArrays is a dependency of some package that I'm using, this is very strange to see:

julia> zeros(10:20)
11-element OffsetArray(::Vector{Float64}, 10:20) with eltype Float64 with indices 10:20:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

This really should be an error. An OffsetArray is not at all what I wanted. I simply made a typo.

The main problem with this is it breaks simple mental models for understanding what has gone wrong in arbitrary julia code. Base type + Base method returning a package type should be completely outside of what we expect as a possible behavior.

jishnub commented 2 years ago

I'm in favour of removing these convenience methods in OffsetArrays 2.0. With the new Origin type, these aren't inconvenient to create anymore.

rafaqz commented 2 years ago

Great. A wrapper type is how to do this. DimensionalData.jl does similar things without any piracy.

jishnub commented 2 years ago

Perhaps we need to retain the convenience of zeros(indices) in some way. Currently, the ways to construct such an array is

In the first case, providing the length is redundant given the axes. The second case is useful if we know the origin, but not as much if we know, say, the endpoint and the length. The missing feature is the ability to construct such an array given just the axes, which is precisely the convenience that zeros(indices) used to provide. Perhaps the solution is simply to introduce another wrapper type named Axes and ask users to call zeros(Axes(indices)) instead of zeros(indices). The upside of this is that this is very explicit, and unlikely to come as a surprise to a reader. The downside is the proliferation of types that exist merely as convenience features.

rafaqz commented 2 years ago

Yes thats exactly how DimensionalData.jl does it. X/Y/Z wrappers can be used in base methods like zero, fill, rand to get arrays with named axes and lookup keys. rand(X('a':'z'), Y(5)) gives a 26×5 array with X and Y dims, X having the alphabet as a lookup index.

OffsetArrays could be similar.

Axes seems a little generic to be exported without clashes (maybe Makie.jl already exports it?), what about Offset? Like zeros(Offset(10:20), 5)

That screams "This returns an OffsetArray".

Edit: no Makie.jl exports Axis. Maybe its fine, IDK.

jishnub commented 2 years ago

offset has a different meaning in this package (the difference between firstindex(axis) and 1). Perhaps a name like OffsetAxes might be clearer and unlikely to clash with other symbols. Then again, we may end up not exporting this, like we haven't exported Origin.

rafaqz commented 2 years ago

Sure, if it's not exported you can have whatever. I was just wondering how to match the convenience of a range, and exporting something might help with that.

wnoise commented 2 years ago

Perhaps we need to retain the convenience of zeros(indices) in some way. Currently, the ways to construct such an array is ... In the first case, providing the length is redundant given the axes. ... The missing feature is the ability to construct such an array given just the axes, which is precisely the convenience that zeros(indices) used to provide. Perhaps the solution is simply to introduce another wrapper type named Axes and ask users to call zeros(Axes(indices)) instead of zeros(indices). The upside of this is that this is very explicit, and unlikely to come as a surprise to a reader. The downside is the proliferation of types that exist merely as convenience features.

Isn't the existing IdOffsetRange exactly the same thing as this new Axes type? A tuple of them is what gets returned for axes(::OffsetArray).

jishnub commented 2 years ago

Yes, it seems essentially equivalent, except it doesn't subtype AbstractUnitRange. Thus, we won't see different behavior for 3:5 and OffsetArrays.IdOffsetRange(3:5). Tbh I'm uncertain what's the best approach here. If we agree that the current behavior is convenient (I think it is), it might be possible to implement this in Base to avoid the type-piracy here (to some extent). This also obviates the need for additional types in this package. On the other hand, if the current behavior seems surprising (eg. why are zero(3:4) and zeros(3:4) different, and why is rand(3:4) something entirely different), then we may need to rethink how these methods should work.

The bigger type-piracy in this package is perhaps in similar. This might be more difficult to resolve.

wnoise commented 2 years ago

well, zero(3:4) should error instead of returning an array; an array is not the neutral element of a UnitRange. (There is no neutral element of UnitRange . StepRangeLen(0,0,len) is about as close as you can get, still a different type, but keeping it as a Range though.)

aplavin commented 2 years ago

(3:4) + zero(3:4) == (3:4), so it can definitely be considered a neutral element.

If you want to require typeof(zero(x) + x) == typeof(x) for some reason, then zero methods for other types would also be removed. Among others, this identity doesn't hold for x = true, x = π and x = LinearAlgebra.I.

But it seems more useful for zero(3:4) to continue returning an abstractarray: either a vector as it does now, or a StepRangeLen for efficiency.

wnoise commented 2 years ago

If you want to require typeof(zero(x) + x) == typeof(x)

Indeed, I almost always do, because that's the natural reading of the docs. My fundamental point is that a range is not actually an array, not the collection of numbers it can be expanded into, but a promise about constant spacing (and density for UnitRanges). Converting to an array loses that information. There is currently a dispatch that fails (rather than merely losing efficiency) after a UnitRange is converted to an array, because it loses this information:

julia> r = 4:8
4:8

julia> mod(2, r)
7

julia> mod(2, r + zero(r))
ERROR: MethodError: no method matching mod(::Int64, ::Vector{Int64})
Closest candidates are:
  mod(::Union{Int128, Int16, Int32, Int64, Int8}, ::Unsigned) at int.jl:282
  mod(::Integer, ::Base.OneTo) at range.jl:1478
  mod(::Integer, ::AbstractUnitRange{<:Integer}) at range.jl:1479
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[27]:1

There are of course many others that merely lose efficiency -- e.g. minimum, etc.

x = true

I have always been suspicious of arithmetic on Bools. If you have to define it, it should be saturating and return Bools.

x = π

Heck, both x == x + zero(x) and the type one fails for this.

x = LinearAlgebra.I

This actually works! (except for the typeparameter changing from Bool to Int64 when unspecified, but see previous comment about arithmetic on Bools.)

julia> typeof(I)
UniformScaling{Bool}

julia> typeof(zero(I))
UniformScaling{Bool}

julia> typeof(I + zero(I))
UniformScaling{Int64}

julia> I + zero(I) == I
true
rafaqz commented 2 years ago

This may be an issue to raise in Base?

Here we have no ability to change the definition of zero, just zeros.

johnnychen94 commented 2 years ago

@jishnub @timholy do you think this package is stable enough to become a part of Base?

IMO similar is actually the consequence of OffsetArrays being conservative on pirating. If OffsetArray is a part of Base, it's then feasible to fix many nonconventional indexing semantics issues.

jishnub commented 2 years ago

Certain parts from this package may be ported to Base, such as the similar and reshape methods, and IdOffsetRange. This should provide most functionalities. I'm not certain if we should add OffsetArray to Base, as ideally offset indexing is something that should be handled through traits.

rafaqz commented 1 year ago

Another example of this: https://github.com/rafaqz/DimensionalData.jl/issues/464#issuecomment-1437217274

Loading OffsetArrays.jl "fixes" a bug in Base.stack with DimensionalData.jl.

And critically, we test against OffsetArrays.jl, as AbstractArray extending packages should do.

But this will hide the bug

Base functions can pass our test suit and fail in general use because OffsetArrays.jl is loaded in our tests.

jishnub commented 1 year ago

That's a bit unfortunate, although I think this is better fixed in Base, by defining an AbstractOneTo type that custom 1-based axes can subtype. This is the fundamental similar piracy which is at the core of this package, and might be harder to change than zeros and other convenience functions.

rafaqz commented 1 year ago

I'm not sure AbstractOneTo would solve this problem. The similar method in question can't be extended practically without type piracy, that only OffsetArrays.jl can do.

See: https://github.com/JuliaLang/julia/issues/48736#issuecomment-1437414217

jishnub commented 1 year ago

AbstractOneTo will not solve type piracy, but it'll ensure that similar for OneTo and equivalent types are always handled in Base, and only offset axes leak to this package.

mauro3 commented 3 months ago

This is another odd one:

julia> (1:10)[Base.IdentityUnitRange(5:7)]
5:7

julia> using OffsetArrays

julia> (1:10)[Base.IdentityUnitRange(5:7)]
OffsetArrays.IdOffsetRange(values=5:7, indices=5:7)
jishnub commented 3 months ago

This is technically more consistent, but we should probably stop defining these methods for IdentityUnitRange.

rafaqz commented 3 months ago

But it's also worse than my example because it's changing the return type of a Base method that did not error.

To me this one is really bad.

"technically more consistent" can't apply to type piracy, it's by definition technically inconsistent.

jishnub commented 2 weeks ago

How about a compromise for now: we allow projects to disable the type-piracy in OffsetArrays using Preferences.jl? This needs to be a one-time setup. This might confuse users of the package, but this might be clarified in the documentation of OffsetArrays.