JuliaArrays / OffsetArrays.jl

Fortran-like arrays with arbitrary, zero or negative starting indices.
Other
195 stars 40 forks source link

Type-piracy in similar with `OffsetAxis` #87

Open dlfivefifty opened 4 years ago

dlfivefifty commented 4 years ago

This is type-piracy:

https://github.com/JuliaArrays/OffsetArrays.jl/blob/d50463f9149dbe238a64be7cf8a58d904f662166/src/OffsetArrays.jl#L99

timholy commented 4 years ago

It's basically "sanctioned type piracy." The issue is that Base is incapable of creating arrays with axes that start at anything other than 1. The dispatch hierarchy for Base ensures that any combination of inputs that can be reduced to axes that start at 1 will get caught by methods in Base. This method is the fallback, in case that doesn't happen. That's OK here because this is the foundational package for arrays with offset axes.

Any other package that wants to define offset axes has to define their own axis types. See https://docs.julialang.org/en/latest/devdocs/offset-arrays/#Custom-AbstractUnitRange-types-1.

Does that cover it? Or is there an issue that I'm not understanding?

dlfivefifty commented 4 years ago

The issue is that as designed now type ambiguities in other packages (in this case BlockArrays.jl) only are triggered if a user loads OffsetArrays.jl.

I think a better approach would be in Base to add something like:

_offset_similar(_) = error("Not implemented in Base. Please use external package.")
Base.similar(A::AbstractArray, ::Type{T}, inds::Tuple{OffsetAxis,Vararg{OffsetAxis}}) where T = _offset_similar(A, T, index)

then OffsetArrays.jl would overload Base._offset_similar. This means ambiguity is consistent.

timholy commented 4 years ago

If you can sync this across Base, BlockArrays, and OffsetArrays, by all means go for it! Might want to also check CatIndices.jl as an example of a package that defines its own index types.

dlfivefifty commented 4 years ago

I won’t be able to tackle this right now but perhaps in the future. Maybe just leave this open for now?

yha commented 4 years ago

This type piracy has become worse with #90: now similar is pirated for general AbstractArray with the new index type IdOffsetRange, which means other packages defining similar(a::OtherArrayType,...) must disambiguate on an index type internal to OffsetArrays. See https://github.com/Vexatos/CircularArrays.jl/pull/2#issuecomment-581382145. I think what is needed is something like https://github.com/JuliaArrays/OffsetArrays.jl/issues/87#issuecomment-560376691 suggested above but with OffsetAxis being an abstract type that packages can subtype. Then the rule would be that package may specialize offset_similar only for the last (indexes) argument, and similar only for the first (array) argument, like this:

# In Base
const OffsetDims = Tuple{OffsetAxis,Vararg{OffsetAxis}}
similar(A::AbstractArray, ::Type{T}, off::OffsetDims) where T = offset_similar(A, T, off)

# In OffsetArray
struct IdOffsetRange <: OffsetAxis
  ...
end
function Base.offset_similar(A::AbstractArray, ::Type{T}, inds::IdOffsetRange) where T
  ...
  OffsetArray(...)
end

# In another package
Base.similar(A::MyArrayType, ::Type{T}, dims::Dims) = MyArrayType(...)
Base.similar(A::MyArrayType, ::Type{T}, inds::OffsetDims) = MyArrayType(offset_similar(A,T,inds))
# and/or
Base.offset_similar(A::AbstractArray, ::Type{T}, inds::MyOffsetDimsType) = MyArrayType(...)
timholy commented 4 years ago

Base ranges need an overhaul anyway to address https://github.com/JuliaLang/julia/pull/30950. That PR was starting to turn into a nightmare, I think the best option will be to start fresh and go slowly, thinking very carefully about how to avoid breakage. (There is a chance that those issues simply can't be fixed until Julia 2.0 :frowning: ). It would be good to address this as part of that effort.

I don't see myself as having time for it in the next few weeks or more, so definitely looking for help from others!

Tokazama commented 4 years ago

I'm working on something similar to this right now (OffsetAxes.jl) that builds on AxisIndices.jl. If I understand correctly (which it's possible I don't) then similar was originally designed to handle element types and size. This inserts the notion of "axes" which isn't really a part of Array and may not be applicable to many other subtypes of AbstractArray. I'm running into some similar things with reshape too.

Would it make sense to have a more explicit argument for what is happening here? Perhaps something like reaxes because the point is to change the axes of an array.

yha commented 3 years ago

I think the abmiguity issues caused by this piracy is nearly resolved now. At least, it's resolved for CircularArrays, where this issue previously caused trouble (see Vexatos/CircularArrays.jl#8) The reason is that the signature has become https://github.com/JuliaArrays/OffsetArrays.jl/blob/f20ecfc8485904f5a825fcde805f878b981e60ca/src/OffsetArrays.jl#L219 and #157 changed OffsetAxisKnownLength to mean Union{Integer, AbstractUnitRange} (which is the same thing as Base.DimOrInd). So packages can define similar for dims::Tuple{DimOrInd, Vararg{DimOrInd}} without triggering ambiguity with this method and without depending on a type internal to OffsetArrays. This signature is almost the current "official" recommendation in https://github.com/JuliaLang/julia/blob/110f1256ac88ee0f12885289e807229acf9c44a3/base/abstractarray.jl#L723-L727 (it says UnitRange rather than AbstractUnitRange). Perhaps the official documented recommendation should now be to use Base.DimOrInd when specializing similar for offset arrays.

akosuas commented 3 years ago

I just encountered something similar when trying to reshape an Array where the length was given as Int32 instead of Int64:

julia> using OffsetArrays
julia> which(Base.reshape, (Array{Float64, 1}, Int32))
reshape(A::AbstractArray, inds::Union{Colon, Integer, AbstractUnitRange}...) in OffsetArrays at ~/.julia/packages/OffsetArrays/PXUn7/src/OffsetArrays.jl:225
(MWE) pkg> st
Project MWE v0.1.0
Status `MWE/Project.toml`
  [6fe1bfb0] OffsetArrays v1.4.1

This feels a bit naughty to me.

mcabbott commented 3 years ago

To be fair, that's a MethodError without this package, and the expected answer with it. More surprising is:

julia> reshape(rand(1,2), Int32(2))
2-element Vector{Float64}:
 0.23063257026910478
 0.007514586985956306

julia> reshape(rand(2,3), :, Int32(2))
ERROR: StackOverflowError:
Stacktrace:
 [1] reshape(A::Matrix{Float64}, inds::Tuple{Colon, Int32})
   @ OffsetArrays ~/.julia/packages/OffsetArrays/ExQCD/src/OffsetArrays.jl:227
akosuas commented 3 years ago

Ah, my bad - hadn't thought to try it, assumed the Base one would accept any Integer.

Yes - that StackOverflowError is how I found this! My original case that crashed was something like reshape(::Array{Float32,3}, Int32, Int64, Colon).

mcabbott commented 3 years ago

I assumed the same. And while I didn't write this, if I had, it would have been accidental piracy from this assumption -- trying to allow UnitRanges in otherwise the same Union as Base has. Maybe the signature here should just be tightened to match.

(Also, how did you land up with an Int32? I'm not sure I've ever seen one in the wild...)

jishnub commented 3 years ago

Many of the reshape methods are defined in reshapedarray.jl, where curiously Int32 arguments are permitted if all the sizes are specified, but disallowed if one of them is specified implicitly using a colon. The method defined in this package appears to try to call one of these, fails, and ends up calling itself.

julia> reshape(rand(2,3), (Int32(3), Int32(2)))
3×2 Array{Float64,2}:
 0.296956   0.113515
 0.0535506  0.564747
 0.90234    0.00424624

julia> reshape(rand(2,3), (Int32(3), :))
ERROR: MethodError: no method matching reshape(::Array{Float64,2}, ::Tuple{Int32,Colon})

Even stranger is the fact that this is only permitted when the sizes are passed as a Tuple, but not allowed when passed as a Vararg.

julia> reshape(rand(2,3), Int32(3), Int32(2))
ERROR: MethodError: no method matching reshape(::Array{Float64,2}, ::Int32, ::Int32)

I wonder if there are a few method signatures that need to be altered here?

phipsgabler commented 2 years ago

I found this funny behaviour, reproducible in a fresh temp project with only OffsetArrays:

julia> using OffsetArrays

julia> A = rand(10, 10);

julia> A[Base.Slice(1:10)]
10-element OffsetArray(::Vector{Float64}, 1:10) with eltype Float64 with indices 1:10:
 0.31581111033816844
 0.2408986878236269
 0.865276045142753
 0.4158474542637325
 0.5674733494230301
 0.5910918167329304
 0.5583008496658141
 0.8863892215050752
 0.9094492113020204
 0.1511050866350474

Without OffsetArrays, it's a method error:

julia> A[Base.Slice(1:10)]
ERROR: MethodError: no method matching similar(::Matrix{Float64}, ::Type{Float64}, ::Tuple{Base.IdentityUnitRange{UnitRange{Int64}}})
jishnub commented 2 years ago

I think such type-piracy, where an error becomes a valid operation, isn't considered too serious. After all, Base cannot return the correct result here, since in general, this will have offset axes. Currently Base chooses to be conservative and simply doesn't permit the operation, while OffsetArrays produces the correct result for all such slices:

julia> A[Base.Slice(9:10)]
2-element OffsetArray(::Vector{Float64}, 9:10) with eltype Float64 with indices 9:10:
 0.13029549937036167
 0.024726720571425886

julia> UnitRange(axes(A[Base.Slice(1:10)], 1))
1:10

julia> UnitRange(axes(A[Base.Slice(9:10)], 1))
9:10

In fact, interestingly, such an operation for ranges does not preserve the axes even with OffsetArrays loaded

julia> r = 1:10; s = Base.Slice(9:10);

julia> r[s] |> axes
(Base.OneTo(2),)

julia> Vector(r)[s] |> axes
(OffsetArrays.IdOffsetRange(values=9:10, indices=9:10),)

julia> r == Vector(r)
true

julia> r[s] == Vector(r)[s] # should hold, but doesn't as the axes don't match
false

The reason for this is perhaps historical. I had proposed a fix for this, but I haven't looked at it in a while.

rafaqz commented 2 years ago

@jishnub I have to disagree that an error not throwing is not serious type piracy.

When OffsetArrays.jl is just a secondary dependency not loaded by the user, it can still break the locality of errors when debugging - there are many typos with base julia methods and types that fail in Base but don't with OffsetArrays.jl loaded.

That is not something you look for, because it shouldn't be possible. Especially when you havent even intentionally loaded OffsetArrays.jl.

If every package did this Julia would be unusable. I'm not sure why OffsetArrays still gets a pass.

timholy commented 2 years ago

there are many typos with base julia methods and types that fail in Base but don't with OffsetArrays.jl loaded

@rafaqz can you give some concrete examples? I'm in agreement with @jishnub, but happy to be persuaded otherwise. I don't understand what kinds of typos you're talking about, so I'd need some hints from you to understand how what you're describing could happen.

rafaqz commented 2 years ago

The most obvious one from my other issue: I typed something like zeros(10:20) instead of zero(10:20).

Hurrah, OffsetArray.

Then get an error somewhere totally different, with a bounds error.

I didnt even load OffsetArrays. It was only installed as a dependency.

With the examples above, its easy to pass range instead of axes(range, 1) to base methods like Slice, and not get the error you should get. Or to accidentally use Int32 instead of Int64 to index.

These mistakes should be quickly uncovered in predictable ways.

timholy commented 2 years ago

from my other issue

For posterity: this references #306

OK, but FYI I happened to see this and not that. The more explicit you can be the better: "this bugs me" is not really actionable. Anyway, sorry it annoyed you and I see your point.

rafaqz commented 2 years ago

Absolutely, apologies for not being clearer.

Part of the vagueness is how I experience the problems with type piracy on errors is quite abstract. "zeros should error" sounds like a nitpick compared to my perception of the scope of the problem. I will attempt to clarify.

Essentially I don't consider type piracy on Base methods/types in the list of possible problems when I'm debugging code, because its a large mental overhead to do that.

Generally we have consistent set of methods and types in scope we can only intentionally add to by importing packages. And we know:

When we hit errors, we can apply our understanding of these scopes to limit the potential causes of problems, and solve them more quickly. We can look at the stack trace, and know the methods and types that could have led to it, and scan for them in our code.

Type piracy on base methods and types breaks these assumptions. The possible scope of the call stack for any method call becomes all of the packages currently loaded and all of their dependency trees (realistically its the packages that are allowed to do type piracy, but I'm not across the history of that).

We also can't go in reverse from an error in an OffsetArrays.jl method to looking for a method or type from a package that may depend on OffsetArrays.jl. Instead we have to inspect every line of our code as a potential source of the error.

To me, getting errors in predictable ways is valuable. OffsetArrays doing type piracy isn't just about occasionally getting these strange errors, but about breaking an efficient approach to programming that ignores the possibility of these problems completely.

matthias314 commented 2 weeks ago

I apologize for not having worked through this long thread. I only want to point out that there is also a conflict with JuliaArrays/FFTViews.jl: If there you add using OffsetArrays at the beginning of test/runtests.jl, then you get

     Testing Running tests...
basics: Error During Test at /usr/local/git/FFTViews.jl/test/runtests.jl:29
  Got exception outside of a @test
  MethodError: similar(::Type{Array{Int64}}, ::Tuple{Base.IdentityUnitRange{FFTViews.URange{Int64}}, Base.IdentityUnitRange{FFTViews.URange{Int64}}}) is ambiguous.

  Candidates:
    similar(::Type{T}, shape::Tuple{Union{Integer, AbstractUnitRange}, Vararg{Union{Integer, AbstractUnitRange}}}) where T<:AbstractArray
      @ OffsetArrays /usr/local/julia/packages/OffsetArrays/hwmnB/src/OffsetArrays.jl:331
    similar(f::Union{Function, Type}, shape::Tuple{Union{Base.IdentityUnitRange{FFTViews.URange{T}}, Base.Slice{FFTViews.URange{T}}, FFTViews.URange{T}} where T, Vararg{Union{Base.IdentityUnitRange{FFTViews.URange{T}}, Base.Slice{FFTViews.URange{T}}, FFTViews.URange{T}} where T}})
      @ FFTViews /usr/local/git/FFTViews.jl/src/FFTViews.jl:72

  Possible fix, define
    similar(::Type{T}, ::Tuple{Union{Base.IdentityUnitRange{FFTViews.URange{T}}, Base.Slice{FFTViews.URange{T}}, FFTViews.URange{T}} where T, Vararg{Union{Base.IdentityUnitRange{FFTViews.URange{T}}, Base.Slice{FFTViews.URange{T}}, FFTViews.URange{T}} where T}}) where T<:AbstractArray

  Stacktrace:
   [1] macro expansion
     @ /usr/local/git/FFTViews.jl/test/runtests.jl:47 [inlined]
   [2] macro expansion
     @ /usr/local/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [3] top-level scope
     @ /usr/local/git/FFTViews.jl/test/runtests.jl:30
   [4] include(fname::String)
     @ Base.MainInclude ./client.jl:489
   [5] top-level scope
     @ none:6
   [6] eval
     @ ./boot.jl:385 [inlined]
   [7] exec_options(opts::Base.JLOptions)
     @ Base ./client.jl:291
   [8] _start()
     @ Base ./client.jl:552
Test Summary: | Pass  Error  Total  Time
basics        |   12      1     13  1.6s
ERROR: LoadError: Some tests did not pass: 12 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /usr/local/git/FFTViews.jl/test/runtests.jl:29
ERROR: Package FFTViews errored during testing
rafaqz commented 2 weeks ago

This is a common problem with OffsetArrays and tests. I've hit it in both DimensionalData.jl and GeometryOps.jl. The worse case is the opposite one, where tests pass when OffsetArrays is loaded but break when its not.

jishnub commented 2 weeks ago

Cross-referencing https://github.com/JuliaArrays/OffsetArrays.jl/issues/306#issuecomment-2296261278, which might be a way to avoid this for packages.