JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.36k stars 5.46k forks source link

Does `collect` always return an Array or can it return `AbstractArray`s? #50051

Open FelixBenning opened 1 year ago

FelixBenning commented 1 year ago

The documentation says Array

help?> collect

...
  collect(collection)

  Return an Array of all items in a collection or iterator. For dictionaries, returns Pair{KeyType, ValType}. If the argument is array-like or is an iterator with the HasShape trait, the result will have the same shape and number of dimensions as the argument.

but collect of Iterators.product breaks this promise:

julia> Iterators.product(OffsetArray(rand(3), -1:1), 1:2) |> collect
3×2 OffsetArray(::Matrix{Tuple{Float64, Int64}}, -1:1, 1:2) with eltype Tuple{Float64, Int64} with indices -1:1×1:2:
 (0.892917, 1)  (0.892917, 2)
 (0.398366, 1)  (0.398366, 2)
 (0.550785, 1)  (0.550785, 2)

I think returning an AbstractArray is fine if the documentation is updated. Interestingly collect(collect()) results in a normal Array. So OffsetArrays is following the documentation.

jishnub commented 1 year ago

Collect for AbstractArrays follows a different path from that for iterators. The latter seems to use similar to generate the destination, which it probably shouldn't

jakobnissen commented 1 year ago

I think we need to be a little careful generalizing this more. See Issue 47777. Briefly:

FelixBenning commented 1 year ago

Most people probably collect for the getindex feature. This means that an AbstractArray would be fine. Maybe the solution would be to have:

collect! would be a no-op for any AbstractArray and collect would always be copy.

jishnub commented 1 year ago

Reading the documentation again,

If the argument is array-like or is an iterator with the HasShape trait, the result will have the same shape and number of dimensions as the argument.

Perhaps "shape" should be replaced either by size or axes, as an iterator that HasShape also defines axes for itself, and size may be interpreted as length.(axes) which is defined by default in this case. In the original iterator example, the axes are preserved, which might have been the intended meaning of shape in this case. Collecting an AbstractArray, on the other hand, preserves size and discards axes.

andrewjradcliffe commented 1 year ago

If I may propose behavior which should not be the case, illustrated on Base types, but which becomes permissible if AbstractArray returns are permissible:

julia> v = collect(Float64, 1:2:5)
1.0:2.0:5.0

julia> typeof(v) <: AbstractArray
true

One must be careful that collect actually produce concrete values, rather than another (implied) iterable entity which will be used to form a generator; or, even more surprising, an abstract representation from a concrete. In other words, let's not facilitate more abuse of AbstractArray on operations which should not be surprising. A non-so-julian solution is to introduce something such as collect_into(container_type, iter); this already exists in Julia as map due to its eager evaluation semantics.

JeffBezanson commented 1 year ago

Triage thinks collect means "return a new container where all the values are stored explicitly and contiguously" --- perhaps a DenseArray? I think it is currently always mutable as well. So I do think it can return something other than Array as appropriate. And ideally, if you really want an Array you should be able to get that via Array(itr) (currently, not always implemented unfortunately).

Docs should be updated.