JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.43k stars 5.46k forks source link

enumerate() like equivalents for iterating over columns/rows of a matrix #14491

Closed AzamatB closed 5 years ago

AzamatB commented 8 years ago

It would be good to be able to iterate over columns (rows) as follows:

for col in columns(A)
    # some code
end

This can be part of a more generic interface slices(A, dims), with rows(A) and columns(A) being alias for slices(A, 1) and slices(A, 2) respectively.

With this we can consider deprecating mapslices(f, A; dims) in favor of map(f, slices(A, dims))

andreasnoack commented 8 years ago

You'll save one line of code compared to

for i in 1:size(M,2)
    col = slice(M,:,i)
    # some code
end

Would there be any other advantages?

lnsp commented 8 years ago

It makes the code clearer and easier to understand. I would prefer it over slice, though I would rather call it columns.

for col in columns(matrix)
    # some code
end

and

for row in rows(matrix)
    # some code
end
nalimilan commented 8 years ago

If anything, I guess there would be a single method taking the index of the dimension, e.g.

for row in eachslice(array, 1)
    # some code
end
StefanKarpinski commented 8 years ago

If slice(M,:,i) were expensive, then this iterator could be implemented more efficiently. Not saying we should do this, but it's a possible argument for this kind of thing.

AzamatB commented 8 years ago

+1 for @mooxmirror's suggestion. And I agree that the main benefit of implementing these functions is the code that is clearer and easier to understand.

mauro3 commented 8 years ago

This is somewhat similar to sparse matrix iteration: http://docs.julialang.org/en/release-0.4/stdlib/arrays/#Base.nzrange. If something like this were to be added, could it be designed to work efficiently for sparse and other fancy matrices too?

AzamatB commented 8 years ago

This also potentially allows compiler to optimize loops to be stride efficient. For example when it sees the following code:

for row in rows(matrix)
    # some code
end

it could transform it to the following

for col in columns(matrix')
    # some code
end
cortner commented 8 years ago

Just to add that I'd also enjoy having something along these lines. I particular like the

for row in rows(matrix)
     #
end

notation

axsk commented 8 years ago

+1 The twoline version suggested by @andreasnoack of course does the job, but in a rather cryptic way given iterating over rows is such a natural operation.

But I am not sure if we should adopt the sub or the slice behaviour, keeping or dropping singleton dimensions respectively. I guess when I iterate over rows(matrix) I would actually expect row vectors, i.e. keeping the singleton dimensions.

I also agree @nalimilan that we should have a general iterator for higher dimensional arrays. Alternatively to eachslice I would call it slices, and furthermore also have subs for keeping the singleton dimension, as natural extensions to slice and sub.

OT: Like @nalimilan I would naturally expect eachslice(m, 1) to return rows, much like slicedim(m, 1, rownumber) also selects a row. But mapslices(f, m, [1]) applies f to columns. On the other hand mapslices(sum, m, [1]) is consistent with sum(m,1), the column-summation, but actually 1 is the dimension index of the rows and 2 of the columns... Is it just me or is anyone else also confused by these semantics?

AzamatB commented 8 years ago

@axsk +1. It would be nice to have consistency in dimension convention across the methods you have mentioned.

Mice7R commented 8 years ago

Well rows and columns is clearer of what is doing while slice would be a bit more obscure. I agree with @nalimilan with having a generic iterator but I think that rows and columns must be present. @axsk Yeah It confuses me too.

nalimilan commented 8 years ago

Well rows and columns is clearer of what is doing while slice would be a bit more obscure. I agree with @nalimilan with having a generic iterator but I think that rows and columns must be present.

As long as we don't provide nrow(x) = size(x, 1) and ncol(x) = size(x, 2), we shouldn't offer eachrow(x) = eachslice(x, 1) and eachcol(x) = eachslice(x, 2) either.

tbreloff commented 8 years ago

I'll put in my vote for the following:

tmp

If there's interest in this, I can put together a PR for this and a RowIterator, which is similar. Should I define any other methods? What file might this live in?

cortner commented 8 years ago

I think it would be great. I've implemented something similar for myself and find it very useful. It would be good to have it in Base.

A possibility might be to allow (kw argument?) to specify whether a view or a copy is returned for each column.

tbreloff commented 8 years ago

@cortner That would certainly be possible... might be best to include that flag as a parameter of the ColumnIterator type. Maybe:

for c in columns(X, copy=true)
    # something
end
andreasnoack commented 8 years ago

I think it would be type unstable to use a keyword argument so it might be better to have two different functions for the view and the copy version.

@tbreloff Why not offer a general slice iterator as proposed by @nalimilan? We could still have the handy names for 2D.

cortner commented 8 years ago

Agree about slice iterator + simplified terminology for rows and columns.

Re type instability: is there nothing on the horizon that will fix type instabilities for kwargs?

tbreloff commented 8 years ago

@andreasnoack Yes a generic slice iterator would be better in terms of LOC, but might require much more thought to get right. I'm willing to attempt it and compare.

type unstable to use a keyword argument

separate methods is fine by me... happy to follow what the group wants.

KristofferC commented 8 years ago

The type instability has (semantically) nothing to do with kwargs but that you want to return a different type depending on the value of an argument.

tbreloff commented 8 years ago

Was thinking about:

immutable ColumnIterator{T, A<:AbstractVecOrMat}
    a::A
end

...
Base.next(itr::ColumnIterator{:slice}, i::Int) = slice(itr.a, :, i), i+1
Base.next(itr::ColumnIterator{:copy}, i::Int) = itr.a[:,i], i+1
...

columns(a::AbstractVecOrMat) = ColumnIterator{:slice}(a)
columns(a::AbstractVecOrMat, should_copy) = ColumnIterator{:copy}(a)

# does a slice
for c in columns(X)
    # something
end

# does a copy
for c in columns(X, true)
    # something
end

And we could get tricky and create one array and just overwrite it at every iteration for the copy version, to reduce allocations. This could be generalized to a SliceIterator I'm sure.

KristofferC commented 8 years ago

Adding an unused argument seems like an abuse of dispatch. I would prefer different names over that ou r alternatively have the second argument be a type like Array, ArrayView etc indicating the type you want returned from the iterator.

gasagna commented 8 years ago

I had a thought about this a while ago, see https://github.com/gasagna/ArraySlices.jl . It might be useful to anyone willing to take a go at this.

jebej commented 7 years ago

This would also be useful for comprehensions over rows/columns.

wookay commented 7 years ago

a simple way would be

rows(M::Matrix) = map(x->reshape(getindex(M, x, :), :, size(M)[2]), 1:size(M)[1])
columns(M::Matrix) = map(x->getindex(M, :, x), 1:size(M)[2])
sbromberger commented 7 years ago

@wookay except now that you can't take the transform of a non-numeric array, rows breaks.

It's julia -e 'println("$(Dates.year(now()))")'. We really need row and column iterators over matrices.

tkelman commented 7 years ago

Open a PR? Or make a package with it?

sbromberger commented 7 years ago

@tkelman I've included the functionality in my own package. As for a PR (presumably to Base), there is ample code here and it's still been under debate for nearly 2 years. Not exactly encouraging.

wookay commented 7 years ago

@sbromberger yeah, non-iterating rows has updated it by reshape. thanks.

I think that we would add these rows, columns, ColumnIterator functionalities to https://github.com/JuliaCollections/Iterators.jl package.

timholy commented 7 years ago

Nice solution advertised in https://github.com/JuliaLang/julia/issues/23306#issuecomment-324708199

femtotrader commented 6 years ago

@wookay Iterators.jl has been deprecated in favour of IterTools.jl

I've open a issue about this feature request (which is nearly 2 years old) https://github.com/JuliaCollections/IterTools.jl/issues/11

dmolina commented 6 years ago

I was considering using Julia for scientific computation (in my design of evolutionary algorithms for optimization), and a so simple operation like that (that I use a lot in Numpy) is not implemented after more than two years, It's very discouraging. It has not to be included in Base, it is enough to be in a optional recommended package to use it, but I have not found it in any package until now. For people working a lot in two-dimensional matrixes like it, it is a real pain not to be able to iterate easily in them.

timholy commented 6 years ago

Why not create that missing package yourself? The fact that it hasn't been done means that it isn't very important to most people; much more important stuff gets done all the time, and indeed the rate of evolution in Julia is insane. Since this is clearly all-important to you, I'd say just do it!

cortner commented 6 years ago

https://github.com/bramtayl/JuliennedArrays.jl

Does this not cover this functionality?

timholy commented 6 years ago

Ah yes, I'd forgotten that it also had those iterators. It's a brilliant package, highly recommended.

AzamatB commented 6 years ago

Why not create that missing package yourself? The fact that it hasn't been done means that it isn't very important to most people

It is quite important, basic, common, and it has been implemented - there is a PR for this in IterTools.jl waiting to be merged, but I am not sure why package maintainers are reluctant to review and merge it.

dmolina commented 6 years ago

Thanks all for your responses. I am very grateful for them, and for the language itself. I like specially its flexibility through packages. I expect to write my own ones when I will have more experience in Julia. Thanks for the information about the mentioned packages.

axsk commented 6 years ago

I agree it is important, and quite basic/common. In my eyes I think it is too basic for putting it in its own package, which I have to count on being maintained. This is why I usually write for loops or other mentioned here, but I am annoyed every time it comes up. Of course you can write it yourself, but this also holds for many other functions in Base, (e.g. slice...) and I believe a scientific language needs to assist in such basic tasks...

Considering JuliennedArrays, this seems way more specific and complex. And IterTools seems to disagree that it fits there (in which point I agree as well).

I don't fully agree on slimming out Base and moving everything to different packages for a language targeting scientific computing (thats another discussion though). But imho this should at least be available in LinearAlgebra.

sdewaele commented 5 years ago

I fully agree with @axsk. It is just so much more readable to write something like:

xs = collect(columns(X))

compared to:

sx = [X[:,i] for i in 1:size(X,2)]

@StefanKarpinski has mentioned before that it adds only a bit of code. In my opinion, the difference between C and Julia to a large part is made up out of seemingly small conveniences like this, leading to a big difference when all put together.

ekinakyurek commented 5 years ago

I also would like to have columns and rows iterators in LinearAlgebra

StefanKarpinski commented 5 years ago

Having eachslice seems like a reasonable API; then eachrow and eachcol can be shorthands.

ekinakyurek commented 5 years ago

I believe eachcolumn is more readable but if there are another *col* functions in LinearAlgebra base you may want to prefer eachcol for consistency.

AzamatB commented 5 years ago

slices(), rows() and columns() is another alternative that I find to be more slick.

StefanKarpinski commented 5 years ago

The each prefix is quite standard to indicate that an iterator is returned instead of a collection. The verb slices indicates that what is returned is an eager representation of the collection of slices. That may seem nice to have but it forces allocation of all view objects for all slices whereas an iterator that yields one slice at a time can either reuse a single view object or, with some clever compiler tricks, determine when a view doesn't escape and eliminate the allocation that way.

arnavs commented 5 years ago

Opened a PR (#29749) which I since updated to reflect the naming conventions in this thread. So, eachrow(M::AbstractVecOrMat) and eachcol(M::AbstractVecOrMat) and a general eachslice(A::AbstractArray; dims) which return an iterator over views into the rows/columns of M.

mbauman commented 5 years ago

Closed by #29749. Thanks @arnavs!