Closed StefanKarpinski closed 7 years ago
The need to support scalars in broadcast makes me second-guess the whole "numbers should not be iterable" argument.
I don't think broadcast actually uses iteration. I don't see any broadcast problems in the log Tony posted. It almost certainly uses size and indexing, though.
It's a really fine line to walk. It definitely feels right to remove collection-like behavior from numbers, but we also often want numbers to behave like Array{T, 0}, which is a collection.
Would it make sense to have something like an immutable ImmutableSingleton{T} <: AbstractArray{T,0} x::T end (with all the necessary methods implemented) around? Everywhere a number should behave like Array{T, 0} (except being immutable) it could then be wrapped with, IIUC, zero runtime overhead.
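A minimal sketch of what such a wrapper might look like, written with the current struct keyword rather than the older immutable one. The method set here (size plus a zero-argument getindex) is an assumption about what "all the necessary methods" would minimally include, not a complete or official implementation:

```julia
# Hypothetical wrapper from the suggestion above: a zero-dimensional,
# immutable array view of a single value.
struct ImmutableSingleton{T} <: AbstractArray{T,0}
    x::T
end

# A zero-dimensional array has an empty size tuple.
Base.size(::ImmutableSingleton) = ()

# Zero-dimensional arrays are indexed with no indices: s[]
Base.getindex(s::ImmutableSingleton) = s.x

s = ImmutableSingleton(0.3)
value = s[]        # unwraps the stored number
dims = size(s)     # ()
n = length(s)      # a zero-dimensional array holds exactly one element
```

Because the struct has a single field and no mutable state, the wrapper itself should be cheap to construct, which is the basis of the "zero runtime overhead" expectation.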
@martinholters, then to do atan2.(x, 0.3) you would need atan2.(x, ImmutableSingleton(0.3))? This seems crazy.
The whole point of having numbers act like Array{T,0} is to enable generic code. If you force people to explicitly convert to another type, you lose that benefit.
I think the point is that the broadcast implementation would wrap the singletons for you. Not sure how helpful it would be, but that's how I interpreted @martinholters' suggestion.
@StefanKarpinski I was just trying to phrase exactly that, but you were faster.
The point would be that numbers would not be iterable by default, and functions that benefit from iterable numbers would have to take care of that themselves, instead of the opposite. I'm doubtful about the usefulness myself; I just wanted to point out the option.
One part of this seems to be easier to settle: to consider removing both Array{T,0}-like and collection-like properties of Char separately. Following @mbauman, that one is not a fine line to walk, because Char is not a Number type anymore anyway and, well, why should Chars behave like Array{T,0}s at all?
(Note that broadcast no longer requires size etc. to work for numbers.)
As I wrote on the mailing list, I suspect that a lot of the need for iterable/indexable numbers should be gone now with 0.5's dot-call syntax. In the cases where you would previously have written a generic vector/scalar function, you should now just write the scalar function f(x), and then apply it to arrays A with f.(A). This is not only easier, it is also faster because it can fuse with other elementwise operations and the result can be assigned in-place with .=.
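As a small illustration of that pattern (the function f and the arrays below are made up for the example):

```julia
# Write only the scalar function...
f(x) = 3x^2 + 1

A = [1.0, 2.0, 3.0]

# ...and apply it elementwise with dot-call syntax.
B = f.(A)

# Dotted operations fuse into a single loop, and .= writes the
# result into an existing array instead of allocating a new one.
out = similar(A)
out .= f.(A) .+ 1
```

No method of f that accepts an array is needed, so f never has to treat a number as a collection.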
It's instructive to try to patch Base to make numbers non-iterable. I'm finding various cases where removing iterability requires much uglier code. For example:
In split, it calls r = search(string, splitter). If splitter is a string, this returns a range, but if splitter is a char, it returns an integer. Being able to call first(r) and last(r) in both cases makes the same code work for both.
In the code generation for multidimensional array indexing, it calls _nloops to generate nested loops over the indices in expressions like a[i, 3:4]. By being able to do for j in i, the same generated code can handle i::Int and i::AbstractVector{Int}. (On the other hand, this may be suboptimal, since it doesn't look at first glance like LLVM can eliminate the loop in the Int case.)
In the FFT code, you can pass any iterable of dimensions to be transformed. This allows you to pass a single dimension (integer) and have it be handled with the same code.
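The kind of generic code these examples rely on can be sketched as follows. The function sum_selected is hypothetical and only illustrates the pattern; it is not taken from Base:

```julia
# Hypothetical helper: the same loop accepts a single index or a
# collection of indices, because a number iterates as one element.
function sum_selected(a, i)
    s = zero(eltype(a))
    for j in i
        s += a[j]
    end
    return s
end

a = [10, 20, 30]
single = sum_selected(a, 2)      # i::Int, loop body runs once
ranged = sum_selected(a, 2:3)    # i::UnitRange{Int}

# first and last are likewise defined for numbers, so code written
# for ranges also accepts a bare integer.
lo, hi = first(3), last(3)
```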
On the other hand, making numbers non-indexable (removing size, getindex, etcetera) seems much less disruptive ... it looks like almost no changes are required in Base.
The converse argument: if it is so useful to make numbers iterable, maybe everything should be iterable? i.e. just define fallback start etc. methods for Any.
In split, it calls r = search(string, splitter). If splitter is a string, this returns a range, but if splitter is a char, it returns an integer. Being able to call first(r) and last(r) in both cases makes the same code work for both.
This would be fixed by https://github.com/JuliaLang/julia/issues/10593 (see this Julep): you'd call findseq or searchseq, which would return an index range for both string and char arguments. The previous behavior returning a single index when passing a char would be obtained via findeq/searcheq (which wouldn't work to find a substring). So one less reason not to do this!
And note that we could use #19730 to wrap all numbers in a specialized AbstractArray{T,0} before they're used in non-scalar indexing within to_indices. That'd give them iteration, indexing, and shape without much hand-wringing… and that could actually remove a few methods. I'm not sure if there'd be a performance impact, however, and I'd like to keep that patch as conservative as possible for now. It's already pretty big.
I increasingly think we're not going to do this. We could make a lint warning that pesters you if you write for i = x where x is not a range expression. That would catch the cases where someone meant to write for i = 1:n and accidentally wrote for i = n instead. We could even go so far as to make that a syntax error, but that seems too draconian.
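The mistake in question is easy to reproduce. In this sketch (both function names are made up for the example), the buggy loop runs silently with a single iteration instead of raising an error:

```julia
# Intended: iterate over 1, 2, ..., n
function count_intended(n)
    c = 0
    for i = 1:n
        c += 1
    end
    return c
end

# Typo: n itself is iterated, and a number yields exactly one element
function count_typo(n)
    c = 0
    for i = n
        c += 1
    end
    return c
end

intended = count_intended(5)
typo = count_typo(5)
```

Here intended is 5 while typo is 1, and nothing in the language flags the difference.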
write for i = 1:n and accidentally wrote for i = n instead
I run into this often. Since it does not raise an error (e.g. a syntax error), it is hard to find out that for i = n should have been written as for i in 1:n. 🐛
Yes, that was one of the motivations cited in this issue when it was opened.
I've just posted my question on the Julia Discourse (which is why I left a comment on this issue).
It was not clear to me why a number, e.g. 10, is iterable.
But after reading the discussion at https://github.com/JuliaLang/julia/pull/19700#issue-99274797, I've found that it is a difficult choice to make.
Was the conclusion that this definitely isn't happening even for a Julia 2.0? I've seen several complaints/confusions about it in various places over the past few months.
It's gotta be rather convincing. We tried removing both iterability and indexability pre-1.0, but:
Removing iterability: I'm finding various cases where removing iterability requires much uglier code.
and
Removing indexability: Con: eliminating this functionality doesn't actually save us much code in Base, and it might make some kinds of generic functions more annoying to write, especially since numbers are still iterable. Is it worth it?
Some of these things have indeed changed, so it's certainly possible that the balance has shifted... but has it shifted enough? I'd bet not. It's quite a bit of churn.
I'm generally a supporter of iterability of numbers, but I have seen people get bit by it. To play devil's advocate, would it be so bad to change for d in dims to for d in iterable(dims) with
iterable(x) = x
iterable(x::Number) = (x,)
?
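For concreteness, here is how that helper could be used in the FFT-style "dims" case. The function collect_dims is hypothetical and exists only to show the call site:

```julia
# The two definitions proposed above.
iterable(x) = x
iterable(x::Number) = (x,)

# Hypothetical caller: accepts a single dimension or any iterable of
# dimensions without relying on numbers being iterable themselves.
function collect_dims(dims)
    out = Int[]
    for d in iterable(dims)
        push!(out, d)
    end
    return out
end

one_dim = collect_dims(2)
many_dims = collect_dims((1, 3))
```

The cost is one explicit call at each site that wants to accept a scalar, which is the trade-off this comment is weighing.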
... or just toss whatever wrapper we end up using for broadcasting on a number and iterate that
Right, that's what makes this different: that iterable function is essentially a narrower form of broadcastable. We now have an entire architecture built up for this sort of thing.
My experience in trying to implement even a small piece of this pre-1.0 (#19700) leads me to believe that changing this would lead to a huge amount of code churn over the whole ecosystem. i.e. it wouldn't be worth it without huge benefits, which I haven't seen anyone articulate beyond "slightly confusing to some newcomers".
One admittedly minor problem is that I cannot use Julia to teach discrete mathematics as this goes against what I teach my students:
julia> Set([3]) ⊆ 3
true
@StephenVavasis has pointed out some rather confusing behavior of the in operator, including:
Worse still is this:
This issue is to discuss what, if anything, we can do to reduce some of this confusion.