Open HarrisonGrodin opened 5 years ago
Might there be ambiguities about which behavior to apply, eg. for a Tree
that iterates its children? Right now you could broadcast over a structure like that and get back an array, if there’s no custom BroadcastStyle
defined for it.
Would these new behaviors be up to the datatype to define, eg. Tree
would define a broadcast style under which the return value is a Tree
?
With nonlinear datatypes, we do lose the "just iterate" capability, especially because there's no general way to construct an instance of the functor. However, yes, developers should be expected to define a new style which includes the appropriate mapping operation along with the datatype. (Think Functor
instances in Haskell.)
I must confess I'm very confused. There seem to be three different features you'd like here:
Some
to behave like 0-dimensional containersnothing
to propagate like missing
in the context of broadcastI understand you're uniting them all as abstract functor maps, but in practice they each act and behave very differently. And the latter two are quite outside of the generic expectation of what dots currently mean. I think it'd be hard to generically use broadcast if each data structure was given this much freedom.
The dot syntax is very powerful — both for its concise syntax and its implementation, which holds a complete representation of the AST. But there's also a power in its unified meaning.
The example with option types (Some(x)
and nothing
) may not be as motivating, given that Julia doesn't have native sum types. I would still argue that encouraging broadcasts to recurse over inductive data structures seems rather important, given that all nontrivial data structures which don't use arrays are necessarily inductive. Even if we stay array-based, there are other data structures which could benefit from broadcasting; for example, sets¹, which aren't inductively defined:
julia> string.(abs.(Set([-1, 1, 2])))
Set(["1", "2"])
With respect to the expectation of what dots mean, that's part of what this issue aims to change. As a community, it seems advantageous to broaden our view of what broadcasting means from a strictly array-based view to a mentality which includes other data structures.
I agree, the dot syntax is extremely powerful. However, I disagree with the claim that it has a unified meaning. Currently, there is no formalism around what "broadcasting" means, and it's unclear which data structures should support this Julia-specific operation. If we take inspiration from category theory and its implementation in other languages, though, we could end up with a clear, unified, and precise description of which data structures support broadcasting: any parameterized type which satisfies the functor laws.
That example is issue https://github.com/JuliaLang/julia/issues/26359. In fact, I'd argue that map
could potentially have greater latitude to return a Set
here than broadcast does — and yet it was quite surprising when it did. At the same time, though, I'd like to see map
and broadcast
converge a bit more in the one-argument case.
I quite strongly disagree with the claim that broadcasting does not have a unified meaning. It applies functions elementwise, extruding singleton dimensions to match shapes between arguments as needed. It works on nearly all collections — including sets — by falling back to iteration. If you find its documentation lacking in formalism, let's try to address that.
Now, I agree there's a challenge in getting the "right" collection type back — and we can definitely work to improve that. I'm not sure Sets are the poster child for that case, though, given the above issue.
One issue here, I think, is that 1-argument broadcast
is pretty clearly equivalent to map
, in which case most of this applies and all sorts of data structures can/should support it. But it diverges for multiple arguments, where the definition refers to array dimensions. How should we generalize the notion of array dimensions so that multi-argument broadcast makes sense for other types?
Currently, there is no formalism around what "broadcasting" means
It's map
, except arrays are repeated along singleton dimensions. Ok, that's an English description, not a formal one, but surely it's not hard to imagine that written in some kind of notation.
we could end up with a clear, unified, and precise description of which data structures support broadcasting
I believe this is a rhetorical trick (and one you tend to see in the PL world) that conflates (1) what something does, and (2) whether there is a clear and precise description of it. Everybody wants things to be clear and precise, but those are general properties that might apply to lots of things. But the implication here is that only one, specific, preferred meaning can count as clear and precise.
With regards to the other points, I think it would be possible to define auto-lifting for Some and not-Some. It might only be defined and valid in certain cases (such as 1-arg, or not mixed with other iterables), but I think creating definitions such that f.(..., nothing, ...) === nothing
and f.(Some(x), Some(y), ...) === Some(f(x, y, ...))
seem possibly realistic.
Ah, I was unaware of this definition; this does seem sufficiently precise. I've been reading the docs on broadcasting, and looks like I missed it:
Additionally,
broadcast
is not limited to arrays (see the function documentation), it also handles tuples and treats any argument that is not an array, tuple orRef
(except forPtr
) as a "scalar".
I would still claim, though, that this is a rather restrictive definition, encouraging array-oriented programming where there seems to be room for additional flexibility in supported data structures.
Design-wise, it seems like there's a consensus that aside from scalars, broadcasting over different types doesn't make much sense (unless there's a "coercion", such as repeating along singleton dimensions). Then, given f.(x1, x2, ..., xn)
for each xi::F{T}
and each being of the same "shape", we should produce an identically-shaped structure where each element y
should be f(y1, y2, ..., yn)
where yi
is the element at the correct position in the i
th iterator. (We treat yi
as xi
if xi
is a scalar.)
In other words, f.(xs...)
should be "equivalent" to map(f, zip(xs...))
, for some reasonable definitions of zip
and map
.
At least from an API standpoint, I believe this design should support all existing array-based functionalities, while also being extensible to additional datatypes (trees, options, etc).
Gah, that section of the documentation that you quoted is woefully out of date. We no longer default to "scalar" and are generally much more accepting of broadcasting over other data structures.
it seems like there's a consensus that aside from scalars, broadcasting over different types doesn't make much sense
I don't see such a consensus at all. We have an entire system dedicated to supporting broadcast over heterogeneous arguments! We have an open issue talking about how broadcast might be extended to arbitrary key-value collections (like dictionaries) and how that might interoperate with mixtures of arrays (#25904).
In other words,
f.(xs...)
should be "equivalent" tomap(f, zip(xs...))
, for some reasonable definitions ofzip
andmap
.
I 100% agree. The very tricky part is reducing discontinuities (both in behavior and implementation) when you swap out one of your homogenous xs
for a scalar or something else.
We have an entire system dedicated to supporting broadcast over heterogeneous arguments! We have an open issue talking about how broadcast might be extended to arbitrary key-value collections (like dictionaries) and how that might interoperate with mixtures of arrays (#25904).
Although this debate could very easily descend into subjective opinion, I'd like to make a brief argument against this style. Supporting heterogeneous arguments (aside from isomorphism/coercion) seems like it adds significant complexity, to the point that it becomes unclear which pairwise combinations of types can be broadcasted together and unclear what each pair does. Additionally, when arguments become non-scalar and non-isomorphic, the inherent meaning of broadcasting seems to get lost.
The very tricky part is reducing discontinuities (both in behavior and implementation) when you swap out one of your homogenous
xs
for a scalar or something else.
Given a scalar, I believe the zip
operation should be rather straightforward; if xi
is a scalar, make the i
th element of each resulting tuple be the entirety of xi
. For example, the following behavior would be expected:
julia> zip(Node(Leaf(2),Node(Leaf(3),Leaf(4))), "r")
Node(Leaf((2,"r")),Node(Leaf((3,"r")),Leaf((4,"r"))))
Otherwise, swapping an element of xs
for a value of another type seems like it could become almost nonsensical in the general case, to the point that it should just be written explicitly by the user or library, if for nothing but clarity. (For example, what does zipping binary trees with ternary trees mean, or trees with arrays, or arrays with options?) In the case that the heterogeneity is clearly logical, such as when the types are isomorphic (e.g. arrays and static arrays, or two separate implementations of binary trees), we could use the promotion mechanism to decide which container the result ends up in.
Overall, I rather like the meaning and design of broadcasting, and I think it's an incredibly powerful and elegant way to perform elementwise operations. However, I believe that it could become even more useful if we allowed the primary design to break away from array-like types to arbitrary containers, generalizing what it means to be an "element".
Can zip
just re-use the broadcast machinery?
In theory, I see nothing that prevents a package from experimenting with this. Say, abstract type Functor
, and then change what broadcasting of functors does. The only limits are behavior from Base (don't be a pirate!) and the current lowering.
So, are there some dispatches in Base that create ambiguity errors for you? We can maybe fix that. Are there simple things in the lowering that would need to work differently? It is unlikely that this can change, but maybe we can find a workaround.
But is there any example of some new code that should live in Base instead of a package?
it becomes unclear which pairwise combinations of types can be broadcasted together and unclear what each pair does. Additionally, when arguments become non-scalar and non-isomorphic, the inherent meaning of broadcasting seems to get lost.
With our current system, that's not at all accurate. Which pairwise combinations of types can be broadcasted together? All of them. Seriously. If a type works in broadcast by itself, it'll work with any other broadcastable thing. It's not ambiguous at all. Is it unclear what it does? Not at all. It applies the function over the arguments elementwise, repeating singleton dimensions as necessary to match shapes, and returns a new container with the elementwise results. Now, the part that is unclear is which container type it might return — but it's going to be something that has axes
and supports indexing and contains the elementwise results you'd expect in the locations you'd expect.
Now, were we to extend the meaning of dots to abstract functor maps, then yes, your above statement would be true. And of course the part of the sentence that I elided in the above quote — that supporting heterogeneous arguments adds complexity — is also very true. But it's also a very big feature. Folks definitely use heterogeneous array arguments quite frequently, with all sorts of combinations of Array
, StaticArray
, Structured and sparse matrices. Admittedly it's not as common to combine arrays with non-scalar non-arrays, but I've definitely used tuples with arrays and have seen others do so as well.
@mbauman This is what I meant by "aside from isomorphism/coercion". I agree that supporting isomorphic heterogeneous arguments is very useful and reasonable, such as broadcasting over Array
and StaticArray
together (or variadic trees and Expr
s, dictionaries and sets, etc). I'm not suggesting that this behavior should be changed for data structures with the same shape. My only claim here is that given full functor-inspired broadcasting, combining unrelated types (such as arrays and trees, arrays and dictionaries, etc.) would get incomprehensible rather quickly.
Now, the part that is unclear is which container type it might return — but it's going to be something that has
axes
and supports indexing and contains the elementwise results you'd expect in the locations you'd expect.
I really like this, and the goal of this issue would be to extend it. The idea that broadcasting over array-like types produces a value of some type which is guaranteed to support a certain interface is powerful, and I often find myself wishing this was possible for non-array types, as well.
This thread makes me think if there is a way to extend the notion of broadcasting to container with more complex "shape" than arrays. Also, can we make such extension useful even for arrays?
First of all, by broadcasting, I mean "stretching singleton dimension to fit with other arrays". (I'm clarifying it here since I have an impression that @HarrisonGrodin is using "broadcasting" to only mean fused mapping and not in the sense of Base.broadcast
. Julia's dot-call does both. So I thought noting it here makes sense.) If the request here is to use the dot-call syntax to do fmap
, I think it has lower chance to be considered as the dot-call syntax does much more (namely, broadcasting) and they are all useful.
The idea I have is to represent "structural absence" by nothing
(as opposed to "numerical/data absence" which is represented by missing
). Currently we can't apply .+
to vectors of different sizes:
julia> xs = [1, 2];
ys = [10, 20, 30];
julia> xs .+ ys
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
...
But I think we can formalize the (extended) broadcasting in such a way that an array zs = [11, 22, nothing]
is generated as an intermediate step (conceptually) which is then fed to something.(zs)
. This gives us two new variants of broadcasting: one fills missing elements with nothing
and one strip off nothing
. So, how about making it possible to invoke each of them by wrapping dot calls with special "functions" like this? (Maybe there are much better names for those functions [1].)
julia> expand.(xs .+ ys)
3-element Array{Union{Nothing, Int64},1}:
11
22
nothing
julia> drop.(xs .+ ys)
2-element Array{Int64,1}:
11
22
(I think they are implementable by specializing Broadcast.broadcasted(::typeof(expand), bc)
etc.)
Both variants would work nicely with trees with different shape:
tree1:
4
/ \
3 5
/ \
1 2
tree2:
40
/ \
30 50
/ \
60 70
expand.(tree1 .+ tree2):
44
/ \
33 55
/ \ / \
n n n n (where n = nothing)
drop.(tree1 .+ tree2):
44
/ \
33 55
As Union{Some{T}, Nothing}
is conceptually a container of one optional element, the example in the OP requires expand
julia> expand.(string.(float.(Some(3))))
Some("3.0")
julia> expand.(string.(float.(nothing)))
nothing
This feature would be useful for arrays as well because we can build a complex filter using it:
onlypositive(x) = x > 0 ? x : nothing
drop.(onlypositive.(xs .- mean(xs)) .+ onlypositive.(ys .- mean(ys)))
OK. The last example may seem to be better be done with missing
rather than nothing
. I'm not entirely sure if using nothing
this way makes sense. But the broadcasting can be optimized to short-circuit and avoid evaluating unnecessary functions if nothing
can have a special meaning in drop.(...)
. Also, something.(xs, default)
can be a useful idiom for padding containers with this approach.
Anyway, a minimal comment I want to make is that it can be done in external packages without new syntax (or even macros) because Broadcast.broadcasted
is super powerful.
[1] I think union
or join
is better than expand
but both of them are taken. Maybe pad
instead of expand
? Like wise, intersect
is better than drop
but it's also taken. Using meet
kind of make sense (as a synonym of intersect
) but I'm not sure.
Currently, broadcasting is heavily specialized to work with array-like (i.e. linear, indexable, and finite) data structures. However, the notion of mapping a function over a parameterized datatype
F{T}
applies to many data structures, such as options (Some(3)
vsnothing
) and trees. In fact, this operation is exactly "functor map", which applies over all functor datatypes.Functors require that for a given type and its associated "mapping" function (often referred to as
fmap
or<$>
), the following laws hold (using Haskell notation):In other words, a functor must (a) ensure that mapping the identity function has no effect, and (b) ensure that mapping the function composition
f ∘ g
is the same as first mappingg
and then mappingf
.From this mathematical definition, we have derived the "broadcasting" functor map operation
f.(x)
withx::F{T}
for some typeF
which forms a functor. (Note that the second functor law guarantees that "fusion" is valid!)This issue proposes that broadcasting be extended to work more seamlessly with other (i.e. non-linear and non-indexable) datatypes, including the fusion feature.
There are a few edge cases to consider, such as combining multiple different functor type in a single broadcast call (e.g.
f.([1,2,3], Tree(...))
). Additionally, a slight redesign of broadcasting may be necessary, due to its design focus on axes.