Broadcasting as functor map

JuliaLang / julia

The Julia Programming Language

https://julialang.org/

MIT License

45.88k stars 5.49k forks source link

Broadcasting as functor map #32081

Open HarrisonGrodin opened 5 years ago

HarrisonGrodin commented 5 years ago

Currently, broadcasting is heavily specialized to work with array-like (i.e. linear, indexable, and finite) data structures. However, the notion of mapping a function over a parameterized datatype F{T} applies to many data structures, such as options (Some(3) vs nothing) and trees. In fact, this operation is exactly "functor map", which applies over all functor datatypes.

Functors require that for a given type and its associated "mapping" function (often referred to as fmap or <$>), the following laws hold (using Haskell notation):

fmap id = id
fmap (f . g)  ==  fmap f . fmap g

In other words, a functor must (a) ensure that mapping the identity function has no effect, and (b) ensure that mapping the function composition f ∘ g is the same as first mapping g and then mapping f.

From this mathematical definition, we have derived the "broadcasting" functor map operation f.(x) with x::F{T} for some type F which forms a functor. (Note that the second functor law guarantees that "fusion" is valid!)

This issue proposes that broadcasting be extended to work more seamlessly with other (i.e. non-linear and non-indexable) datatypes, including the fusion feature.

julia> string.(float.(Some(3)))
Some("3.0")

julia> string.(float.(nothing))
nothing

julia> struct Node{T}
           l::Union{Nothing,Node{T}}
           x::T
           r::Union{Nothing,Node{T}}
       end
       const Tree{T} = Union{Nothing,Node{T}}
Union{Nothing, Node{T}} where T

julia> string.(float.(Tree(nothing, 3, Tree(nothing, 4, nothing)))
Tree{String}(nothing, "3.0", Tree{String}(nothing, "4.0", nothing))

There are a few edge cases to consider, such as combining multiple different functor type in a single broadcast call (e.g. f.([1,2,3], Tree(...))). Additionally, a slight redesign of broadcasting may be necessary, due to its design focus on axes.

yurivish commented 5 years ago

Might there be ambiguities about which behavior to apply, eg. for a Tree that iterates its children? Right now you could broadcast over a structure like that and get back an array, if there’s no custom BroadcastStyle defined for it.

Would these new behaviors be up to the datatype to define, eg. Tree would define a broadcast style under which the return value is a Tree?

HarrisonGrodin commented 5 years ago

With nonlinear datatypes, we do lose the "just iterate" capability, especially because there's no general way to construct an instance of the functor. However, yes, developers should be expected to define a new style which includes the appropriate mapping operation along with the datatype. (Think Functor instances in Haskell.)

mbauman commented 5 years ago

I must confess I'm very confused. There seem to be three different features you'd like here:

Some to behave like 0-dimensional containers
nothing to propagate like missing in the context of broadcast
Recursing through nested data structures

I understand you're uniting them all as abstract functor maps, but in practice they each act and behave very differently. And the latter two are quite outside of the generic expectation of what dots currently mean. I think it'd be hard to generically use broadcast if each data structure was given this much freedom.

The dot syntax is very powerful — both for its concise syntax and its implementation, which holds a complete representation of the AST. But there's also a power in its unified meaning.

HarrisonGrodin commented 5 years ago

The example with option types (Some(x) and nothing) may not be as motivating, given that Julia doesn't have native sum types. I would still argue that encouraging broadcasts to recurse over inductive data structures seems rather important, given that all nontrivial data structures which don't use arrays are necessarily inductive. Even if we stay array-based, there are other data structures which could benefit from broadcasting; for example, sets¹, which aren't inductively defined:

julia> string.(abs.(Set([-1, 1, 2])))
Set(["1", "2"])

With respect to the expectation of what dots mean, that's part of what this issue aims to change. As a community, it seems advantageous to broaden our view of what broadcasting means from a strictly array-based view to a mentality which includes other data structures.

I agree, the dot syntax is extremely powerful. However, I disagree with the claim that it has a unified meaning. Currently, there is no formalism around what "broadcasting" means, and it's unclear which data structures should support this Julia-specific operation. If we take inspiration from category theory and its implementation in other languages, though, we could end up with a clear, unified, and precise description of which data structures support broadcasting: any parameterized type which satisfies the functor laws.

Note that if sets were implemented inductively (i.e. without arrays), broadcasting fusion would induce an asymptotic speedup!

mbauman commented 5 years ago

That example is issue https://github.com/JuliaLang/julia/issues/26359. In fact, I'd argue that map could potentially have greater latitude to return a Set here than broadcast does — and yet it was quite surprising when it did. At the same time, though, I'd like to see map and broadcast converge a bit more in the one-argument case.

I quite strongly disagree with the claim that broadcasting does not have a unified meaning. It applies functions elementwise, extruding singleton dimensions to match shapes between arguments as needed. It works on nearly all collections — including sets — by falling back to iteration. If you find its documentation lacking in formalism, let's try to address that.

Now, I agree there's a challenge in getting the "right" collection type back — and we can definitely work to improve that. I'm not sure Sets are the poster child for that case, though, given the above issue.

JeffBezanson commented 5 years ago

One issue here, I think, is that 1-argument broadcast is pretty clearly equivalent to map, in which case most of this applies and all sorts of data structures can/should support it. But it diverges for multiple arguments, where the definition refers to array dimensions. How should we generalize the notion of array dimensions so that multi-argument broadcast makes sense for other types?

Currently, there is no formalism around what "broadcasting" means

It's map, except arrays are repeated along singleton dimensions. Ok, that's an English description, not a formal one, but surely it's not hard to imagine that written in some kind of notation.

we could end up with a clear, unified, and precise description of which data structures support broadcasting

I believe this is a rhetorical trick (and one you tend to see in the PL world) that conflates (1) what something does, and (2) whether there is a clear and precise description of it. Everybody wants things to be clear and precise, but those are general properties that might apply to lots of things. But the implication here is that only one, specific, preferred meaning can count as clear and precise.

vtjnash commented 5 years ago

With regards to the other points, I think it would be possible to define auto-lifting for Some and not-Some. It might only be defined and valid in certain cases (such as 1-arg, or not mixed with other iterables), but I think creating definitions such that f.(..., nothing, ...) === nothing and f.(Some(x), Some(y), ...) === Some(f(x, y, ...)) seem possibly realistic.

HarrisonGrodin commented 5 years ago

Ah, I was unaware of this definition; this does seem sufficiently precise. I've been reading the docs on broadcasting, and looks like I missed it:

Additionally, broadcast is not limited to arrays (see the function documentation), it also handles tuples and treats any argument that is not an array, tuple or Ref (except for Ptr) as a "scalar".

I would still claim, though, that this is a rather restrictive definition, encouraging array-oriented programming where there seems to be room for additional flexibility in supported data structures.

Design-wise, it seems like there's a consensus that aside from scalars, broadcasting over different types doesn't make much sense (unless there's a "coercion", such as repeating along singleton dimensions). Then, given f.(x1, x2, ..., xn) for each xi::F{T} and each being of the same "shape", we should produce an identically-shaped structure where each element y should be f(y1, y2, ..., yn) where yi is the element at the correct position in the ith iterator. (We treat yi as xi if xi is a scalar.)

In other words, f.(xs...) should be "equivalent" to map(f, zip(xs...)), for some reasonable definitions of zip and map.

At least from an API standpoint, I believe this design should support all existing array-based functionalities, while also being extensible to additional datatypes (trees, options, etc).

mbauman commented 5 years ago

Gah, that section of the documentation that you quoted is woefully out of date. We no longer default to "scalar" and are generally much more accepting of broadcasting over other data structures.

it seems like there's a consensus that aside from scalars, broadcasting over different types doesn't make much sense

I don't see such a consensus at all. We have an entire system dedicated to supporting broadcast over heterogeneous arguments! We have an open issue talking about how broadcast might be extended to arbitrary key-value collections (like dictionaries) and how that might interoperate with mixtures of arrays (#25904).

In other words, f.(xs...) should be "equivalent" to map(f, zip(xs...)), for some reasonable definitions of zip and map.

I 100% agree. The very tricky part is reducing discontinuities (both in behavior and implementation) when you swap out one of your homogenous xs for a scalar or something else.

HarrisonGrodin commented 5 years ago

We have an entire system dedicated to supporting broadcast over heterogeneous arguments! We have an open issue talking about how broadcast might be extended to arbitrary key-value collections (like dictionaries) and how that might interoperate with mixtures of arrays (#25904).

Although this debate could very easily descend into subjective opinion, I'd like to make a brief argument against this style. Supporting heterogeneous arguments (aside from isomorphism/coercion) seems like it adds significant complexity, to the point that it becomes unclear which pairwise combinations of types can be broadcasted together and unclear what each pair does. Additionally, when arguments become non-scalar and non-isomorphic, the inherent meaning of broadcasting seems to get lost.

The very tricky part is reducing discontinuities (both in behavior and implementation) when you swap out one of your homogenous xs for a scalar or something else.

Given a scalar, I believe the zip operation should be rather straightforward; if xi is a scalar, make the ith element of each resulting tuple be the entirety of xi. For example, the following behavior would be expected:

julia> zip(Node(Leaf(2),Node(Leaf(3),Leaf(4))), "r")
Node(Leaf((2,"r")),Node(Leaf((3,"r")),Leaf((4,"r"))))

Otherwise, swapping an element of xs for a value of another type seems like it could become almost nonsensical in the general case, to the point that it should just be written explicitly by the user or library, if for nothing but clarity. (For example, what does zipping binary trees with ternary trees mean, or trees with arrays, or arrays with options?) In the case that the heterogeneity is clearly logical, such as when the types are isomorphic (e.g. arrays and static arrays, or two separate implementations of binary trees), we could use the promotion mechanism to decide which container the result ends up in.

Overall, I rather like the meaning and design of broadcasting, and I think it's an incredibly powerful and elegant way to perform elementwise operations. However, I believe that it could become even more useful if we allowed the primary design to break away from array-like types to arbitrary containers, generalizing what it means to be an "element".

bramtayl commented 5 years ago

Can zip just re-use the broadcast machinery?

chethega commented 5 years ago

In theory, I see nothing that prevents a package from experimenting with this. Say, abstract type Functor, and then change what broadcasting of functors does. The only limits are behavior from Base (don't be a pirate!) and the current lowering.

So, are there some dispatches in Base that create ambiguity errors for you? We can maybe fix that. Are there simple things in the lowering that would need to work differently? It is unlikely that this can change, but maybe we can find a workaround.

But is there any example of some new code that should live in Base instead of a package?

mbauman commented 5 years ago

it becomes unclear which pairwise combinations of types can be broadcasted together and unclear what each pair does. Additionally, when arguments become non-scalar and non-isomorphic, the inherent meaning of broadcasting seems to get lost.

With our current system, that's not at all accurate. Which pairwise combinations of types can be broadcasted together? All of them. Seriously. If a type works in broadcast by itself, it'll work with any other broadcastable thing. It's not ambiguous at all. Is it unclear what it does? Not at all. It applies the function over the arguments elementwise, repeating singleton dimensions as necessary to match shapes, and returns a new container with the elementwise results. Now, the part that is unclear is which container type it might return — but it's going to be something that has axes and supports indexing and contains the elementwise results you'd expect in the locations you'd expect.

Now, were we to extend the meaning of dots to abstract functor maps, then yes, your above statement would be true. And of course the part of the sentence that I elided in the above quote — that supporting heterogeneous arguments adds complexity — is also very true. But it's also a very big feature. Folks definitely use heterogeneous array arguments quite frequently, with all sorts of combinations of Array, StaticArray, Structured and sparse matrices. Admittedly it's not as common to combine arrays with non-scalar non-arrays, but I've definitely used tuples with arrays and have seen others do so as well.

HarrisonGrodin commented 5 years ago

@mbauman This is what I meant by "aside from isomorphism/coercion". I agree that supporting isomorphic heterogeneous arguments is very useful and reasonable, such as broadcasting over Array and StaticArray together (or variadic trees and Exprs, dictionaries and sets, etc). I'm not suggesting that this behavior should be changed for data structures with the same shape. My only claim here is that given full functor-inspired broadcasting, combining unrelated types (such as arrays and trees, arrays and dictionaries, etc.) would get incomprehensible rather quickly.

Now, the part that is unclear is which container type it might return — but it's going to be something that has axes and supports indexing and contains the elementwise results you'd expect in the locations you'd expect.

I really like this, and the goal of this issue would be to extend it. The idea that broadcasting over array-like types produces a value of some type which is guaranteed to support a certain interface is powerful, and I often find myself wishing this was possible for non-array types, as well.

tkf commented 5 years ago

This thread makes me think if there is a way to extend the notion of broadcasting to container with more complex "shape" than arrays. Also, can we make such extension useful even for arrays?

First of all, by broadcasting, I mean "stretching singleton dimension to fit with other arrays". (I'm clarifying it here since I have an impression that @HarrisonGrodin is using "broadcasting" to only mean fused mapping and not in the sense of Base.broadcast. Julia's dot-call does both. So I thought noting it here makes sense.) If the request here is to use the dot-call syntax to do fmap, I think it has lower chance to be considered as the dot-call syntax does much more (namely, broadcasting) and they are all useful.

The idea I have is to represent "structural absence" by nothing (as opposed to "numerical/data absence" which is represented by missing). Currently we can't apply .+ to vectors of different sizes:

julia> xs = [1, 2];
       ys = [10, 20, 30];

julia> xs .+ ys
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
...

But I think we can formalize the (extended) broadcasting in such a way that an array zs = [11, 22, nothing] is generated as an intermediate step (conceptually) which is then fed to something.(zs). This gives us two new variants of broadcasting: one fills missing elements with nothing and one strip off nothing. So, how about making it possible to invoke each of them by wrapping dot calls with special "functions" like this? (Maybe there are much better names for those functions [1].)

julia> expand.(xs .+ ys)
3-element Array{Union{Nothing, Int64},1}:
 11
 22
   nothing

julia> drop.(xs .+ ys)
2-element Array{Int64,1}:
 11
 22

(I think they are implementable by specializing Broadcast.broadcasted(::typeof(expand), bc) etc.)

Both variants would work nicely with trees with different shape:

tree1:
     4
   /   \
  3    5
/   \
1   2

tree2:
     40
   /   \
  30    50
       /   \
       60   70

expand.(tree1 .+ tree2):
     44
   /   \
  33    55
/   \  /   \
n    n n    n   (where n = nothing)

drop.(tree1 .+ tree2):
     44
   /   \
  33    55

As Union{Some{T}, Nothing} is conceptually a container of one optional element, the example in the OP requires expand

julia> expand.(string.(float.(Some(3))))
Some("3.0")

julia> expand.(string.(float.(nothing)))
nothing

This feature would be useful for arrays as well because we can build a complex filter using it:

onlypositive(x) = x > 0 ? x : nothing
drop.(onlypositive.(xs .- mean(xs)) .+ onlypositive.(ys .- mean(ys)))

OK. The last example may seem to be better be done with missing rather than nothing. I'm not entirely sure if using nothing this way makes sense. But the broadcasting can be optimized to short-circuit and avoid evaluating unnecessary functions if nothing can have a special meaning in drop.(...). Also, something.(xs, default) can be a useful idiom for padding containers with this approach.

Anyway, a minimal comment I want to make is that it can be done in external packages without new syntax (or even macros) because Broadcast.broadcasted is super powerful.

[1] I think union or join is better than expand but both of them are taken. Maybe pad instead of expand? Like wise, intersect is better than drop but it's also taken. Using meet kind of make sense (as a synonym of intersect) but I'm not sure.