Open tkf opened 5 years ago
I’m in favour of adding a SetStyle
for broadcasting sets as set definition, for the reason it’s useful.
As a counterpoint, we sort of already have this with adjoints in the sense if x
and y
are vectors then
z in f.(x,y’)
if and only if z = f(x[k],y[j])
.
As a counterpoint, we sort of already have this with adjoints
I would still treat this a join combined with a special treatment (= broadcasting) of a singleton axis. I added "(and special handling for singleton axes)" in the OP to clarify my point.
If we were to use set arithmetic for broadcasting with sets, I'd argue we should forbid mixing it with arrays. It may be tempting to "promote" arrays or any iterators as sets. However, it would produce different results depending on how you associate operators. For example, in
x :: Set
y :: Vector
z :: Vector
a = x .+ y .+ z # equivalent to (x .+ y) .+ z
b = x .+ identity(y .+ z)
a
and b
would have different results. This is very odd given that +
is associative.
I would propose the following:
f.(x::Set)
should return a Set
, so the following is wrong:
julia> x = Set([1,2,3])
Set([2, 3, 1])
julia> exp.(x) # Should return Set([exp(2),exp(3),exp(1)])
3-element Array{Float64,1}:
7.38905609893065
20.085536923187668
2.718281828459045
2. `f.(x::Set, 2)` should also return a `Set`
3. `f.(x::Set, y::Set)`, `f.(x::Set, y::Vector)` etc. should probably just error out
So the rough implementation would be:
```julia
struct SetStyle <: BroadcastStyle end
BroadcastStyle(::Type{<:AbstractSet}) = SetStyle()
BroadcastStyle(::SetStyle, ::DefaultArrayStyle{0}) = SetStyle()
BroadcastStyle(::SetStyle, ::SetStyle) =error("Cannot broadcast sets together")
BroadcastStyle(::SetStyle, ::AbstractArrayStyle) = error("Cannot broadcast sets and arrays")
copy(b::Broadcast{SetStyle}) = Set(broadcast(b.f, collect.(b.args)))
If we really want to argue that f.(x::Set, y::Set)
should be defined, we could say that a set behaves like a singleton/scalar and by the definition of broadcast
Singleton and missing dimensions are expanded to match the extents of the other arguments by virtually repeating the value
which would essentially be equivalent to set functions. But I think that may be pushing things....
See also previous attempt at making this an error at https://github.com/JuliaLang/julia/pull/28491. Not sure why it wasn't merged at the time, since map
was deprecated (https://github.com/JuliaLang/julia/pull/26980).
Cc: @mbauman
@nalimilan Thank you. Reviving #28491 seems to be the easiest short-term option.
The decision to deprecate map
seems odd to me, the discussion in https://github.com/JuliaLang/julia/issues/26359 didn't really clarify the problem.
Based on those issues and PRs however the current behaviour is desired. Unfortunately its not very compatible with uncountable sets a la IntervalSets.jl , and its not clear what the analogue of map
or broadcast
should be....
we could say that a set behaves like a singleton/scalar and by the definition of broadcast
@dlfivefifty I'm thinking that we don't have to follow the current definition strictly if there is a way to generalize the definition to have useful broadcasting with sets. Taking product is certainly useful and probably OK provided that mixing sets with arrays and dicts is an error but I'd be interested in what others think about this.
I suppose the issue is that there is a desire for broadcast(f, a, ...)
to return the same thing as broadcast(f, collect(a), ...)
. With set products that wouldn't be true. But for other cases I would say it is true, for a certain point of view: e.g., x in broadcast(f, Set([1,2,3]), ...)
if and only if x in broadcast(f, [1,2,3], ...)
so the two "sets" are the same.
iteration order of Set which is not meaningful
It's not defined, but it is meaningful, and so it's not "broken". For example, here's a fairly useless one-liner for doing random things to the values in a dictionary by iterating 3 sets:
julia> D = Dict(1 => 2, 3 => 4, 5 => 6);
julia> Dict(keys(D) .=> tuple.(values(D), values(D) .^ 2))
Dict{Int64,Tuple{Int64,Int64}} with 3 entries:
3 => (4, 16)
5 => (6, 36)
1 => (2, 4)
This also doesn't consider OrderedSet, where the order is defined.
not sure why it wasn't merged at the time, since map was deprecated
map
was deprecated because it was doing a special case to make it return a Set which gave surprising results due to the failure to preserve order and count (#26359) and meant associativity was broken for Set. broadcast
has currently generally fallen back to iteration (c.f. decision on Ref usage vs. scalarization. Emphasis on currently, since there are open issues proposing to use indexing more heavily.)
f.(x::Set)
should return a Set
This is what map
used to do, which is why that method was deprecated and not the other.
@vtjnash I think your observation is that current broadcasting definition is useful when the order has some meaning as in the case of KeySet
? My main concern is when there is no meaning in the iteration order, as in Set
.
I agree with @tkf. Broadcasting should only work between ordered collections (including KeySet
), where a Set
is unordered.
More broadly, it might be good to sort out what AbstractSet
means. It seems that people are taking sets to be collections which store only one entry, but I would disagree and think the defining feature is implementing in
. So I'd make the following definitions:
in
. This includes Set
, Interval
, and AbstractArray
Count
, Set
, and AbstractArray
(including infinite arrays) but not Interval
.Set
but includes KeySet
, AbstractArray
, and Count
.getindex
. This includes AbstractArray
, Broadcasted
, but not KeySet
.axes
. This includes AbstractArray
and probably Broadcasted
.So broadcasting is well-defined for shaped sets. If we assume unshaped ordered sets default to "vector-like", then we can extend the definition for any ordered sets. This leaves sets which are not ordered (Interval
and Set
). This should probably be an error.
Edit: forgot a key one:
\6. A uniquely enumerable set is an enumerable set where each element appears exactly once. This includes Set
, KeySet
, Count
, and Range
, but not Array
.
I think definition 6 is closer to sets in math although they doesn't have to be enumerable. Definition 1 seems to be closer to what is called multiset or bag.
Definition 4 seems to be equivalent to what it's called associative data structure. Arraylike
suggested in #32940 seems to cover definition 5.
But I guess you are trying to formalize the capabilities of containers? By maybe using traits? For example does adding Uncountable <: IteratorSize
help formalizing definition 2? Adding traits for ordering and indexability seems useful too. The end result would be to error out in broadcastable
if IteratorSize
does not return HasLength
or HasShape
. (However, using infinite size iterator like Iterators.take((@lazy 2 .^ iter_all_primes() .- 1), 10)
can be useful if we have non-materializing broadcast #19198)
This leaves sets which are not ordered (
Interval
andSet
). This should probably be an error.
I think I agree. But as a brainstorming, I think one possibility is to use:
But this probably is "too dynamic" for people reading the code. I wish there was a generic customizable lifting syntax... (see also Status of question mark syntax for missing values - Development - JuliaLang)
I think definition 6 is closer to sets in math
This is definitely not true (see Wikipedia). Definition 1 is the closest, in that a set only has the operation in
. All other set operations (union, intersection) are derived from the in
operation. For example, we can easily derive a lazy union/intersection type that works for any types defined by Definition 1:
struct LazyUnion
a
b
end
in(x, d::LazyUnion) = x in d.a || x in d.b
Therefore LazyUnion satisfies Definition 1 provided a
and b
satisfy Definitino 1.
(The axiom of choice comes into play whether there is also a "choice" function but that's off topic.)
product-based broadcasting if all collections (except numbers) are not indexable ...But this probably is "too dynamic" for people reading the code
Agreed, though I would argue that its well-defined and consistent provided there is only one one set. Actually, I'd be happy if base just throws a "method not found" and then there could be another package SetBroadcasting.jl that implements a SetStyle
for those who want set-like broadcasting.
Hmm... it still sounds problematic to me because, if an object has more properties than set, it can distinguish itself from the other by using such properties (ordering, multiplicity, etc.). For example, if you treat an Array
as a set, using Base.==
would be problematic because it does not follow the axiom of extensionality; i.e., arrays can use shape, ordering, and multiplicity to distinguish one from the other. I suppose you can say that the equality defined in terms of in
should be used when treating arrays as sets. But this is equivalent to saying that arrays can be converted to sets which is not a very specific property and I think reduces the usefulness of the notion of set.
Actually, I'd be happy if base just throws a "method not found" and then there could be another package SetBroadcasting.jl that implements a
SetStyle
for those who want set-like broadcasting.
I agree that this is nice to have. But I prefer to do this without type-piracy, for example, using syntax like
setstyle.(set1 .+ set2)
where setstyle
is a "magic" function that sets the broadcasting style.
Note that this requires us to use an approach different from #28491 which uses broadcastable(::AbstractSet) = throw(...)
. We need to add extra code to allow users to construct Broadcasted
object and throw an error at (say) copyto!
stage. This approach may be useful for #25904 as well (people can experiment with different implementation outside Base
).
Bump. I would like to rekindle the discussion here. I recently had a bug related to this issue. I think we need a better solution here.
So, as a short-term solution, I propose to forbid broadcasting with Sets which can easily be done by defining broadcastable(::AbstractSet), as done with Dicts:
is there objection to this? I don't think the current behavior is reliable so hopefully no code relies on the behavior (need PkgEval in any case)
I'd be fine with things like Set([1,2]) .+ Set([1,2])
and [1,2] .+ Set([1,2])
being errors.
Currently (Julia 1.4-DEV), broadcasting with Set produces a result that relies on iteration order of
Set
which is not meaningful:This is due to the fallback to
collect
inbroadcastable
definition:https://github.com/JuliaLang/julia/blob/9a8b2fd72b675bb8a5bf0322943ee9451787b86a/base/broadcast.jl#L664
A sensible definition may be to use set arithmetic (ref: https://github.com/JuliaMath/IntervalSets.jl/pull/55) such that, for example,
holds. However, given that "join" on indices/keys is proposed as a generalized form of broadcasting #25904 (and special handling for singleton axes), it is not clear how taking "product" for broadcasting with
Set
s can fit in a bigger picture.So, as a short-term solution, I propose to forbid broadcasting with
Set
s which can easily be done by definingbroadcastable(::AbstractSet)
, as done withDict
s:https://github.com/JuliaLang/julia/blob/9a8b2fd72b675bb8a5bf0322943ee9451787b86a/base/broadcast.jl#L665
This, of course, requires core devs to decide that this can be treated as a "minor change."
As a longer-term solution, does it make sense to use set arithmetic (which, I suppose, implementable using the style mechanism)? Is there a guiding principle in which it is natural to have "product" for
Set
s and "join" for associative data structures (includingArray
s)?