Open xkr47 opened 6 years ago
Oh and yes, you could then do (4..8) | (6..10)
and get a set containing 4, 5, 6, 7, 8, 9, 10
. Or (monday..friday) & (thursday..sunday)
and get a set with thursday, friday
.
I'm not necessarily disagreeing, but the performance characteristics are much more clear when written as set(3..5).union(set(7..9))
, which matters for large ranges.
Yeah, you'd probably want these operations to be lazy, especially for something like ~1..2
.
@jvasileff wrote:
I'm not necessarily disagreeing, but the performance characteristics are much more clear when written as set(3..5).union(set(7..9)), which matters for large ranges.
Did you mean that they are unclear when written as (4..8) | (6..10)
? I was thinking one would implement a (probably module-local) MultiRange
class that looks like a Set
but internally keeps track of the ranges making up the set instead of having individual numbers stored. It would itself be immutable; further set operators would just create new instances.
@CPColin wrote:
Yeah, you'd probably want these operations to be lazy, especially for something like ~1..2.
You mean (0..10) ~ (1..2)
? Afaik ~
is not defined as an unary operator?
But yeah the MultiRange
proposed above could well be lazy, but one has to be careful not to kill the performance in case e.g. incremental updates to such a Set
are performed. There are several ways to mitigate such problems I guess.. flatten-by-threshold, flatten-manually (something analoguous to .sequence()
of Iterable
, flatten-on-read-access, ...
You mean
(0..10) ~ (1..2)
? Afaik~
is not defined as an unary operator?
Oops, you're right, my bad.
Sure, I can do
set(3..5).union(set(4..6))
,
You can do set(3..5) | set(4..6)
.
but is there any reason that
Range
couldn't satisfySet
?
Yep, the reason is that Range
s are List
s, and therefore can't be Set
s.
So what you're really asking for here is an operator for appending lists, something I've often wondered about adding. For String
s we use +
for this. I suppose we could extend that to List
s-of-same-element-type. (List<T>
is also a semigroup.) I have not looked closely at that option.
So you would be able to write: (3..5) + (4..6)
.
I dunno, I've never been in love with the idea of using +
to mean append, though it is intellectually consistent.
I suppose we could extend that to
List
s-of-same-element-type. (List<T>
is also a semigroup.)
Oh, well, now, I remember, or course the problem here was that Summable<T>
is invariant and so if List<Character>
were a Summable<List<Character>>
then String
could not be a Summable<String>
.
And nor could Range<T>
be a Summable<Range<T>>
, since the "sum" of two ranges isn't a range.
Interestingly, and subversively, nothing's stopping us from making List<T>
a Multiplicable<T>
, letting you write it using the *
operator! (I'm not serious.)
I suppose we could extend that to
List
s-of-same-element-type. (List<T>
is also a semigroup.)Oh, well, now, I remember, or course the problem here was that
Summable<T>
is invariant and so ifList<Character>
were aSummable<List<Character>>
thenString
could not be aSummable<String>
.And nor could
Range<T>
be aSummable<Range<T>>
, since the "sum" of two ranges isn't a range.
Ah, excuse me, that's all rubbish, because the append()
method is defined not by List
, but by Sequential
, and String
s are not sequences in Ceylon, but Range
s are.
So I think it would indeed be possible to make Sequential<T>
satisfy Summable<T[]>
, giving you a +
operator for appending immutable sequences (but not possibly-mutable List
s).
That doesn't actually sound like terrible thing to do.
Oh, I'm wrong again, the real problem is that Sequential
is covariant, and Summable
is invariant. So it can't be done at that level either.
So the only thing to do would be, I guess, to factor out Unionable
and Intersectable
interfaces and make Map
and Set
implement both (union and intersection for Map
are intuitive, since maps are intuitively sets of entries), and make List
implement just Unionable
.
That's a possible—and reasonable—thing to do. The only question is do we think it's a good idea to let people non-destructively join lists using |
?
What is the proper union of this expression:
map{"key" -> "v1"} | map{"key" -> "v2"}
@luolong oops, sorry, you're right, intersection can be defined for Map
, but not union.
@gavinking wrote:
Yep, the reason is that Ranges are Lists, and therefore can't be Sets.
I understand that it is convenient that ranges can be expected to have a predictable order of elements, but aren't they also effectively Set
s? I'm of course not asking to actually make them mutually inclusive, just as a thought exercise..
And nor could Range
be a Summable<Range >, since the "sum" of two ranges isn't a range.
Could Range
instead satisfy Summable<MultiRange<T>>
?
Anyway I would still miss the intersection
, exclusiveunion
etc. I guess I could have a shot at implementing a separate MultiRange
class that would satisfy Set
and allow for easy co-operation with Range
instances.
I'm closing this now but feel free to keep discussing.
I understand that it is convenient that ranges can be expected to have a predictable order of elements, but aren't they also effectively
Set
s?
Well, sure, every Iterable
is effectively a Set
.
One possible reasonable thing to do would be to define |
and &
for the Collection
interface, but have them always produce Set
s. That is, simply pull the current union()
and intersection()
methods up to Collection
from Set
.
That would be a change with pretty minimal impact on the language, that makes the existing |
and &
operators much more useful.
For example, [1->"hello", 3->"world"] | map { 1->"goodbye", 2->"sweet" }
would be equal to set { 1->"hello", 3->"world", 1->"goodbye", 2->"sweet" }
.
@gavinking wrote:
Well, sure, every Iterable is effectively a Set.
Hmm, through using Java I have empirically developed the (perhaps incorrect) understanding that Set
s guarantee uniqueness of entries. So in that context, a [ 1, 1 ]
would not be considered to be a Set
..?
Your example with a sequence and a map is of course fine because all the elements of the set are unique. But based on my hypothesis above I would expect [ 1, 1, 3 ] | set { 4, 1 }
would be equal to set { 1, 1, 3, 4, 1 }
which is equal to set { 1, 3, 4 }
.
So in that context, a
[ 1, 1 ]
would not be considered to be aSet
..?
Well, I mean, mathematically, a set is a value with just one operation, ∊, which in Ceylon we write as in
. So as long as an object has a contains()
method, it can be considered a set.
In particular, [ 1, 1 ]
can be "considered" a Set
by applying the set()
function to it.
(Note that the definition of set in mathematics doesn't say anything at all about iterating values, or any of the other things that distinguish Set
s from List
s.)
But based on my hypothesis above I would expect
[ 1, 1, 3 ] | set { 4, 1 }
would be equal toset { 1, 1, 3, 4, 1 }
which is equal toset { 1, 3, 4 }
.
Right, set { 1, 1, 3, 4, 1 } == set { 1, 3, 4 }
.
One possible reasonable thing to do would be to define | and & for the Collection interface, but have them always produce Sets. That is, simply pull the current union() and intersection() methods up to Collection from Set.
The idea sounds really nice.
So one would put default implementations in Collection and assumably override in Range
with versions checking whether the "other" collection is also a Range
, and fall back to the Collection
implementation if not?
(Reopening since there is now light again :)
override in
Range
with versions checking whether the "other" collection is also aRange
, and fall back to theCollection
implementation if not?
They could be overridden on Sequential
to make use of immutability, yes.
I've implemented this idea (it was really straightforward) and it seems to be working well. I'll push it to a branch later.
I've pushed my work to the 7435 branch. Now we need to see if there is really a consensus in favor of this. Feedback, please!
I'm in favor of this.
@gavinking wrote:
Well, sure, every Iterable is effectively a Set.
Hmm, through using Java I have empirically developed the (perhaps incorrect) understanding that
Set
s guarantee uniqueness of entries. So in that context, a[ 1, 1 ]
would not be considered to be aSet
..?Your example with a sequence and a map is of course fine because all the elements of the set are unique. But based on my hypothesis above I would expect
[ 1, 1, 3 ] | set { 4, 1 }
would be equal toset { 1, 1, 3, 4, 1 }
which is equal toset { 1, 3, 4 }
.
[ 1, 1 ] is indeed not considered to be a mathematical set, because a set is a collection of distinct objects. Therefore an iterable object may or may not be a set. If Set
will still retain its defining feature, that is, having distinct objects, then why not.
[ 1, 1 ] is indeed not considered to be a mathematical set, because a set is a collection of distinct objects.
This isn't really correct, at least not in modern axiomatic set theory, where "set" is a primitive notion. The only operation a mathematical set has is ∊, and one can't distinguish [1]
from [1,1]
using ∊.
This isn't really correct, at least not in modern axiomatic set theory, where "set" is a primitive notion. The only operation a mathematical set has is ∊, and one can't distinguish
[1]
from[1,1]
using ∊.
Both of them are theories anyway, that's not what I really care about. The thing is, I believe having a collection that deals with distinct objects is very useful. Though if a Set
doesn't specify anymore if it contains distinct objects or not, what's the difference between a Set
and a Collection
then?
if a
Set
doesn't specify anymore if it contains distinct objects or not, what's the difference between aSet
and aCollection
then?
There is no suggestion to change the semantics of Set
. The iterator for a Set
always returns distinct values.
Then it's more than ok. That would be very useful.
Sorry I came a bit late to this discussion, but I think I should drop my five cents here.
I don't think | & and ~ are actually part of the semantics for Collection
, but just for Set
.
I'm not saying they are not useful, but just no the right place.
I.e. what's the meaning of the union of two List
s? it is not unreasonable thinking that may retain elements from even if they are repeated (i.o.w. work as a concatenation); but also may just add the "missing" elements to the end of the list; or remove duplicate elements before concatenate...
Instead, I propose creating utility methods in collections, with similar meaning, that can be used to define Set
methods. Say:
Collection.union
we can use Collection.or
, Collection.with
or Collection.addAll
.Collection.intersection
we can use Collection.and
,Collection.retainAll
or Collection.keep
Collection.exclusiveUnion
we can use Collection.xor
or Collection.disjoint
(none of them really convincing me, but can't find something better).Those methods can serve as foundations for Set
operations, but also can be refined to adapt the results to different collection kinds.
@someth2say The idea is that arbitrary collections can be trivially viewed as sets, because conceptually a set is just something that tells you whether or not it contains a given element. Taking the union of two lists means that you want to treat them as sets, you don't care about order, and you want to get back a set containing each distinct element that appears in either list. That's a useful thing to do.
I.e. what's the meaning of the union of two Lists? it is not unreasonable thinking that may retain elements from even if they are repeated (i.o.w. work as a concatenation); but also may just add the "missing" elements to the end of the list; or remove duplicate elements before concatenate...
The Set
return type of these operations means that it is unreasonable to think that. It's another one of those things where you might get the wrong idea the very first time you encounter it, before you understand what it's supposed to be doing, but you only have to learn it that one time.
@gdejohn
The idea is that arbitrary collections can be trivially viewed as sets, because conceptually a set is just something that tells you whether or not it contains a given element
That's one of the root arguments I don't agree with. Quoting @gavinking: mathematically, a set is a value with just one operation, ∊. But practically, sets have many other characteristics we should not ignore (like all contained elements being unequal...and equality is really a hairy topic!).
If you are just keeping the ∊ operation, we usually use the name Collection
or Bag
.
Taking the union of two lists means that you want to treat them as sets... It's another one of those things where you might get the wrong idea the very first time you encounter it...
That's another argument against pulling methods up. You should learn that, when you apply Set
methods to two Collection
s, those will "automagically" be transformed to Set
s, and then the method will be applied; It is not intuitive what the result will be (at least not for me), as a hidden transformation is applied. And Ceylon always tried to avoid hidden transformations.
I'm proposing specific operations for all Collections
(so you can write [3,4,5].with([6,7,8])
) while keeping Set
specific operations as they already are (so you can still use set(3..5) | set(4..6)
). It is just matter of making explicit when you are working with Set
s and when with Collections
.
Also, as @jvasileff said, performance impact is much more clear when set transformation is explicit.
But practically, sets have many other characteristics we should not ignore (like all contained elements being unequal...and equality is really a hairy topic!). If you are just keeping the ∊ operation, we usually use the name
Collection
orBag
.
Conceptually, we only care about ∊ with respect to the collection operands, but we care about more than just ∊ with respect to the result, which is why we return a Set
. The return type communicates all of that.
That's another argument against pulling methods up. You should learn that, when you apply
Set
methods to two collections, those will "automagically" be transformed to sets, and then the method will be applied; It is not intuitive what the result will be (at least not for me), as a hidden transformation is applied. And Ceylon always tried to avoid hidden transformations.
How is that a hidden transformation in some undesirable way that any other function isn't? In general, you don't know how any function is implemented, how it transforms its inputs. Why is that a problem here?
Also, as @jvasileff said, performance impact is much more clear when set transformation is explicit.
I could understand the objection if the time complexity jumped from linear to quadratic or something, but it's still linear, so what's the big deal?
I was longing for
Set
operators/operations inRange
instances. Namelyunion
,intersection
but why not alsocomplement
andexclusiveUnion
.Sure, I can do
set(3..5).union(set(4..6))
, but is there any reason thatRange
couldn't satisfySet
? I read through #3557 and the current interfaces and their implementations and although I didn't perhaps catch everything, I felt I should ask.Since all operations except
intersection
may result in split ranges, the implied return value ofSet
is actually just fine. One could of course have a more specificRange<Element> intersectionWithRange(Range<Element> other)
for convenience.For these set operations one would naturally not care if the ranges are
increasing
ordecreasing
- the result would be the same i.e. the result would be normalized.As for an actual implementation it would perhaps be beneficial to override the default implementations from the
Set
interface for more efficient operation (as long as the "other" set is of same type as "this" set), but they would still be "just Sets"...Thanks for listening..