Revisiting Ceylon collections

CeylonMigrationBot commented 12 years ago

[@ikasiuk] Some time ago we completely revised Ceylon's collections model. So perhaps it's time to examine if the result meets our expectations. While trying to do so I have come across a few points that bother me. I am writing them down here so you can hopefully tell me that I'm just too anxious and overcritical :-) Note that I'm not saying the current model isn't good (in fact it's pretty neat), but that doesn't necessarily mean we can't do better.

Complexity and trickery

To properly understand the current collection type hierarchy I actually had to draw a class diagram even though I participated in the discussion that led to it. That's not good: the relation between the basic types like Collection, List, T[] and Array should be easy to understand for newcomers. And that's not the case currently. I would not feel comfortable having to explain this part of Ceylon to someone in detail.

One thing that's not nice is that a String is not a Character[]. While that may be excusable it points to an underlying problem: because FixedSized is of None|Some it cannot be properly implemented by a single class. That makes the implementation of Array and String a bit complicated. Currently, testing whether a String is Some or None using if (is...) reveals that it is neither [Update: fixed]! Consequentially, the line

if (nonempty s = "x") { Some<Character> x = s; }`

causes a ClassCastException at runtime [Update: fixed]. Errors like this can probably be fixed but I really don't feel comfortable with the amount of complexity required to integrate Array and String into the collection type hierarchy.

if (nonempty)

I guess a main reason for introducing FixedSized is if (nonempty). But I'm beginning to think that if (nonempty) is not as useful as I expected: I simply don't need it very often and when I do then in most cases it could better be written as something like if (exists f=l.first). In the remaining cases it's nice but not irreplaceable. Don't get me wrong: in theory I like the idea a lot, it's just practically not as relevant to me as anticipated.

Another problem with if (nonempty) is that it only applies to FixedSizeds. So you can use it on T[], Arrays and Strings, but basically nothing else. In particular, it looks like T... will mean Iterable<T> soon. That will obviously be a common way to pass sequences of values to functions, and if (nonempty) cannot be used on them. The same goes for all mutable Lists as well as Sets and other Collections.

If you write code that uses if (nonempty)and thus needs a FixedSized then you will likely have to convert an input list to a T[] first at some point. On the other hand if you don't want to require a FixedSized you'll write code that checks emptiness by other means than if (nonempty). This means a certain dualism, a bit like between arrays and lists in Java. I always disliked that in Java and had hoped we could get rid of it in Ceylon because I think it adds unnecessary complexity. A list of things should always basically look the same, and be handled in the same way. Consistency makes the language easier to learn and increases the readability of the code.

Comparing if (nonempty) to iteration

A good example of a mechanism where this works great is iteration: we can write for (x in l) for almost anything. So if we need something like if (nonempty) then why doesn't it work as universally as iteration? One might answer that this is because the sequence in question must have a fixed size, so that its size cannot change inside the if block. But the same problem exists with iterators: an iterator typically throws a ConcurrentModificationException if the underlying sequence is modified during iteration. So if we wanted to be safe we would have to restrict iterators to immutable, or at least fixed-sized, sequences. But we don't do that because it would be much too impractical.

Isn't it similarly impractical for if (nonempty)? So perhaps if (nonempty ne=l) should rather be equivalent to something like if (exists ne=l.nonEmpty), with nonEmpty returning some kind of non-empty view of the sequence. Modification of the sequence would then be disallowed inside the if block, exactly analogous to iteration.

[Migrated from ceylon/ceylon.language#78] [Closed at 2013-03-11 19:54:29]

CeylonMigrationBot commented 12 years ago

[@quintesse] I would also add that the current design of Array makes it really hard to optimize it. Trying to erase them to Java arrays is overly complex and that's without thinking about how to handle the Java basic types. And all of this was the original goal of the new Array design.

The fact that Array is not a single implementing class, mentioned by @ikasiuk, results in problems with boxing/unboxing for example that would suddenly depend on the actual value of the type, not something that fits well in the current boxing design.

It also means that trying to make specialized subclasses that handle Java basic types suddenly results in many more classes, which in itself is not too nice but it is also making the previous boxing problem even more serious.

I have the feeling we're complicating the language design without a proper "ROI", at least from the perspective of the Array class.

CeylonMigrationBot commented 12 years ago

[@gavinking]

Errors like this can probably be fixed but that doesn't hide the fact that the trickery required to integrate Array and String into the collection type hierarchy is a shameful mess.

This is overstating the case quite a lot. On the contrary, it's merely a bug in the Java implementation of the language module. I'll fix it later today.

And it would be simply impossible to implement these classes in plain Ceylon.

That's definitely not true. Is it?

CeylonMigrationBot commented 12 years ago

[@gavinking] @quintesse But these problems are in principle solvable, are they not? i.e. it is a simple matter of writing code to solve them. That's our job, no?

CeylonMigrationBot commented 12 years ago

[@gavinking]

One thing that's not nice is that a String is not a Character[].

So I need to check, but this might be the killer argument in favor of #4919. It might just be that if we introduce FizedSizedList, then String can be a subtype. Well, I need to check that that would work with the signatures of all the operations like the ones on Ranged, etc.

CeylonMigrationBot commented 12 years ago

[@quintesse] @gavinking well as @FroMage can probably confirm the boxing code is already hellishly complex, I wouldn't like adding a layer of complexity on top of that to be honest. To me the whole idea of Array has always been just to make interop with Java arrays possible, hopefully in the most efficient way. I don't think we should worry about the result not fitting in the Some/None design, they'd still be iterable so I'd be easy to turn them into a sequence.

Maybe a possibility would be trying to come to it from the other side: first drop the FixedSized on Array, get all the erasure and boxing to work (including for arrays of the Java basic types), which will already be pretty complex and then see if we can refit the FixedSized to it, or look for other acceptable alternatives.

CeylonMigrationBot commented 12 years ago

[@quintesse] We couldn't just make Ceylon T[] be Java T[], right? ;)

CeylonMigrationBot commented 12 years ago

[@gavinking] So after 5 mins of programming I managed to make this compile and run correctly:

void testString() {
    String s1 = "hello";
    if (nonempty s1) {
        print(s1.first);
        print(s1.rest);
        Some<Character> x = s1;
        print(x);
    }
    String s2 = "";
    if (nonempty s2) {
        print("oops!");
    }
    else {
        print("empty string");
    }
}

Note that String is a boxed/unboxed type. Now, I'm not sure what extra difficulties affect Array that don't affect String, but they seem like very similar problems, no?

CeylonMigrationBot commented 12 years ago

[@quintesse] Well... I'm not sure, maybe the Java implementation of Array is just wrong, but for example String does not have 2 implementing classes: one for an empty string and one for all the others (in fact it's abstract). If we could use the same trick for Array we might solve a lot of the problems I encountered.

CeylonMigrationBot commented 12 years ago

[@quintesse] Ah, forget I said that, I see that's exactly what you changed in your fix ;)

CeylonMigrationBot commented 12 years ago

[@gavinking] @quintesse I really don't know what you're worried about. After fixing a couple of small bugs, the following code works:

void testArray() {
    variable value count:=1;
    class Foo() { shared actual String string = "Foo " count++ ""; }
    value arr1 = array(Foo(), Foo(), Foo());
    if (nonempty arr1) {
        print(arr1.first);
        print(arr1.rest);
        for (x in arr1.rest) { print(x); }
    }
    value arr2 = array(1, 2, 3);
    if (nonempty arr2) {
        print(arr2.first);
        print(arr2.rest);
        for (x in arr2.rest) { print(x); }
    }
}

And, IMO, is pretty damn cool....

CeylonMigrationBot commented 12 years ago

[@quintesse] It's not that it doesn't work (I made the damn Array implementation after all ;) ) it's just that it's highly inefficient, it does not do any erasure at all.

In a "perfect world" the value arr1 line would translate to

Foo[] arr1 = { new Foo(), new Foo(), new Foo() };

and the value arr2 to:

long[] arr2 = { 1, 2, 3 };

but it gets translated to a new NonEmptyArray() and especially in the second case a whole lot of boxing and unboxing on each reference.

CeylonMigrationBot commented 12 years ago

[@gavinking]

So perhaps if (nonempty ne=l) should rather be equivalent to something like if (exists ne=l.nonEmpty), with nonEmpty returning some kind of non-empty view of the sequence.

So I think this is probably a not-unreasonable approach, and it is one to which we have given some consideration before.

Let's see what it might look like. We could:

get rid of FixedSized, Some, and None, and
replace them with the following interface:

shared interface Sequential<out Some, out None> {
    shared Some|None someOrNone;
}

Then since someOrNone is not required to return the receiving object, List can be a subtype of this interface:

shared interface List<out Element>
        satisfies Collection<Element> &
                  Correspondence<Integer, Element> &
                  Ranged<Integer,List<Element>> &
                  Sequenced<Sequence<Element>, Empty> &
                  Cloneable<List<Element>> { ... }

Then the if (nonempty ...) construct accepts a Sequenced and "narrows" it to its Some type. (In this case, it "narrows" a List<Element> to a Sequence<Element>.)

if (nonempty list) {
    print(list.first);
}

The issue with this, and the reason we did not go down this path before, is that you wind up with list referring to a completely different object inside the block. What looks like a simple type narrowing operation is in fact something much different. Indeed, if (nonempty) could now do some pretty arbitrary things, having nothing to do with "emptiness"!

Well, it may still be a better option on balance - I would love to say that T[] means List<T> - but it's not without problems of its own.

CeylonMigrationBot commented 12 years ago

[@gavinking] i.e. if (nonempty list) would mean if (is Sequence<Element> list = list.someOrNone)

CeylonMigrationBot commented 12 years ago

[@gavinking] On the other hand, arguing against myself as usual, Sequential doesn't really give us a whole lot extra that we can't already do now. We can already write:

if (nonempty seq = list.sequence) { ... }

So it seems to me that List<T> not being a T[] is not really a huge issue here. It seems to me that the bigger inconvenience is that String and maybe Array<T> aren't instances of T[]. But at least in the case of String I believe that to be fixable by introducing FixedSizedList<T>.

CeylonMigrationBot commented 12 years ago

[@ikasiuk]

That's definitely not true. Is it?

Ok, I was thinking of implementing it in a single class. But the right way to do it is to have two separate classes internally, one for the empty case and one for the non-empty case, and hide those behind a shared abstract class (and it seems that's exactly what we do now). That should indeed also work in pure Ceylon. The only downside is that you can't create instances of the class directly but have to provide a separate toplevel function for that purpose (independent of the language used to implement it).

I'll update the text accordingly. BTW: is there a markdown syntax for strikethrough text? It would sometimes be nice to keep the description text up-to-date but nevertheless indicate what was changed/removed.

This is overstating the case quite a lot. On the contrary, it's merely a bug in the Java implementation of the language module. I'll fix it later today.

I had no doubt that the bug would be fixed swiftly although you were once again even faster than expected :-) But I was not so much referring to the bug as to the complexity. Maybe it's not that unclean after all, just complicated. I still wouldn't feel comfortable having to explain the details to a newcomer though. I'll update the text to better reflect that this is just a personal opinion.

CeylonMigrationBot commented 12 years ago

[@ikasiuk]

So I need to check, but this might be the killer argument in favor of #4919. It might just be that if we introduce FizedSizedList, then String can be a subtype. Well, I need to check that that would work with the signatures of all the operations like the ones on Ranged, etc.

Sorry, can you explain that? How do Set/Slots affect this? Not sure if I like the idea of introducing another interface for representing fixed size...

CeylonMigrationBot commented 12 years ago

[@ikasiuk]

Well, it may still be a better option on balance - I would love to say that T[] means List - but it's not without problems of its own.

I agree with every part of this sentence.

The issue with this, and the reason we did not go down this path before, is that you wind up with list referring to a completely different object inside the block. What looks like a simple type narrowing operation is in fact something much different.

Yes, that's true. We could at least enforce the if (nonempty x=y) syntax, but admittedly that still suggests that x is the same object as y.

We can already write:
if (nonempty seq = list.sequence) { ... }

You mean if we defined a sequence attribute. Or does that already exist somewhere? I still think if (nonempty) should work on more or less any sequence of values, ideally even on Iterables. Following this simpler approach we could say that the definition of if (nonempty) remains unchanged and for collections that don't have a fixed size we can write something like if (nonempty ne=lst.someOrNone).

We could then even go one step further and get rid of FixedSized altogether, possibly allowing us to make T[] mean List<T> (and I'd really love to see that one day). Of course we would then have to write if (nonempty y=x.someOrNone) in many cases where we can currently write if (nonempty x). But i think that might be worth it, and perhaps we could still make if (nonempty x) work for things like arrays and strings.

Something else just occurred to me: Iterables are becoming increasingly important. We'll probably make T... mean Iterable<T> in parameter lists, which is a great idea IMO. And don't sequence comprehensions also basically boil down to auto-generated Iterables? An Iterable can either represent an actual collection or a sequence of values which are generated on demand. In the latter case one would typically want to avoid an unnecessary "greedy" step in the processing of that sequence, i.e. a step which flattens the Iterable into an actual collection. So you shouldn't unnecessarily convert an Iterable to a T[] by writing {it...}. Is that right?

That's another reason to make language support for Iterables as comfortable as possible and to support if (nonempty) even for Iterables. I think that would be feasible although it would probably mean that we have to keep the Some interface because Iterables have no last.

CeylonMigrationBot commented 12 years ago

[@gavinking]

Sorry, can you explain that?

oops, I mean #3147.

CeylonMigrationBot commented 12 years ago

[@gavinking]

You mean if we defined a sequence attribute. Or does that already exist somewhere?

It's defined on Iterable in the comprehensions branch.

Something else just occurred to me: Iterables are becoming increasingly important. We'll probably make T... mean Iterable<T> in parameter lists, which is a great idea IMO. And don't sequence comprehensions also basically boil down to auto-generated Iterables?

Yes, all this is already implemented (in typechecker and language module) in the branches.

CeylonMigrationBot commented 12 years ago

[@gavinking]

But at least in the case of String I believe that to be fixable by introducing FixedSizedList<T>.

I played with this last night on my local machine and it indeed works out. If T[] is defined to mean FixedSizedList<T> (a subtype of List and FixedSized) then String can be a T[].

CeylonMigrationBot commented 12 years ago

[@gavinking] OK, so here is a new proposal, motivated by the following observations:

Simplicitly does matter.
I liked the idea of abstracting FixedSized to any kind of collection, but the truth is that this has not quite worked out as well as I had hoped because, with it being so abstract, its operations and the return values of its operations are a bit too abstract for lists. Furthermore, while a fixed sized Set or Map is a slightly useful construct, it's far less useful than a fixed sized List.
The point is taken that String should be a Character[], and if it's a problem for String, then it will be a problem for other fixed length list implementations.
Except for Array, all the fixed length lists we know about are actually immutable lists. And, finally, it's not really that important to me whether Array is fixed length or not. If Tako hates the implementation implications of making them fixed length, well, I can live with them not having this feature.

Therefore, I propose:

Rename FixedSized to ConstantList and make it a subtype of List. It will still have the enumerated subtypes Some|None.
Make Sequence a concrete class implementing Some.
Make String implement ConstantList.
Make Range directly implement Some (not Sequence).
Replace the Empty interface with an object empty implementing None.
Make Array directly extend List (not ConstantList).

And then:

Define T[] to mean ConstantList<T>

I don't think anyone will have much problems understanding this hierarchy. Nor do I think it's much of a problem to have to call sequence to convert a generic Iterable to a ConstantList.

This is a major change that will break 100's of tests, but it seems to be more intuitively understandable and somewhat objectively simpler.

CeylonMigrationBot commented 12 years ago

[@gavinking] P.S. I still don't understand why it's possible to erase the (now fixed length) type String but not the very similar type Array. They seem the same to me. But I suppose I'll just have to take Tako's word on that.

CeylonMigrationBot commented 12 years ago

[@RossTate] So if the problem is that nonempty x = y makes it look like x is the same reference as y, then why not just have a syntax that doesn't give that impression?

How about if ({a, b, c, t...} matches list)?

Then have Iterable<T> have a method

Empty|T+ match()

that gives the current state of the iterable.

Of course, you could design this to work for powerful and user-extensible pattern matching. (My example already uses nested pattern matching.)

CeylonMigrationBot commented 12 years ago

[@gavinking] Guys, please stop fixating on nonempty. That is not the only application of this feature. Look at the signature of min() and max(), and tell me how you would write that down without encoding nonemptiness into the type system.

CeylonMigrationBot commented 12 years ago

[@RossTate]

shared Value min<Value>(Value first, Value... rest) 
        given Value satisfies Comparable<Value> {
    variable min := first;
    for (val in rest) {
        if (val < min) {
            min := val;
        }
    }
    return min;
}

If list has type Integer+, then do min(list.first, list.rest...). If list has type Integer[], then do min(max_int, list...) or

Integer? result = {if ({first, rest...} matches list) min(first, rest...)};

if you don't want to assume there is a maximum (say if we were dealing with arbitrary-precision integers).

CeylonMigrationBot commented 12 years ago

[@gavinking] @RossTate I'm heading right to the bathroom to go vomit. I think you've just perfectly demonstrated he value of this stuff :-)

CeylonMigrationBot commented 12 years ago

[@RossTate] I forgot to say that, if list has type Integer+, then do min(list...). That requires some more intelligence on the type checker's part (realizing that an Integer+ can be passed to a method expecting an Integer and an Integer...).

CeylonMigrationBot commented 12 years ago

[@gavinking] P.S. this whole discussion is about whether we're going to eliminate our "Integer+" type of not. If you have an Integer+, then that is the right type to use in the signature of min()/max(). (And we do, it's called Some<Integer>.)

CeylonMigrationBot commented 12 years ago

[@RossTate] Well there's the obvious

shared Value? min<Value>(Value... values) 
        given Value satisfies Comparable<Value> {
    if ({variable min, rest} matches values) {
        for (val in rest) {
            if (val < min) {
                min := val;
            }
        }
        return min;
    } else {
        return null;
    }
}

I was just trying to give a solution that would not need null and didn't need the presence of Integer+ (but would be compatible with such a feature).

CeylonMigrationBot commented 12 years ago

[@RossTate] And next time, please explain what you don't like rather than just insult it.

CeylonMigrationBot commented 12 years ago

[@gavinking]

I was just trying to give a solution that would not need null

Right, the whole point of all this is that unlike other languages we have a typed null in Ceylon, and so null is something you want to avoid dealing with wherever possible. (It's not like Java where you can just ignore it and get an NPE later on in production or in your tests if you're lucky.)

Empty collections have the tendency to result in functions that produce nulls (or, much worse, exceptions). So if you can eliminate the possibility of emptiness early, then you don't have to deal with nulls later.

And next time, please explain what you don't like rather than just insult it.

OK.

CeylonMigrationBot commented 12 years ago

[@RossTate] Then make it so that nulls are easier to deal with, rather than designing a complex (conflicted) type system to deal with just one situation out of many that nulls can arise from.

Here are two common situations I have yet to see handled:

The expression has type T? but the programmer knows it's not null (as is the case for the above min when the list is nonempty)
The expression has type T?, the return type of the method has type X?, and the programmer wants to return null if the expression is null

Now, back to something more concrete, I don't think your ConstantList design will work. If I understand it correctly, you need a ConstantList to use nonempty, and if you have an Iterable then you need to eagerly evaluate the whole thing to get a ConstantList. Yet I would expect to use something like min on large lazily built lists, in which case this design is very inefficient.

Maybe it would be better to just have Iterable<T> have the method

T aggregate(T id, T op(T,T))

which takes a monoid and evaluates it on the list. So if we had Integer min(Integer,Integer), we could do {0, 3, 2, 5}.aggregate(max_int, min). This design also enables lots of parallelism.

You could then have

T? aggregateNull(T op(T,T))

for the cases where there is no identity for the operator. Nonempty subinterfaces could then narrow the return type to be just T.

CeylonMigrationBot commented 12 years ago

[@gavinking]

Then make it so that nulls are easier to deal with, rather than designing a complex (conflicted) type system to deal with just one situation out of many that nulls can arise from.

Huh? That doesn't make sense. How does this part of the collections design impact the type system? The type system doesn't even know about Nothing or ConstantList. They are just ordinary classes/interfaces from the point of view of the type system.

The complexity issue now boils down to whether people are smart enough to understand the relationship between List, its subtype ConstantList, and its enumerated subtypes Some and None. I personally think it's pretty absurd to claim that this is complex or difficult to understand. It's definitely not "conflicted".

The initial issue report had some useful observations in it, and the truth is that the first phase of the redesign was never really completed because I got yanked away from development suddenly for two months and never got a chance to finish it off properly and round off the hard edges. But now I'm getting the feeling that you guys are blowing waaay out of proportion something that is actually pretty straightforward, especially after applying a couple of simplifications proposed above.

FTR, I don't buy that there is any real conceptual problem here, the problem is how to present this to people who are new to the language without them finding it confusing. In the end we're arguing over three interfaces (ConstantList, Some, and None) in the collections hierarchy.

Yet I would expect to use something like min on large lazily built lists, in which case this design is very inefficient.

Look, if you keep introducing new requirements (in this case, large efficient typesafe nonempty Iterables) then naturally things are going to get more and more complex. I've tried to pare back some of the requirements in this latest proposal, in order to get a design that newbies will readily understand. No, it doesn't solve all problems. Yes, that's perfectly OK, because it simplifies the code in some really common cases without making it impossible to deal with the more complex cases.

Maybe it would be better to just have Iterable<T> have the method aggregate() which takes a monoid and evaluates it on the list.

So ConstantList is complex and difficult to understand but monoids and all the attendent type class machinery they drag along with them are not? Really?

And FTR monoids may solve the problem of sum(noNumbers) returning 0 but they don't solve the problem of min(noNumbers) returning null, so now you're addressing a different problem, not the problem which is under discussion.

CeylonMigrationBot commented 12 years ago

[@RossTate]

How does this part of the collections design impact the type system?

Right now, nonempty is a part of the type system. If we want list... to be used where (Integer, Integer...) is expected when list is nonempty, then that's part of the type system. If we want comprehensions (which we have yet to discuss), then that's part of the type system.

It's definitely not "conflicted".

It sounds like I can't use nonempty on arrays under the new design. I do not understand why.

Look, if you keep introducing new requirements (in this case, large efficient typesafe nonempty Iterables) then naturally things are going to get more and more complex.

This is not a new requirement (nor did I say "nonempty"). C# and Scala already have stuff for this. It's one of the reasons people like them over Java. I don't think a solution for list stuff is good if it works for only small or only large lists, or eager vs. lazy lists, or mutable vs. immutable lists.

So ConstantList is complex and difficult to understand but monoids and all the attendent type class machinery they drag along with them are not?

I did not use type classes anywhere. I used an element and a binary operator. I think someone new to Ceylon would have no problem with that.

And FTR monoids may solve the problem of sum(noNumbers) returning 0 but they don't solve the problem of min(noNumbers) returning null, so now you're addressing a different problem, not the problem which is under discussion.

This is why I had aggregateNull and allowed nonempty subinterfaces to narrow the return type to be non-null. Thus you could do list.aggregateNull(min) regardless of whether list were Integer[] or Integer+, but in the latter case the return type would be guaranteed to be non-null.

CeylonMigrationBot commented 12 years ago

[@gavinking]

Right now, nonempty is a part of the type system.

nonempty just means is Some<T> where T is the principal instantiation of ConstantList in the supertypes of the argument. i.e. nonempty is completely defined by reference to other constructs.

It sounds like I can't use nonempty on arrays under the new design. I do not understand why.

Neither do I. I don't understand why boxing/unboxing an array is any different to boxing/unboxing a string. You'll have to ask Tako.

Look, if you keep introducing new requirements (in this case, large efficient typesafe nonempty Iterables) then naturally things are going to get more and more complex.

This is not a new requirement (nor did I say "nonempty"). C# and Scala already have stuff for this. It's one of the reasons people like them over Java. I don't think a solution for list stuff is good if it works for only small or only large lists, or eager vs. lazy lists, or mutable vs. immutable lists.

If you remove "typesafe nonempty" from the list of things you require from large efficient Iterables, then we already have that today in the comprehensions branch, and it's totally orthogonal to the current discussion, and it's really muddying the waters to raise the issue here. Please discuss this in #3390. I'm not trying to be rude, I'm just annoyed that you guys seem to have got the impression that getting rid of typesafe nonemptiness will be some magical panacea that will solve all kinds of totally unrelated problems. No. The only thing it would do is reduce the number of types in the collections hierarchy, and make you have to write if (exists) more often.

I did not use type classes anywhere. I used an element and a binary operator. I think someone new to Ceylon would have no problem with that.

Nevertheless, it still does not solve the problems we're discussing.

This is why I had aggregateNull and allowed nonempty subinterfaces to narrow the return type to be non-null.

Again, the argument is over whether or not we should have nonempty subinterfaces. If there are no nonempty subinterfaces (as proposed) you have nowhere to narrow to a non-null type. If we do have nonempty subinterfaces, there is no problem, and we can easily write down the signature of functions like aggregate(), min(), and max().

CeylonMigrationBot commented 12 years ago

[@RossTate] Okay, I think I get it. Here are my arguments then:

Since there are a lot of features defined in terms of Iterable, having nonempty not work for Iterable creates a conflicted design.
Statically type checking nonemptiness seems impractical. Many algorithms obviously produce nonempty lists with even just a basic understanding of the semantics and relevant behaviors, none of which the type checker understands.
Even where we do get nonemptiness to work, it's by significantly restricting the implementations of collections.
So far, I've only seen a few examples where nonemptiness makes an impact, and many of these examples can be circumvented by an initial value (e.g. max_int) or by null.

Non-nullness, on the other hand, doesn't conflict with other language features. It is typically easy to track without understanding semantics. There is only one way to implement a T? object. There are many many examples where non-nullness makes a big difference.

Because of this, it makes sense to me to track non-nullness and not non-emptiness.

CeylonMigrationBot commented 12 years ago

[@ikasiuk]

I still don't understand why it's possible to erase the (now fixed length) type String but not the very similar type Array. They seem the same to me.

Well, there is one important difference: Array has a type parameter, String doesn't. I guess that could make a difference especially for empty arrays. But please don't ask me to explain the implications in detail.

Gavin, you more or less convinced me that non-emptiness should be encoded in the type system. Also, your intuition is usually right, so I'll just trust you there and stop arguing against non-emptiness and if (nonempty). And the new proposal looks good: the concept of fixed size is built into the type hierarchy in a simpler and more transparent way.

Except for Array, all the fixed length lists we know about are actually immutable lists.

But there is nothing that prevents mutable implementations of ConstantList, isn't there? Or do you want to revive the idea of immutable sequences? Otherwise I don't understand why Array can't satisfy ConstantList just as it currently satisfies FixedSized.

Nor do I think it's much of a problem to have to call sequence to convert a generic Iterable to a ConstantList.

And that's where we disagree. It's very convenient to write if (nonempty s=items.sequence). But that would always flatten an Iterable into a list and thus potentially result in a huge penalty and undermine the efficient processing of Iterables. And it's easy to overlook that because the call looks quite harmless. I still think there should be a safe was of using if (nonempty) on an Iterable.

So how could that be accomplished with the new proposal? I guess the only possibility is to have Some and None interfaces not satisfying List, as in the current model. Maybe that's possible without complicating things too much. How about this:

interface Iterable<out T> {
    shared formal Some<T>|None<T> fixedSized;
    ...
}
interface None<out T> satisfies Iterable<T> {...}
interface Some<out T> satisfies Iterable<T> {...}

interface Collection<out T> satisfies Iterable<T> {
    shared actual formal NonemptyList<T>|EmptyList<T> fixedSized;
    ...
}
interface List<out T> satisfies Collection<T> {...}

interface ConstantList<out T> of NonemptyList<T>|EmptyList<T>
      satisfies List<T>  {...}
interface NonemptyList<out T> satisfies ConstantList<T>&None<T> {...}
interface EmptyList<out T> satisfies ConstantList<T>&Some<T> {...}

if (nonempty) could then narrow to NonemptyList for ConstantLists and to Some otherwise. The result: you could simply use if (nonempty seq) on all ConstantLists and if (nonempty ne=l.fixedSized) on all other Iterables, without having to bother about a possible penalty. In other words: if (nonempty) would be as universal as iteration and you could write very similar and efficient code for all Iterables.

CeylonMigrationBot commented 12 years ago

[@gavinking] @RossTate

Since there are a lot of features defined in terms of Iterable, having nonempty not work for Iterable creates a conflicted design.

If the choice is between not having it for every Iterables, and not having it at all, I don't really see how it's disadvantageous to have it for some subtypes of Iterable.

Statically type checking nonemptiness seems impractical.

Perhaps. As I've said whenever this question has come up: I simply don't know, because I have not used Ceylon enough in practice yet. If you're right, we'll find out in good time. But until we actually have some practical experience, I refuse to form any opinion on what is practical and what isn't.

Judgements about practicality depend on empirical evidence which simply doesn't exist today.

So far, I've only seen a few examples where nonemptiness makes an impact.

I personally find if (nonempty) pretty convenient in my toy example code. I could live without it, but it's definitely not a feature that gets in my way or causes me any problems. Plus, I think it's more readable and easier to type than the alternate idiom if (exists first = seq.first).

Likewise, I think it's very nice that I can define a max() function that returns a non-null type. I could also live without that, but I don't see how it breaks anything that I have that option available to me.

It may turn out that this stuff is less useful than it seems to me right now. In which case it will be easy enough to remove. It's not like it's warping any other bits of the language / library design to have it in there.

CeylonMigrationBot commented 12 years ago

[@gavinking] @ikasiuk

Gavin, you more or less convinced me that non-emptiness should be encoded in the type system. Also, your intuition is usually right, so I'll just trust you there and stop arguing against non-emptiness and if (nonempty).

It's not that I'm asking you guys to trust my intuition. It's that I'm asking you to wait until we have enough experience to see if it works out nicely in practice, or if it's something that winds up not adding much value.

I guess the only possibility is to have Some and None interfaces not satisfying List, as in the current model. Maybe that's possible without complicating things too much. How about this: [snip]

So the problem with this is that we don't expand enumerated types to their cases when determining assignability or searching for supertypes. At one stage I attempted to implement this but ran into problems with decidability. At first I thought it was trivial because it looks like essentially the same thing we do with unions, but in fact it turns out to be much trickier and Ross thinks it's possibly undecidable.

So, taking your code above, the following doesn't pass the typechecker:

void testNonempty(ConstantList<String> list) {
    if (is Some<String> list) {
        NonemptyList<String> ss = list;
    }
}

Now, the truth is that it would be pretty trivial to special case ConstantList in the definition of nonempty, and perhaps that's the solution, but that's the kind of thing I don't like doing, which is why I was experimenting above with the idea of encoding the "nonempty type" into the type arguments of a special supertype.

CeylonMigrationBot commented 12 years ago

[@RossTate] Alright, that sounds sensible.

One clarification: I wasn't proposing getting rid of something like nonempty altogether; I was proposing having a nonempty-like operation that works for all iterables, such as the pattern-matching approach. Nonetheless, as you say, let's explore how this works out.

CeylonMigrationBot commented 12 years ago

[@ikasiuk]

Now, the truth is that it would be pretty trivial to special case ConstantList in the definition of nonempty, and perhaps that's the solution, but that's the kind of thing I don't like doing, which is why I was experimenting above with the idea of encoding the "nonempty type" into the type arguments of a special supertype.

Yes, looks like a special case would be necessary. But considering this from the user perspective, most people probably wouldn't even notice that something special is going on there. And those who notice would think: "Nice, the compiler is automatically doing the right thing!". So maybe it's worth a little inconvenience in the implementation. But I'll admit that we can probably do without it if you think it's not worth the effort.

CeylonMigrationBot commented 12 years ago

[@gavinking]

So maybe it's worth a little inconvenience in the implementation.

Oh, it's not the implementation that bugs me. Implementation-wise it's completely trivial. It's the principle of the thing. We can solve lots of irritating little problems by writing lots of special little rules about how this or that operator behaves with this or that particular type. But that's pretty contrary to our whole philosophy...

But considering this from the user perspective, most people probably wouldn't even notice that something special is going on there. And those who notice would think: "Nice, the compiler is automatically doing the right thing!".

Oh, in this case that's certainly true, I don't dispute for a second that a special case here would work.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] @gavinking Ok, getting back to your proposal: are mutable implementations of ConstantList allowed? And if yes, why can't Array be a ConstantList?

CeylonMigrationBot commented 12 years ago

[@quintesse] @ikasiuk It's not that Array can't be a ConstantList, it's a FixedSized now so of course it can be. But @gavinking made it so at my request basically. The current design with FixedSized makes it very difficult in the Java backend to try to do any erasure to Java arrays (the current implementation does not try to do so, all Java arrays are always wrapped into an Array). So to try to get Arrays up to the level of other erased types (like the primitive types and String) I asked him if we could get rid of the Some|None duality. We'll try to get erasure to work and then later on we can revisit the design and see if it's now easier to bring Array back into the fold, so to speak.

CeylonMigrationBot commented 12 years ago

[@quintesse] Oh, BTW if somebody thinks he can implement efficient erasure and (un)boxing for Array with the new design intact for Array and is willing to make the effort, be my guest. I tried for weeks but in the end it was just adding too much complexity in too many different places to keep it all straight and I just gave up, but that sure doesn't mean it can't be done.

CeylonMigrationBot commented 12 years ago

[@gavinking]

So to try to get Arrays up to the level of other erased types (like the primitive types and String) I asked him if we could get rid of the Some|None duality. We'll try to get erasure to work and then later on we can revisit the design and see if it's now easier to bring Array back into the fold, so to speak.

But right now String has the Some|None "duality" and is being erased! Why is Array any different?

CeylonMigrationBot commented 12 years ago

[@ikasiuk] Thanks for the explanation. So I'll assume that mutable implementations of ConstantList are in fact allowed.

I guess overall I'm ok with the new proposal.

But right now String has the Some|None "duality" and is being erased! Why is Array any different?

My guess is that this is because of the type parameter, especially because it's an invariant one.

CeylonMigrationBot commented 12 years ago

[@quintesse] Well for one thing because String is just a simple, non-generic, object, you can always treat it the same. With arrays that's not the case, I cannot use the same code that treats int[] to treat T[]. So boxing and unboxing becomes much more complex and with the added duality it becomes even more so and because I'm forced to implement it I have no way to go step by step, upping the complexity bit by bit, testing each step along the way.

CeylonMigrationBot commented 12 years ago

[@quintesse] Look, I'd really love to be able to do away with Array and just let T[] be a Java array. But is that even possible?

CeylonMigrationBot commented 12 years ago

[@gavinking]

I cannot use the same code that treats int[] to treat T[].

It seems like this is the actual problem, and that Some|None is a red herring here.

and with the added duality it becomes even more so

I really don't see how that could possibly be true. The duality is handled inside the implementation of String.instance() / Array.instance() and String.toString() / Array.toArray(). I don't see how the boxing/unboxing code ever needs to know about it at all.

Look, I'd really love to be able to do away with Array and just let T[] be a Java array. But is that even possible?

No, because Array is mutable.

eclipse-archived / ceylon