eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0

Revisiting Ceylon collections #4956

Closed CeylonMigrationBot closed 8 years ago

CeylonMigrationBot commented 12 years ago

[@ikasiuk] Some time ago we completely revised Ceylon's collections model. So perhaps it's time to examine if the result meets our expectations. While trying to do so I have come across a few points that bother me. I am writing them down here so you can hopefully tell me that I'm just too anxious and overcritical :-) Note that I'm not saying the current model isn't good (in fact it's pretty neat), but that doesn't necessarily mean we can't do better.

Complexity and trickery

To properly understand the current collection type hierarchy I actually had to draw a class diagram even though I participated in the discussion that led to it. That's not good: the relation between the basic types like Collection, List, T[] and Array should be easy to understand for newcomers. And that's not the case currently. I would not feel comfortable having to explain this part of Ceylon to someone in detail.

One thing that's not nice is that a String is not a Character[]. While that may be excusable, it points to an underlying problem: because FixedSized is of None|Some, it cannot be properly implemented by a single class. That makes the implementation of Array and String a bit complicated. Currently, testing whether a String is Some or None using if (is...) reveals that it is neither [Update: fixed]! Consequently, the line

if (nonempty s = "x") { Some<Character> x = s; }

causes a ClassCastException at runtime [Update: fixed]. Errors like this can probably be fixed but I really don't feel comfortable with the amount of complexity required to integrate Array and String into the collection type hierarchy.

if (nonempty)

I guess a main reason for introducing FixedSized is if (nonempty). But I'm beginning to think that if (nonempty) is not as useful as I expected: I simply don't need it very often and when I do then in most cases it could better be written as something like if (exists f=l.first). In the remaining cases it's nice but not irreplaceable. Don't get me wrong: in theory I like the idea a lot, it's just practically not as relevant to me as anticipated.

Another problem with if (nonempty) is that it only applies to FixedSizeds. So you can use it on T[], Arrays and Strings, but basically nothing else. In particular, it looks like T... will mean Iterable<T> soon. That will obviously be a common way to pass sequences of values to functions, and if (nonempty) cannot be used on them. The same goes for all mutable Lists as well as Sets and other Collections.

If you write code that uses if (nonempty) and thus needs a FixedSized, then you will likely have to convert an input list to a T[] first at some point. On the other hand, if you don't want to require a FixedSized, you'll write code that checks emptiness by other means than if (nonempty). This means a certain dualism, a bit like between arrays and lists in Java. I always disliked that in Java and had hoped we could get rid of it in Ceylon because I think it adds unnecessary complexity. A list of things should always basically look the same, and be handled in the same way. Consistency makes the language easier to learn and increases the readability of the code.

Comparing if (nonempty) to iteration

A good example of a mechanism where this works great is iteration: we can write for (x in l) for almost anything. So if we need something like if (nonempty) then why doesn't it work as universally as iteration? One might answer that this is because the sequence in question must have a fixed size, so that its size cannot change inside the if block. But the same problem exists with iterators: an iterator typically throws a ConcurrentModificationException if the underlying sequence is modified during iteration. So if we wanted to be safe we would have to restrict iterators to immutable, or at least fixed-sized, sequences. But we don't do that because it would be much too impractical.
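To make the iterator comparison concrete, here is a minimal Java sketch of the fail-fast behaviour mentioned above. The class and method names (`IterationDemo`, `failsFast`) are made up for illustration; only `ArrayList` and `ConcurrentModificationException` are real JDK types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class IterationDemo {
    // Returns true if structurally modifying the list inside a for-each loop
    // makes the iterator fail fast.
    static boolean failsFast() {
        List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String item : items) {
                // Structural modification while an iterator is active.
                items.add(item + "!");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // The iterator noticed the change on its next step.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast());
    }
}
```

Nothing in the type system prevents the modification here. The restriction is a runtime convention, which is the point of the comparison with if (nonempty).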

Isn't it similarly impractical for if (nonempty)? So perhaps if (nonempty ne=l) should rather be equivalent to something like if (exists ne=l.nonEmpty), with nonEmpty returning some kind of non-empty view of the sequence. Modification of the sequence would then be disallowed inside the if block, exactly analogous to iteration.
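A rough sketch of what such a nonEmpty view could look like, written in Java rather than Ceylon. Everything here is hypothetical: `NonEmptyView`, `of`, `first` and `rest` are illustrative names, and `Optional` stands in for Ceylon's optional type so that `ifPresent` plays the role of if (exists ...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical read-only view over a list that was non-empty when the view was created.
final class NonEmptyView<T> {
    private final List<T> elements;

    private NonEmptyView(List<T> elements) {
        this.elements = elements;
    }

    // Hypothetical analogue of the proposed nonEmpty attribute:
    // an empty Optional for an empty list, otherwise a view of it.
    static <T> Optional<NonEmptyView<T>> of(List<T> list) {
        if (list.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new NonEmptyView<>(Collections.unmodifiableList(list)));
    }

    // No emptiness check needed here: of() already ruled that case out.
    T first() {
        return elements.get(0);
    }

    // The remaining elements, possibly empty.
    List<T> rest() {
        return elements.subList(1, elements.size());
    }
}

class NonEmptyDemo {
    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));
        // Roughly corresponds to: if (exists ne = numbers.nonEmpty) { ... }
        NonEmptyView.of(numbers).ifPresent(ne -> System.out.println(ne.first()));
    }
}
```

The view wraps the live list without copying it, so it works for mutable lists too. As with an iterator, it is only valid as long as the underlying list is not modified while the view is in use.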

[Migrated from ceylon/ceylon.language#78] [Closed at 2013-03-11 19:54:29]

CeylonMigrationBot commented 12 years ago

[@quintesse] Well for one you can't make one Array.instance() and Array.toArray(), you need one for each primitive type + one for objects. The instance() can be overloaded, but the toArray() would need to be split into several different methods.

No, because Array is mutable.

So when assigning an array to a sequence we always have to make a copy?

CeylonMigrationBot commented 12 years ago

[@ikasiuk]

So when assigning an array to a sequence we always have to make a copy?

No, I think Gavin was referring to the interface, not the mutability of the data.

CeylonMigrationBot commented 12 years ago

[@quintesse] If that's so, then maybe I didn't explain myself correctly. What if we could erase T[] to arrays as well? Not getting rid of Array, which would still be needed as the mutable type. But Integer[] would be erased to int[] and Foo[] simply to Foo[], which would mean that we could directly assign an Array<Integer> to an Integer[], and in Java it would just be a simple assignment as well.
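As a sketch of the difference at the Java level (not the actual compiler output; `ErasureDemo`, `copyOut` and `assign` are made-up names): a boxed sequence has to be copied element by element to become an int[], whereas two types that both erase to int[] can share one reference.

```java
import java.util.Arrays;
import java.util.List;

class ErasureDemo {
    // Boxed representation: producing an int[] needs an element-by-element copy.
    static int[] copyOut(List<Integer> boxed) {
        int[] result = new int[boxed.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = boxed.get(i); // unboxes each Integer
        }
        return result;
    }

    // If both sides were erased to int[], "assignment" is just a reference copy.
    static int[] assign(int[] array) {
        return array;
    }

    public static void main(String[] args) {
        int[] source = {1, 2, 3};
        int[] sequence = assign(source);
        System.out.println(sequence == source); // same object, no copy made
        System.out.println(Arrays.toString(copyOut(Arrays.asList(1, 2, 3))));
    }
}
```

Sharing the reference also means that a later write through the mutable side is visible through the sequence side, which is the mutability concern raised earlier in the thread.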

CeylonMigrationBot commented 12 years ago

[@gavinking]

what if we could erase T[] to arrays as well?

I've always hoped we could do that. I don't see any reason why it should not work.

CeylonMigrationBot commented 12 years ago

[@quintesse] BTW what I said about toArray() is not entirely true. Right now it's pretty much a hack: it returns Object, and when we call it the code looks at the actual Java target type and just adds a cast to that type. All this is not handled in the boxing code itself because that doesn't have the real target types, only the Ceylon types, so it had to be moved to some part of the code that did have that information. Not really pretty.
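Roughly, the generated Java has this shape. This is an illustration only: `ToArrayDemo` and its `toArray(boolean)` parameter are invented for the sketch and are not the real language-module API.

```java
class ToArrayDemo {
    // Hypothetical stand-in for a single toArray(): Java cannot overload on
    // return type, so the method returns Object and hides the array type.
    static Object toArray(boolean primitive) {
        if (primitive) {
            return new int[] {1, 2, 3};
        }
        return new String[] {"a", "b"};
    }

    public static void main(String[] args) {
        // The cast is what the compiler adds once it knows the Java target type.
        int[] ints = (int[]) toArray(true);
        String[] strings = (String[]) toArray(false);
        System.out.println(ints.length + " " + strings.length);
    }
}
```

A wrong cast only fails at runtime with a ClassCastException, which is part of why this feels like a hack.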

CeylonMigrationBot commented 12 years ago

[@ikasiuk] The more I think about @gavinking's proposal in this thread the more I like it. And perhaps it's really better not to try to force Array (which after all exists mainly for interop) into Ceylon's scheme for fixed-size collections. That leaves us more flexibility for both the Array implementation and the design of the collection hierarchy.

But I have one question: shouldn't we disallow non-constant implementations of ConstantList? OK, it might be hard to really enforce that in a watertight way. But conceptually it would make sense to consider all ConstantLists as constant, not just in size, and outlaw non-constant implementations. That could have significant advantages and help promote immutability in Ceylon programs.

CeylonMigrationBot commented 12 years ago

[@FroMage] Assigning all issues without milestone to M4. Yell if this is wrong.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] It is actually not just acceptable if Array is not a FixedSized (and instead directly extends List), it is in fact more correct: we need the Array class to represent native arrays, but JavaScript arrays are not of fixed size! So it would be tricky to properly represent a JavaScript array by a fixed-sized Array because native JS code can easily change its size.

CeylonMigrationBot commented 11 years ago

[@quintesse] @ikasiuk I think this issue should either be closed, woken up or be turned into a new issue discussing specifics. Moving to M5 for now.

CeylonMigrationBot commented 11 years ago

[@tombentley] What's the status of this issue following the changes we've made in M5? Can we close it (opening more specific issues as required)?

CeylonMigrationBot commented 11 years ago

[@quintesse] Still no comments? Shall we consider it closed then?

CeylonMigrationBot commented 11 years ago

[@ikasiuk] Yes, I'm not aware of any open points specifically related to this issue.