Proposal: Default to Arrays over Lists

rtfeldman commented 9 years ago

A quick note to guide discussion: a clear prerequisite for a change of this magnitude would be benchmarking before and after to make sure it actually makes sense from a performance perspective. There's no way to tell what the results of those benchmarks will be without running them, so let's discuss assuming that this won't happen if the benchmarks say it shouldn't—and instead discuss what the other pros and cons would be assuming the benchmarks check out.

Proposed Changes

1. Square brackets are now literals for Array values, not List values. In other words:

stuff : Array String
stuff = [ "foo", "bar", "baz" ]

This would be a breaking change.

2. The default data structure for libraries (in Core, etc.) becomes Array instead of List.

For example, this would mean the following changes for Dict:

Dict.keys returns an Array
Dict.values returns an Array
Dict.toList becomes Dict.toArray
Dict.fromList becomes Dict.fromArray

This would also be a breaking change.

Motivation

A comment in another thread by @TheSeaMau5 got me thinking about this and discussing some things with @evancz.

There are several reasons to do this:

1. Performance

We have already seen, in WebGL cases, specific examples of Elm programs that run noticeably faster when using Array instead of List on otherwise equivalent code. We have yet to seen the reverse, although granted that is harder to reproduce because literals produce List values and libraries tend to prefer List.

Because Elm's Arrays are implemented using Relaxed Radix Balanced Trees, the performance characteristics associated with (for example) recursively pattern matching should be substantially better here than they are with large vanilla JS arrays. For pattern matching on small lists, the extra overhead of instantiating N extra arrays may be comparable (and perhaps even less) to the extra overhead of instantiating N Cons cell objects, which is required to instantiate even a List that does not use pattern matching.

Benchmarks should be able to shed better light on this.

2. Familiarity to JS Programmers

As @TheSeaMau5 said:

JS users are a big part of Elm's target. JS people are accustomed to having efficient indexing in collections and never really deal with linked lists. (I don't remember ever using a linked list in JS to be honest)

[Redefining] the square bracket syntax to mean array instead of list...would break with the ML family. But like @evancz pointed out in his talk, the goal is not to make Haskell or Standard ML more usable and more maintainable. The goal is to make Javascript more usable and more maintainable. Javascript has Arrays. JS people know arrays, not linked lists. We can cater to JS people by having square brackets mean Arrays.

I have encountered a significant number of capable JS programmers who did not study Computer Science in school, and as such have never even had occasion to use a linked list. Linked lists are not terribly hard to explain, of course, but their use by default does add fuel to the "Elm seems alien" fire.

3. Future Compilation Targets

Although this is not happening any time in the near future, it is worth thinking about how this proposal would affect Elm's long-term compilation target prospects.

One of the compilation targets with the most potential is machine code (or WebAssembly). There's no way to benchmark that yet, since Elm does not currently compile to these targets, but we do know that mandatory JS Object overhead will disappear, and that arrays enjoy significant performance benefits from locality over linked lists. Of course, we would need to run (future) benchmarks to ascertain the real impact.

z5h commented 9 years ago

I disagree with point 2.

JS people are accustomed to having efficient indexing in collections and never really deal with linked lists.

I don't think that's important. In Elm/FP, lists are usually filtered/mapped/folded. I've yet to look up an element by index! Most JS people know map and filter from underscore and the like.

Also, there is no reason for someone coming to Elm to know or care how a list is implemented. Just call it a list. Let them do an inefficient update or lookup by index (depending on the underlying data-structure). They won't notice a difference or care. Later, when they are building massive apps and are better versed in Elm/FP, they can dig into the docs and learn about a linked list or Relaxed Radix Balanced Tree.

Finally, I see a lot of assumptions made about JS (or Ruby) programmers. Let's once and for all state and agree for the record what we are assuming about them. Are they smart? Are they knowledgable? Can they learn? Can they read? Do they have time to learn something new? I think the only proper thing to assume is: They are not knowledgeable about FP. They are smart and can learn as long as they aren't overwhelmed by too many alien concepts at once.

I'm not against the other points. I want to make sure we are giving JS devs adequate credit. It will hurt everyone if we don't.

mgold commented 9 years ago

Although I've seen my opinion sway with each comment, I think @z5h articulates something I was trying to get at earlier:

In Elm/FP, lists are usually filtered/mapped/folded. I've yet to look up an element by index!

Indexing is a relic of C for-loops because that was (is) how assembly works. Most of the JS I write uses D3 and therefore has no explicit loops. I actually had to write one the other day, and had to remember how to do it!

I agree that many JS programmers (and Ruby programmers) are familiar with the basic higher order functions on lists/arrays. I also think most of them know what a linked list is - despite the rise of programmer bootcamps, most people to have a CS degree.

rtfeldman commented 9 years ago

I see a lot of assumptions made about JS (or Ruby) programmers. Let's once and for all state and agree for the record what we are assuming about them...I think the only proper thing to assume is: They are not knowledgeable about FP.

I generally assume:

JS programmers are working in industry, on a team of people with similar programming backgrounds.
JS programmers are familiar with JS, but not with FP in general or Elm in particular.
JS programmers are both open to the idea that learning Elm could be a good use of their time, and are capable of doing so. (This doesn't generalize 100%, but it would be definitionally pointless to cater to those who don't fit this description.)
The less familiar Elm is to JS programmers, the less likely they are to expend the time and energy required to try, then learn, then use it.
JS programmers who decide they would personally like to adopt Elm are aware that they will have to persuade their teammates that it is a good idea, and this is a significant factor in whether they will ultimately adopt it.

rtfeldman commented 9 years ago

In Elm/FP, lists are usually filtered/mapped/folded. I've yet to look up an element by index!

I agree that this is a minor point, but it's relevant to note that when you're translating code from JS to Elm, it's more likely that the original code used indexed lookups because there was less of a downside then.

For example, I write JS in a functional style, but when I had some code that needed to transform a list by considering adjacent elements (e.g. "if the next or previous elements fit this description, we get a different result...") I used indexed lookups because that was cheap and had no downside. When I ported that code to Elm it ended up making more sense to implement last and getAt for List than to scratch-reimplement all that code instead of just translating it.

Again, agree that this is not a huge deal, but the status quo still seems more likely to cause minor headaches than the proposal would.

rtfeldman commented 9 years ago

there is no reason for someone coming to Elm to know or care how a list is implemented. Just call it a list. Let them do an inefficient update or lookup by index (depending on the underlying data-structure). They won't notice a difference or care. Later, when they are building massive apps and are better versed in Elm/FP, they can dig into the docs and learn about a linked list or Relaxed Radix Balanced Tree.

Assuming this premise, why call it a list? Why not just call it an Array, especially if that's the more familiar term?

z5h commented 9 years ago

@rtfeldman Thanks for the point form assumptions. Good points to consider.

When I ported that code to Elm it ended up making more sense to implement last and getAt for List than to scratch-reimplement all that code instead of just translating it.

Right, so we can choose the List implementation that makes most sense to seasoned Elm/FP users. (Let's assume Linked list for argument's sake.) Next, add all the functions an array user would expect (even if they are expensive for that list implementation). Those writing small beginner apps won't particularly care that last/length are O(n). Those writing larger apps or those concerned about performance are the same people who are likely to dig into docs.

If we decide Array implementation in better for seasoned Elmers, the similar arguments will hold. Seasoned Elmers will know to import/use LinkedList if they are doing lots of consing.

In either case, I think the right approach is choose what is best for the seasoned Elmers and make sure new users don't have to think about it until they choose to.

As for naming, I think Array implies a more specific implementation and performance characteristics whereas List does not. And everyone (even non programmers) know what a list is.

TheSeamau5 commented 9 years ago

Those writing small beginner apps won't particularly care that last/length are O(n).

I think this is an important point to consider in this decision. Elm is also meant to be beginner-friendly and one must consider the learning path in Elm. What are the first concepts that one wishes to stress in learning? What is the first data structure you wish a beginner to learn? Given the convenience of the square bracket syntax, the data structure that will inherit this syntax will necessarily be the first one taught to kids and beginners.

Let us consider a specific case from the Elm examples, and an algorithm that is always taught in the first data structures class: quicksort

Here is the code for quicksort on lists from the Elm examples

quicksort : List comparable -> List comparable
quicksort list =
  case list of
    [] ->
        []

    pivot :: rest ->
        let
          lower  = filter (\n -> n <= pivot) rest
          higher = filter (\n -> n >  pivot) rest
        in
          quicksort lower ++ [pivot] ++ quicksort higher

And now, here is the code for quicksort on arrays with the square bracket syntax (and assuming that ++ would be extended to work on arrays)

quicksort : Array comparable -> Array comparable 
quicksort array = 
  case split array of 
    Nothing ->
       [] 

    Just (pivot, rest) -> 
        let 
          lower  = filter (\n -> n <= pivot) rest     
          higher = filter (\n -> n >  pivot) rest        
        in 
          quicksort lower ++ [pivot] ++ quicksort higher

where split is defined as follows

split : Array a -> Maybe (a, Array a)
split array = 
  case get 0 array of 
    Nothing -> 
      Nothing

    Just x -> 
      Just (x, slice 1 (length array) array)

The question, which I'd like to put to an actual CS teacher, is : which version is easier to teach to someone completely new to programming? Is there a significant pedagogical difference in having the pattern matching on the data structure itself versus on the result of invoking a function? How would teaching one over another influence the sort of material that is taught and in the patterns that are stressed? For example, if arrays are taught first, consing may be given almost no importance at first.

I have absolutely zero experience in teaching, so please take anything in this respect with a truckload of salt.

evancz commented 9 years ago

Meta Comment: I feel like we have been doing a lot of talking and theorizing on elm-plans. That's fun and easy, but I worry that we can collectively be more directed in how we use this.

In this case, I feel like having the discussion is kind of crazy if we don't know what the benchmarks say. The goal is "make rendering stuff faster". Well, we should know how much faster before we do any more work. Maybe it's not a big deal, maybe it is huge. Maybe it can be done with a different API on virtual-dom? Maybe this is not actually important after all? Measure it. Try things out. Once we know this is actually worthwhile, we can look into if it makes sense as a language feature in the first place. It is actually possible to weigh "how does this go for learners" against "100x speed up" or "2x speed up". I don't see how we can make any solid evaluations without the other side of this.

Also, I don't want this to become a magic wishlist where "Evan will someday do what I desire." This stuff takes work. Defining new work for me maybe is a nice thing, but if we want this stuff to happen faster, we must systematically find ways to break these things down into pieces that can be worked on by many people. In this case, it's pretty clear. Run benchmarks, build examples. If it's a compelling case, I should hear about the result.

So can I ask that we focus more on examples, packages, prototyping, benchmarking, testing, etc. in these threads. That stuff is all raw data that we can build on and none of it blocks on anyone else.

(Hopefully I don't sound like a big jerk here. I just think we can be more efficient in achieving our goals.)

evancz commented 9 years ago

I recently heard this phrase: "We can talk about making a difference, we can make a difference, or we can do both." I feel like elm-plans needs more balance.

Maybe we can reformulate this for Elm plans so we can require people opening issues to do more background work. Or have a clear process to independently get that background work done. Point is, maybe lets put a hold on all these discussions until we have clear realistic goals about what elm-plans and for and a process that people can follow to reach those goals.

TheSeamau5 commented 9 years ago

@evancz How about we define the process to make enhancements to Elm?

We can copy the way the Dart folks do things, for example. They seem to have an open and rigourous process in place.

See: https://github.com/dart-lang/dart_enhancement_proposals

I think it's really important to the define how a proposal goes from being an idea in someone's head to actually shipping. What work is required from the person making the proposal? What work is required from Evan (and/or anyone working on implementing language features)? What should happen to proposals that don't meet the standards set? What kind of testing/benchmarking should be required? etc...

texastoland commented 9 years ago

My friend at Google remarked if members had to implement a working draft with proposals then there'd be fewer proposals and more contributors able to fix bugs. Sounded snarky but scalable.

TheSeamau5 commented 9 years ago

@dnalot I don't think it's snarky. I think it's honest. I guess it'd depend on how @evancz sees things.

A language like Dart, for example, is made mostly with seasoned programmers in mind. Their target audience are large companies with 100K+ LOC and/or game companies who want to ship a big game using web technologies. A language like that has to require super high filters for enhancements or else they would go nowhere.

On the other hand, I imagine Elm being more flexible that this as it appeals also to beginners. I think it would make sense for Elm to seek the advice of educators and feedback from beginners to see where are the pain points (be they in the syntax, the libraries, etc...) in learning Elm.

Also, from what I can tell, Elm doesn't have Google's resources. As such, most people helping out with Elm (like me), do so from our own free time. This free time varies wildly from week to week and from person to person. On the other hand, the Dart folks have a dedicated team, paid to work on features of the Dart language, extending it as necessary, fixing bugs, writing new virtual machines, etc... all with the full backing of Google. So, I think it's fair for the Google folks to expect working drafts for each and every proposal.

But all this is just my opinion. Anyways, let's move this discussion to another thread. After all, this thread is about a specific proposal.

evancz commented 9 years ago

We can revisit this if someone does benchmarking.

elm-lang / elm-plans