elm-lang / elm-plans

no longer in use, just want to keep discussions around
BSD 3-Clause "New" or "Revised" License
29 stars 1 forks source link

Proposal: Add Array, Dict, Set Literal Syntax #12

Closed TheSeamau5 closed 9 years ago

TheSeamau5 commented 9 years ago

Inspired by Clojure's collection literals, I propose to add support for Array, Dict, and Set literals (analogous to List syntax).

Proposed syntax

Array

array = 
  #[1, 2, 3]

Rationale: Looks like a list but with a hash sign.

Dict

dict = 
  #{ "key1" : "value1"
   , "key2" : "value2"
   , "key3" : "value3"
   }

Rationale: Looks like Json. In spirit, Dicts and Json serve the same purpose--to store key-value data.

Set

set = 
  #{1, 1, 2, 3, 2}

Rationale: While we're at it, why not? I mean... Clojure does it.

In all cases, the main goal of the syntax is to encourage the usage of these datastructures, which have advantages that linked lists don't. Arrays have fast access, setting, and efficient appending to the end. Dicts are perfect for storing key-value data. Sets guarantee that there are no duplicates (therefore, no need to filter out all duplicates from a list).

Currently, if you want to write a DSL in Elm (like elm-html), lists and records are your only tools. The downside of this is that lists may not be the optimal data structure for your DSL.

For example, here is how elm-html would look with arrays and dicts

div 
  #[ style #{ "position" : "absolute" , "color" : "red" } 
   , id "containerDiv" 
   ]
  #[ ul #[]
      #[ li #[] #[text "Hello"]
       , li #[] #[text "World"]
       ]  
   ] 
texastoland commented 9 years ago

I dislike this. More syntax is bad. Specifically, I almost never use sets on the job. Moreover, since our target user is accustomed to using dictionaries for everything, convenient record syntax pushes them to think typed-ly.

toastal commented 9 years ago

Why not both?

It's just sugar--and a good, terse sugar that other languages are already using. I don't see why you couldn't have still have Set., Dict., Array. inserts/froms for backwards compatibility or whatever your stylistic conviction may be. I feel like I'd be more inclined to use other (read: better) data structures if the API was this way. Maybe you can rope in some Clojure/Script users in as well...

mgold commented 9 years ago

Regarding the dict example: four quotes on a line is pretty ugly. Without endorsing the proposal, I think it should be = not :.

I think the more fundamental concern here is, what problem are we trying to solve? If it's "we have all these great data structures but no one uses them," then literals are only part of the problem. You'll also need

TheSeamau5 commented 9 years ago

@mgold Yes, that is exactly the problem I'm trying to solve. We have amazing data structures and no one uses them. The reason I looked at Clojure as an example is because, as a Lisp, it would sound a-priori unwise to extend the Lisp syntax to include stuff like curly braces and square brackets. But, that bet paid off as vectors (equivalent to Elm Arrays) are preferred to lists as a means to store sequential data.

As regards the actual Array, Dict, Set implementations, I think it would be worth considering either strengthening these implementations by thorough testing, performance tweaks, etc... Or, we could consider switching to something like Immutable.js or Mori. But that's a discussion for another day I fear.

@dnalot About the target user. If I understand correctly, JS users are a big part of Elm's target. JS people are accustomed to having efficient indexing in collections and never really deal with linked lists. (I don't remember ever using a linked list in JS to be honest) As such, we could consider a stronger form of the proposal specifically to target JS people.

The stronger form

Redefine the square bracket syntax to mean array instead of list (and find some other syntax for lists). Yes, this would break with the ML family. But like @evancz pointed out in his talk, the goal is not to make Haskell or Standard ML more usable and more maintainable. The goal is to make Javascript more usable and more maintainable. Javascript has Arrays. JS people know arrays, not linked lists. We can cater to JS people by having square brackets mean Arrays.

array : Array Int
array = 
  [1..10]

We can then figure something out for lists.

Apanatshka commented 9 years ago

I like the premise, we should use more data structures than just lists. I'm not sure this literal syntax is the way to go though. If we expand the literals syntax, I think we should have a general way to do so, not specialise for some stuff that seems nice right now.

evancz commented 9 years ago

I think this is a nice proposal, great work @TheSeamau5! I think it's kind of sad that # is an ugly thing to be putting in so many places, but I think the HTML example is pretty compelling.

My first intuition was the same as @Apanatshka's: "is there something general happening here?" Perhaps not. Maybe that way is a quagmire of things we don't actually care about. Maybe if our core data structures are easy to use, then we are all set.

I also agree with @mgold that = is better. If we ever allow type annotations in expressions, the dict vs set one will become sooooo ambiguous. It's already pretty rough as is though, not sure how plausible it is to parse that efficiently.

The OCaml syntax for arrays is [| 1, 2, 3 |] which I didn't add because it always seemed kind of meh as syntax. I like #[ 1, 2, 3 ] more now that I see them both, but I wonder if we can do a bit better.

evancz commented 9 years ago

So maybe the way forward is to get a small set of example programs that use dicts and sets and lists and arrays and try out variations of this idea til we are satisfied. Perhaps like with the tasks stuff, someone shares a gist, others make variations, we list them all up as an addendum to the initial issue.

evancz commented 9 years ago

What if it is somehow related to modules, kind of like #5 but not:

import Array as A
import Dict as D

div 
  A.[ style D.{ "position" = "absolute" , "color" = "red" } 
    , id "containerDiv" 
    ]
  A.[ ul A.[]
        A.[ li A.[] A.[text "Hello"]
          , li A.[] A.[text "World"]
          ]
    ]

Haha, def looks worse to me! Okay, I'm gonna get back to components stuff now, but again, this seems quite interesting!

TheSeamau5 commented 9 years ago

Here's one example, from elm-todomvc

Current version with linked lists:

view : Address Action -> Model -> Html
view address model =
    div
      [ class "todomvc-wrapper"
      , style [ ("visibility", "hidden") ]
      ]
      [ section
          [ id "todoapp" ]
          [ lazy2 taskEntry address model.field
          , lazy3 taskList address model.visibility model.tasks
          , lazy3 controls address model.visibility model.tasks
          ]
      , infoFooter
      ]

Current proposal: (colon replaced with equals)

view : Address Action -> Model -> Html
view address model =
    div
     #[ class "todomvc-wrapper"
      , style #{ "visibility" = "hidden" } 
      ]
     #[ section
         #[ id "todoapp" ]
         #[ lazy2 taskEntry address model.field
          , lazy3 taskList address model.visibility model.tasks
          , lazy3 controls address model.visibility model.tasks
          ]
      , infoFooter
      ]
rtfeldman commented 9 years ago

Great idea @TheSeamau5!

I think a good way to go for this would be to split it up into separate proposals for "Separate Syntax for Array Literals and List Literals" vs. others. I think the motivation for Array vs. List is pretty strong, because it's so common to want to use literals for those in DSLs.

The use case that comes immediately to mind is elm-html's element functions; I think it's safe to say the primary reason they accept a List of attributes over an Array of them is not that List is the best data structure to hold them (quite the opposite!) but rather because [attr1, attr2, attr3] is the nicest DSL to work with.

Every time I've wanted to construct a Dict or Set so far, it's been by starting with some other variable-length data structure and transforming it, so I'm actually not sure I'd even have used literals for them if they existed! That's not to say my experience generalizes, but rather that I'm not sure there's as strong a case to be made there.

z5h commented 9 years ago

It seems unfortunate to me that we are binding specific implementations of a listy things to syntax. As soon as we want a different 'listy' thing (a skip list, a doubly linked list, etc) we'll run into this problem again.

Might there be a way to create a listy thing and (optionally) assign it's implementation after?

mgold commented 9 years ago

@z5h That's the problem type classes solve, and we're not going there right now. Or, if we decide we need to, this conversation balloons.

@ everyone: Correct me if I'm wrong, but lists are much more performant than arrays. Arrays of less than 32 elements are basically single chunks of memory that must be copied entirely on write, and this is the use case for the vast majority of DSL uses. FP typically uses mapping instead of indexed traversal, because it works so well with immutability.

Or, when you're writing out a literal, you don't really need to index into it. And the HTML library doesn't care what data structure you're using since it just maps over it to get the complete set of attributes, children, or what have you. Sure, you might use indices to construct something dynamically, but then you're no longer writing literals. So I don't agree with Richard (at the moment) that lists are "far from" the best data structure for the HTML library.

Point is, I agree with Evan on code examples. I'd like to see something really ugly with lists, ugly with lists converted to arrays, and beautiful with a made-up array literal syntax.

Apanatshka commented 9 years ago

The module prefix doesn't look so bad to me @evancz. That's the direction I was thinking in too, reminiscent of one of the tasklet macro syntaxes with a record for, in this case, cons and nil or something like that. You could also try a variation of the Haskell quasi-quoting again like we had for markdown and still do for glsl:

view : Address Action -> Model -> Html
view address model =
    div
     [A| class "todomvc-wrapper"
     , style {D| "visibility" = "hidden" |} 
     |]
     [A| section
         [A| id "todoapp" |]
         [A| lazy2 taskEntry address model.field
         , lazy3 taskList address model.visibility model.tasks
         , lazy3 controls address model.visibility model.tasks
         |]
     , infoFooter
     |]
TheSeamau5 commented 9 years ago

@mgold I didn't give the example of elm-html to claim that arrays were superior to lists. What I mean, is that there a number of cases where they are the better structure.

For example: Infinite scrolling lists. You need a collection that can store all that information, send actions uniquely to the concerned element, and slice the collection appropriately to only display what is visible. Arrays beat lists on all these fronts.

While you can implement the described behaviors with Arrays today, you end up paying a needless conversion to lists. Why? It's completely unnecessary.

Again, in the particular context of elm-html, I am not advocating the change from lists and tuples to arrays and dicts. What I am pointing out is what @rtfeldman pointed out. The reason elm-html took the routes of lists and tuples is not due to a calm, thoughtful analysis of performance tradeoffs. It is uniquely because lists and tuples have associated literals. Arrays and Dicts do not. Currently, if you want to make a DSL, using Arrays or Dicts or Sets are out of the question, even if they would be ideal data structures for these DSLs.

TheSeamau5 commented 9 years ago

@Apanatshka Sounds like an interesting idea to find a more general pattern, but also sounds super complex in the short term.

That said, I would be interested in reading a proposal that could explain how these generalized data structures would work and how one would be able to define their own custom data structures.

texastoland commented 9 years ago

@evancz's module syntax looks intriguing to me to :+1: Swift features literal convertible protocols for transforming standard types into custom ones.

evancz commented 9 years ago

Making things implicit is tricky in my experience. This exists in Haskell for strings as OverloadedStrings, and it's always been super confusing when I use it. Apparently they have a mechanism to do this with lists as well (OverloadedLists). It's possible, there's a formula, but I don't think this is a nice place to be in many ways.

I am generally in agreement with @TheSeamau5 that this is going to get out of control if we try to solve every case, when in fact we have a very specific issue: whether it is elm-webgl or elm-html or Graphics.Element, arrays are faster to use and impose no real costs on people in an ideal world where the syntax is cheap.

Meta Note: I think we outlined pretty much all the general approaches, so to move this forward, share further ideas as a gist with a bunch of examples, similar to how we did things here. We can collect all the ideas and add them in an addendum to the initial issue. I'll try to do better too :) I think @TheSeamau5 already made a good case, but for the others, I think we should see them fully layed out.

evancz commented 9 years ago

Are there languages besides Clojure that do this? If so, what syntax do they use?

evancz commented 9 years ago

Sorry to keep cluttering this issue, but I just remembered VLists which @rtfeldman told me about recently. It might be worthwhile to open another issue exploring this, or better yet, for someone to just make it as a community library.

rtfeldman commented 9 years ago

Correct me if I'm wrong, but lists are much more performant than arrays. Arrays of less than 32 elements are basically single chunks of memory that must be copied entirely on write, and this is the use case for the vast majority of DSL uses. FP typically uses mapping instead of indexed traversal, because it works so well with immutability.

Unfortunately the coefficient for linked lists in JS is huge because each Cons cell is a fully-fledged JS Object instead of just a couple of pointers. :crying_cat_face:

mgold commented 9 years ago

Time to benchmark, then.

BartAdv commented 9 years ago

Just wanted to add I'm with @z5h on this one. Binding to one specific implementation seems too limiting. In a healthy ecosystem it might be the case you'd like to use other structures than ones provided by core.

evancz commented 9 years ago

I'm in the process of closing down this repo. This idea is cool. I have it in my personal set of "things to consider" so I will revisit it before 1.0

evancz commented 9 years ago

If you want to preserve this, move it to a gist. I will probably delete this repo in a couple days and salvage certain pieces (like the CONTRIBUTING page) for managing other repos and online venues.

masklinn commented 8 years ago

Are there languages besides Clojure that do this? If so, what syntax do they use?

It's a bit late but since I saw this issue and nobody replied

The "module" option sounds similar to initializer lists (C/C++/C#), maybe desugared at compile-time so dictionaries can be source encoded as pair sequences or as proplists without needing their own custom syntax?

An other similar idea is Python and C#'s string literal modifiers (in Python3 "foo" is a regular string, b"foo" is a bytes object, and r"foo" is a rawstring where most escape sequences are ignored, C# has @"foo" for the latter which it calls verbatimg string literals and $"foo" for interpolated string literals). ES6 added tagged template literals tagfoo`` where a template literal is preprocessed by an arbitrary tag function.

reitzig commented 7 years ago

Just to leave one use case on this zombie: representing JSON-style data.

While representing JSON with strong type systems is generally ... interesting, let's say we manage to define an Elm type for the JSON at hand. If it's not completely trivial, it will probably contain some instances of Dict (e.g. for capturing patternProperties keys, to speak in JSON schema terms). Now if we want to include such data statically in an Elm app (I guess that applies to non-JSON data as well, but that's my use case), what code do we use?

Going via Dict.fromList feels wrong, both w.r.t aesthetics and (potentially) efficiency.