avh4 / elm-format

elm-format formats Elm source code according to a standard set of rules based on the official Elm Style Guide
BSD 3-Clause "New" or "Revised" License
1.31k stars 145 forks source link

"Chaining" style #568

Open rtfeldman opened 5 years ago

rtfeldman commented 5 years ago

Motivation

There are a few longstanding issues with elm-json-decode-pipeline. I wrote them up here.

(Briefly: reordering fields in type alias can break its decoders' implementations, its unusual types are detrimental to learning and to error message quality, and customizing its decoders is challenging.)

Solution

I wrote a Discourse post looking for feedback on an experimental API that addresses these problems.

@albertdahlin responded that his team independently came up with essentially the same API and have been using it at work for months, with positive results:

In my company we switched to using this style some months ago and we like this style better. It produces code that is easier to understand, and it also avoid bugs when reordering/adding fields in a record.

Their code for it is here.

Proposal

Add experimental (as in, elm-format@exp) support for this formatting style:

type alias User =
    { id : Int
    , email : String
    , name : String
    }

decoder : Decoder User
decoder =
    require "id" int <| \id ->
    require "email" string <| \email ->
    default "name" string "Guest" <| \name ->
    succeed { id = id, email = email, name = name }

Specifically, the proposed formatting rule change would be:

If a line ends in <| followed by an anonymous function, do not insert a mandatory newline between them, and do not indent the next line.

This would not change the formatting of any existing code. It would be a new formatting choice that has not been supported before, and it would specifically be added to support this "chaining" style of DSL. (One way to think of it is as “do notation without the syntax sugar.“)

rtfeldman commented 5 years ago

Implications

Currently the example in the issue is formatted like so:

decoder : Decoder User
decoder =
    require "id" int <|
        \id ->
            require "email" string <|
                \email ->
                    default "name" string "Guest" <|
                        \name ->
                            succeed { id = id, email = email, name = name }

This style is in widespread use in elm-test, which suggests it should still be supported. This proposal would mean people writing elm-test tests have an additional formatting choice to consider.

To illustrate this choice, here's an example from FloatWithinTests.elm in elm-test's test suite:

floatWithinTests : Test
floatWithinTests =
    describe "Expect.within"
        [ describe "use-cases"
            [ fuzz float "pythagorean identity" <|
                \x ->
                    sin x ^ 2 + cos x ^ 2 |> Expect.within (AbsoluteOrRelative 0.000001 0.00001) 1.0
            , test "floats known to not add exactly" <|
                \_ -> 0.1 + 0.2 |> Expect.within (Absolute 0.000000001) 0.3
            ]
        ]  

I chose this because it's a real-world test suite that uses a newline (and indentation) after the anonymous function (namely \x ->) as well as a newline after the <| but not after the anonymous function (namely \_ ->).

This proposed change would introduce a third option: no newline after the <| (currently a newline there is mandatory if you want the test body on a different line), but still having a newline after the anonymous function.

It would mean the above could be optionally written like this instead:

floatWithinTests : Test
floatWithinTests =
    describe "Expect.within"
        [ describe "use-cases"
            [ fuzz float "pythagorean identity" <| \x ->
              sin x ^ 2 + cos x ^ 2 |> Expect.within (AbsoluteOrRelative 0.000001 0.00001) 1.0
            , test "floats known to not add exactly" <| \_ ->
              0.1 + 0.2 |> Expect.within (Absolute 0.000000001) 0.3
            ]
        ]

Permitting this does not seem like a cause for concern to me.

Support

As noted in the issue, @albertdahlin's team has had a positive experience using this decoding API (without elm-format) for months.

@danabrams commented:

I like it a lot for deciding large, sometimes complex Json.

@brasilikum commented:

As feedback from beginners was explicitly requested: I understood the example code right away and had an “aaaahh” moment because:

This style of writing decoders resembles “normal” elm code much more than the current [that is, Decode.Pipeline] syntax. It was intuitively obvious that I now had access to all previously decoded values, just like I do while pattern matching.

I have written a small elm app for internal tooling so far but did not understand the current syntax enough (even after three tries) to be able to decode a type where one value depends on two others.

@ianmjones found it confusing at first:

For someone relatively inexperienced with Elm like me, the nested anonymous functions syntax of the proposal seems fairly confusing. Then again it may be down to the back pipe usage, always gives me headaches!

I added a TEACHING.md file which explained how it worked, and this turned it around:

That new TEACHING doc was very helpful, it nicely explains each of the parts of the pattern as it progresses to build the final syntax. Thank you!

(I'm considering that a "support" because historically I've had little success teaching what Decode.Pipeline is actually doing from a type perspective.)

Pushback

@runarfu said:

It seems tedious having to repeat the field names (or name the intermediate values something else).

@SergKam echoed this concern:

I don’t really like it. It can be useful for some cases but [...] for me, Elm is already too verbose I would try to keep it cleaner.

@AlexKorban expressed a strong preference for Decode.Pipeline:

In the experimental API, the name of each field appears 4 times (or 3 times if using the record constructor) instead of once with the existing API. Particularly in the case of JSON objects with many fields, this will be very noisy. [...] As @joelq pointed out, fields can still be swapped if people use a constructor with succeed (and it’s likely that they will in order to avoid typing field names yet again).

Other Feedback

@sporto expressed a preference for the indented style when decoding a small number of fields, but considered the non-indented version beneficial for larger ones:

I like this for the safety this provides. We had something very similar before Elm 0.19 using a custom operator [...] This was great for making our code resilient to the issue of having attributes with the same type.

I find the indented version much easier to read in the examples. [...] At first the non indented version is a lot more difficult to parse, there is a lot happening in a packed space. Then after a few looks is not too bad, the benefit of not having runaway indentation is huge. Still not a great API in Elm unfortunately.

@joelq commented that users are likely to continue to use record constructor functions in this API regardless:

Using a lambda to construct a literal record fixes the issue regardless of whether you’re using core decoders, pipeline decoders, or this experimental approach. In all three cases, users are likely to use the auto-generated constructors for large records.

I responded with a clarification.

@chadtech compared the error messages of both styles and concluded that they were not much different in quality. I came to the opposite conclusion and explained why.

@rgrempel pointed out that "whether this is an idiom that ought to be generally encouraged is a significant question." I think that's worth discussing in this thread.

rgrempel commented 5 years ago

If I had to give a quick description of this idiom, it would be "gradually bringing identifiers into scope without monotonous indentation".

As a general matter, I think this is most essential when dealing with multiple andThen computations. It is the andThen which requires that we introduce a new scope -- which would normally mean a new level of indentation. However, it's perfectly normal to do a bunch of andThens. That is, in effect, what @rtfeldman's example is doing behind the scenes, since require and default are defined using andThen.

So, with a bunch of andThens, you'd normally end up with a cascading indentation. Now, normally this is fine -- the indentation has a purpose, since it visually shows you what is in scope and what is not at any point. However, when the indentation is increasing monotonously, it's both ugly and causes difficulty with line length.

Now, Elm 0.17 flipped the argument order of andThen in order to encourage us to put andThen in pipelines, rather than the idiom used here. (Which, of course, is exactly what elm-json-decode-pipeline does now, as implied by its name). The difficulty is that each step in the "standard" pipeline gets its own scope. That is, you're not gradually introducing new variables one at a time into a shared scope. Instead, you get a separate scope for each line.

So, for instance, suppose the API were such that you could write this (in post-Elm-0.17 style)

require "id" int
    |> andThen (\id -> require "email" string)
    |> andThen (\email -> default "name" string "Guest")
    |> andThen (\name -> succeed { id = id, email = email, name = name })

... then the difficulty is that id and email aren't actually in scope once you get to the end.

Now, elm-json-decode-pipeline gets around this by passing a function into the beginning of the pipeline, but that leads to the difficulties that @rtfeldman mentions.

To keep everything in scope at the end, you'd have to re-bracket, something like this ...

require "id" int
    |> andThen
        (\id ->
            require "email" string
                |> andThen
                    (\email ->
                        default "name" string "Guest"
                            |> andThen (\name -> succeed { id = id, email = email, name = name })
                    )
        )

... which is clearly not nice, at least as it would be formatted currently.

In other languages, this general problem of "gradually bringing variables into scope without excessive indentation" is handled via special syntax -- for instance, "do notation". For instance, decoder might look something like this in Haskell or Purescript:

decoder : Decoder User
decoder =
    do
        id <- require "id" int
        email <- require "email" string
        name <- default "name" string "Guest"
        succeed { id = id, email = email, name = name }

If you compare this to the proposal ...

decoder =
    require "id" int <| \id ->
    require "email" string <| \email ->
    default "name" string "Guest" <| \name ->
    succeed { id = id, email = email, name = name }

... what I find fascinating is how much of the benefit of "do notation" you can get from a simple formatting change.

However, there are a few questions that I think might be worth exploring (when thinking about this as a general idiom).

Usage with explicit andThen

The example relies on require and default using andThen internally. It would probably be worth looking at examples where the andThen is explicit (possibly flipped). I tried using with as a flipped andThen at https://github.com/rtfeldman/elm-json-experiment/pull/1, so you can see what kind of difference it makes ... it looks like this:

userDecoder : Decoder User
userDecoder =
    with (field int "id") <| \id ->
    with (field int "followers") <| \followers ->
    with (field string "email") <| \email ->
    succeed { id = id, followers = followers, email = email }

The difficulty here is that you need the brackets -- the example @rtfeldman gave avoids this by building the andThen into require and default themselves. Yet, I'm not sure it makes sense to think about this idiom only in the context of APIs that do that -- having a separate andThen function is sometimes essential, of course.

Using a forward-pipeline

Is it possible to get a sufficiently nice look for this idiom using a forward pipeline (i.e. |> rather than <|)? If you take my strangely-formatted example from above, you could imagine formatting it a little more nicely:

require "id" int |> andThen (\id ->
require "email" string |> andThen (\email ->
default "name" string "Guest" |> andThen (\name ->
succeed { id = id, email = email, name = name })))

... which is all right, I suppose, but probably not as nice as the <| version. There is something about the precedence or associativity of <| which makes it easier to avoid the brackets. Also, I think the formatting rule you can specify for <| is probably simpler than the formatting rule you would need for |>.

More complex stages

What happens to the proposed formatting if some of the stages are more complex? It would probably be worth sketching some of that out. Suppose, for instance, we need a let ... how ought this be formatted?

decoder : Decoder User
decoder =
    require "id" int <| \id ->
    require "email" string <| \email ->
        let
            defaultName =
                computeDefaultName id email
        in
        default "name" string defaultName <| \name ->
        succeed { id = id, email = email, name = name }

I think what I've done there is roughly what this proposal would do. In "do notation", there is actually a bit of sugar for this as well, something like:

decoder : Decoder User
decoder =
    do
        id <- require "id" int
        email <- require "email" string
        let defaultName = computeDefaultName id email
        name <- default "name" string defaultName
        succeed { id = id, email = email, name = name }

However, I don't think there's any way to "simulate" that with mere formatting -- at least, I haven't come up with anything.

Anyway, I like the proposal (for whatever that's worth) ... just wanted to suggest some additional context for it.

rgrempel commented 5 years ago

Actually, perhaps I have the "more complex stage" example wrong -- perhaps the proposal would result in this instead:

decoder : Decoder User
decoder =
    require "id" int <| \id ->
    require "email" string <| \email ->
    let
        defaultName =
            computeDefaultName id email
    in
    default "name" string defaultName <| \name ->
    succeed { id = id, email = email, name = name }

That is, we wouldn't indent let in this case, which is nice, I think.

JoelQ commented 5 years ago

As an extra data point, someone was asking about chaining a bunch of HTTP requests and accumulating all the results at the end. The style under discussion here was one possible solution mentioned, as an alternative to a cumbersome nesting of andThen, lambda's, and parens.

albertdahlin commented 5 years ago

Our package webbhuset/elm-json-decode using the continuation style is now published.

Chadtech commented 5 years ago

Hello! This was a really interesting idea. Im warming up to it more and more. At our most recent Elm Munich meet up, I briefly showed this issue, as it became related to our discussion. It generated two comments that I think could be summed up as "I dont like this format, its not readable, I would never think to look for the parameter there" and then one reply "well you are just saying that because you arent used to it".

rlefevre commented 4 years ago

A variant is actually used in elm/core Task.map* functions.

For example:

map5 : (a -> b -> c -> d -> e -> result) -> Task x a -> Task x b -> Task x c -> Task x d -> Task x e -> Task x result
map5 func taskA taskB taskC taskD taskE =
  taskA
    |> andThen (\a -> taskB
    |> andThen (\b -> taskC
    |> andThen (\c -> taskD
    |> andThen (\d -> taskE
    |> andThen (\e -> succeed (func a b c d e))))))

which is currently formatted by elm-format as:

map5 : (a -> b -> c -> d -> e -> result) -> Task x a -> Task x b -> Task x c -> Task x d -> Task x e -> Task x result      
map5 func taskA taskB taskC taskD taskE =      
    taskA      
        |> andThen      
            (\a ->      
                taskB      
                    |> andThen      
                        (\b ->      
                            taskC      
                                |> andThen      
                                    (\c ->      
                                        taskD      
                                            |> andThen      
                                                (\d ->      
                                                    taskE      
                                                        |> andThen (\e -> succeed (func a b c d e))      
                                                )      
                                    )      
                        )      
            )        
Janiczek commented 3 years ago

@avh4 (Repost from Slack; here the Slack history-eating monster has no power 😄 )

I'd like to ask about whether there are plans for implementing some variant of this PR to allow nice andThen chaining / a reasonable do syntax alternative. My usecase is compilers:

Canonical.If { test, then_, else_ } ->
    let
        ( test_, id1 ) =
            f currentId test

        ( then__, id2 ) =
            f id1 then_

        ( else__, id3 ) =
            f id2 else_
    in
    assignId id3
        (Typed.If
            { test = test_
            , then_ = then__
            , else_ = else__
            }
        )

could, if I hid the ID threading into a monad, become something like

Canonical.If { test, then_, else_ } ->
    f test <| \test_ ->
    f then_ <| \then__ ->
    f else_ <| \else__ ->
    assignId 
        (Typed.If
            { test = test_
            , then_ = then__
            , else_ = else__
            }
        )

but, as you can guess by people asking you about this time and time again, the elm-format-ed version looks so bad it effectively detracts me from ever using that monadic approach and makes me try different non-monadic solutions:

Canonical.If { test, then_, else_ } ->
    f test <|
        \test_ ->
            f then_ <|
                \then__ ->
                    f else_ <|
                        \else__ ->
                            assignId
                                (Typed.If
                                    { test = test_
                                    , then_ = then__
                                    , else_ = else__
                                    }
                                )

Which IMHO is a pity: I could use the "abstract state/writer/... away" benefit of monads and Elm has the feature set to make it possible. But because I'm using elm-format, I shy away from that solution-space altogether because it doesn't format nicely.

EDIT: Here is an Ellie with what I think illustrates the benefits of the proposed elm-format change: https://ellie-app.com/f83Y99bgJgDa1 (mainly if you compare snippets 3 and 4 with the snippet 5)