Rewrite 'dhall format' to be a Vonderhaar style pretty-printer

neongreen commented 4 years ago

This has already been discussed a bit at https://github.com/dhall-lang/dhall-haskell/issues/244, but I want to bring this up again.

I'd like it if dhall format preserved my intent to lay out an expression one a single line / on several lines.

I am making a small HTML templating system with Dhall – it's supposed to power monadfix.com, which is currently done with Mustache. The syntax is quite a bit more verbose than Mustache, but I can live with it if I get Dhall's benefits in return.

Both this time – and pretty much every time I tried Dhall before that – I kept wishing that dhall format would not transform multiline strings/lists into single-line strings/lists depending on how short/long they are.

Dhall-ised HTML is already putting a strain on me, so I want trees to at least stay trees when I write them as trees. Same with multiline strings – I growl every time I write a multiline string and it turns into something like \n\nLorem ipsum\n\n because it's not long enough.

Gabriella439 commented 4 years ago

So I don't know about lists, but it seems like we could easily have the formatter always prefer multi-line strings if the string has at least one newline in it

neongreen commented 4 years ago

I suspect this would be equally troublesome, just in the other direction. Don't have a good example though.

sjakobi commented 4 years ago

we could easily have the formatter always prefer multi-line strings if the string has at least one newline in it

How about at least one newline and at least one non-newline character? That way "\n" and "\n\n" are still formatted as a single-line strings.

Here is the relevant code BTW:

https://github.com/dhall-lang/dhall-haskell/blob/5ceb8d9d601421643f625670cb202057c274bc1f/dhall/src/Dhall/Syntax.hs#L357-L358

https://github.com/dhall-lang/dhall-haskell/blob/5ceb8d9d601421643f625670cb202057c274bc1f/dhall/src/Dhall/Pretty/Internal.hs#L1072-L1111

neongreen commented 4 years ago

NB:

even then that's not the way that the prettyprinter library works (that dhall-format is based on). prettyprinter just tries to format the expression in single-line form regardless of what the user originally tried to do and if the single-line form exceeds 80 columns then it uses multi-line form.

I would go as far as claim that getting rid of prettyprinter is a good thing.

The current behavior re/ line-wrapping is a bit too much of "spooky action at a distance". Here is a demonstration. Take a snippet that barely fits into 80 chars:

  λ(page : Page/Type)
→   node "html"
  ⫽ attrs (toMap { lang = "en" })
  ⫽ content
      [   node "head"
        ⫽ content
            [ node "meta" ⫽ attrs (toMap { charset = "UTF-8" })
            , Html.nodes
                ( List/map
                    Text
                    Html.Node
                    (   λ(link : Text)
                      → stylesheet
                          "non-breakable line barely fitting into 80 characters"
                    )
                    page.extraStyles
                )
            , Html.nodes page.extraHead
            ]
      , node "body"
      ]

Adding one more character to the non-fitting line causes many other lines to be rewrapped for no good reason:

  λ(page : Page/Type)
→   node
      "html"
  ⫽ attrs (toMap { lang = "en" })
  ⫽ content
      [   node
            "head"
        ⫽ content
            [   node
                  "meta"
              ⫽ attrs (toMap { charset = "UTF-8" })
            , Html.nodes
                ( List/map
                    Text
                    Html.Node
                    (   λ ( link
                          : Text
                          )
                      → stylesheet
                          "non-breakable line barely fitting into 80 characters!"
                    )
                    page.extraStyles
                )
            , Html.nodes page.extraHead
            ]
      , node "body"
      ]

I, for one, would (probably!) not miss line-wrapping if it was gone, and getting rid of it would mean (I think?) that we can get by without prettyprinter.

sjakobi commented 4 years ago

@neongreen The extra line rewrapping is due to the same bug you discovered in #1400. With my patch from https://github.com/quchen/prettyprinter/pull/89, your example is formatted like this:

  λ(page : Page/Type)
→   node "html"
  ⫽ attrs (toMap { lang = "en" })
  ⫽ content
      [   node "head"
        ⫽ content
            [ node "meta" ⫽ attrs (toMap { charset = "UTF-8" })
            , Html.nodes
                ( List/map
                    Text
                    Html.Node
                    (   λ(link : Text)
                      → stylesheet
                          "non-breakable line barely fitting into 80 characters!"
                    )
                    page.extraStyles
                )
            , Html.nodes page.extraHead
            ]
      , node "body"
      ]

neongreen commented 4 years ago

@sjakobi oh! This is good to know.

joneshf commented 4 years ago

I'll throw another vote in for dropping Wadler-style in exchange for Vonderhaar-style.

Vonderhaar-style is where a construct that is written over a single line stays a single line and a construct that is written over multiple lines stays multiple lines. It was popularized by the creator of elm-format, influenced purty, and now ormolu. It's a nice way to format computer languages nowadays.

I suspect this would be equally troublesome, just in the other direction. Don't have a good example though.

I agree 100%. Dealing with dhall format and Text literals is incredibly annoying. Changing the algorithm to prefer one way of reformatting Text literals over the other won't fix the problem.

ocharles commented 4 years ago

I'm a huge fan of the Vonderhaar style, thanks for teaching me the name of it!

Gabriella439 commented 4 years ago

I want to add another reason to possibly get rid of prettyprinter: it's currently becoming a performance bottleneck now that we've made a lot of progress optimizing on other fronts

sjakobi commented 4 years ago

I want to add another reason to possibly get rid of prettyprinter: it's currently becoming a performance bottleneck now that we've made a lot of progress optimizing on other fronts

Oh! I thought I had seen it take quite a bit of time formatting the normalized pkg-set.dhall from cpkg but I thought that that was a rather extreme case and an unusual thing to do. Do you have "real-world" expressions that take too long to format?

Gabriella439 commented 4 years ago

@sjakobi: Here's an example:

https://github.com/Gabriel439/dhall-kubernetes-charts/blob/aed07023b032f301b03e2acc1356c0f2144f574a/stable/jenkins/index.dhall

If you add a trivial type error to that module and try to type-check it with dhall-1.27.0 it spends quite a lot of time just rendering the (truncated) context due to pretty-printing being the bottleneck. This is what motivated the following change:

https://github.com/dhall-lang/dhall-haskell/pull/1482

sjakobi commented 4 years ago

Wow! That's quite bad indeed. Incidentally quchen/prettyprinter#89 would speed this up a bit, maybe 15-20%.

What really impressed me was how long the first type-checking took. I didn't time it, but I'm pretty sure it was more than 10 minutes. I suspect that most of that time was spent on import resolution, possibly downloading one file of dhall-kubernetes after another?! Has there been a discussion of possible speedups there before?

Gabriella439 commented 4 years ago

@sjakobi: It depends on the specific bottleneck but if the root cause is indeed downloading too many files then one solution would be to create a globally available caching forward proxy that can pre-resolve and pre-encode imports (plus language support for importing from a binary content type). Then users could add http_proxy=https://proxy.dhall-lang.org as an environment variable to take advantage of this shared proxy

sjakobi commented 4 years ago

@Gabriel439 I've made a new issue for this: https://github.com/dhall-lang/dhall-haskell/issues/1511.

neongreen commented 4 years ago

Another example, illustrating the problem better. The formatter makes this snippet look much messier than it could be:

            ⫽ content
                [   Html.node "li"
                  ⫽ content
                      [ text "Language questions ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content
                      [ text "Ecosystem questions ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content [ text "Tooling support ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content
                      [ text "Code review ", rightSideQuantity "20 hours" ]
                ,   Html.node "li"
                  ⫽ content
                      [ node "strong" ⫽ content [ text "+ Architecture review" ]
                      ]
                ,   Html.node "li"
                  ⫽ content
                      [ node "strong" ⫽ content [ text "+ Performance review" ]
                      ]
                ,   Html.node "li"
                  ⫽ content
                      [ node "strong" ⫽ content [ text "+ Security review" ] ]
                ]

Original:

            ⫽ content
                [   Html.node "li"
                  ⫽ content [ text "Language questions ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content [ text "Ecosystem questions ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content [ text "Tooling support ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content [ text "Code review ", rightSideQuantity "20 hours" ]
                ,   Html.node "li"
                  ⫽ content [ node "strong" ⫽ content [ text "+ Architecture review" ] ]
                ,   Html.node "li"
                  ⫽ content [ node "strong" ⫽ content [ text "+ Performance review" ] ]
                ,   Html.node "li"
                  ⫽ content [ node "strong" ⫽ content [ text "+ Security review" ] ]
                ]

When working with moderately nested structures, formatting consistency takes a huge hit as it approaches the 80-column boundary.

sjakobi commented 4 years ago

@neongreen That because some lines in the original are wider than 80 characters, right?!

Here's a reproducible version:

[ [ [ [ [ [   x
            ⫽ content
                [   Html.node "li"
                  ⫽ content
                      [ text "Language questions ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content
                      [ text "Ecosystem questions ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content [ text "Tooling support ", rightSideQuantity "∞" ]
                ,   Html.node "li"
                  ⫽ content
                      [ text "Code review ", rightSideQuantity "20 hours" ]
                ,   Html.node "li"
                  ⫽ content
                      [ node "strong" ⫽ content [ text "+ Architecture review" ]
                      ]
                ,   Html.node "li"
                  ⫽ content
                      [ node "strong" ⫽ content [ text "+ Performance review" ]
                      ]
                ,   Html.node "li"
                  ⫽ content
                      [ node "strong" ⫽ content [ text "+ Security review" ] ]
                ]
          ]
        ]
      ]
    ]
  ]
]

We could make the page width configurable if that would be useful for you. So you could use dhall format --page-width 100 to format with a page-width of 100 columns instead of 80.

BTW is that HTML DSL open source? Looks neat! :)

neongreen commented 4 years ago

100 chars will make this example better at the expense of making other things worse (stuff that I want to stay multi-line will become single-line).

My point is that there's no --page-width I could choose that wouldn't lead to inconsistent formatting.

The HTML DSL is here but it needs a renderer: https://gist.github.com/neongreen/9b0845789c62301e0cc978019556cbb2. I want to release it at some point but currently don't have time for it.

neongreen commented 4 years ago

(Node could be a recursive type but I cheated and used JSON instead because type checking was slow.)

sjakobi commented 4 years ago

@neongreen Regarding rendering: In https://github.com/dhall-lang/dhall-lang/issues/200 there's some discussion toward a generic pretty-printing library that could be useful for your renderer. Not sure how usable @matheus23's dhall-blocks is today…

(Node could be a recursive type but I cheated and used JSON instead because type checking was slow.)

I'm very interested in improving dhall's performance – so please do report such issues if you can! :)

sjakobi commented 4 years ago

Regarding the apparent demand for a Vonderhaar style pretty-printer: I agree that it could do a better job at formatting source code than the current pretty-printer. But I suspect that it might be difficult to get it to format generated code, such as the output of dhall type, well. That means we might end up with two pretty-printers, and I worry that that would take too much effort to maintain.

sjakobi commented 4 years ago

I should clarify: Building a Vonderhaar-style formatter for Dhall might still be a worthwhile effort. But if it should replace the existing pretty-printer, it needs to work well for generated code too. So maybe it makes sense to develop it as a separate tool.

neongreen commented 4 years ago

Good objection.

Do you have examples of generated code it has to work with?

Something as simple as a flag to "always use multi-line formatting for records and strings with newlines" might do the trick, but I don't know.

sjakobi commented 4 years ago

Do you have examples of generated code it has to work with?

AFAIK nearly all code in https://github.com/dhall-lang/dhall-kubernetes is generated.

neongreen commented 4 years ago

Regarding the apparent demand for a Vonderhaar style pretty-printer

Changed the issue title to make the demand more explicit 🙂

There are many small problems with dhall format that could be solved without rewriting it, but making it a Vonderhaar style pretty-printer solves a problem that is hard to describe without literally saying "I want a Vonderhaar style pretty-printer". Something along the lines of: I want similar pieces of code to be formatted consistently, except that I can't precisely define 'similar', and also sometimes I want to opt out anyway.

Gabriella439 commented 4 years ago

The concern about generated code also applies to normalized source code, too, since the normal form might not correspond one-to-one with the original source code.

neongreen commented 4 years ago

Looking at dhall-kubernetes, I think even a Vonderhaar style pretty-printer will work nicely there – they already generate well-formatted code, it just needs parentheses to be stripped.

I don't have a good answer for normalized code, though. Counting columns might indeed be the best approach there.

dhall-lang / dhall-haskell

Rewrite 'dhall format' to be a Vonderhaar style pretty-printer #1496