Open neongreen opened 4 years ago
So I don't know about lists, but it seems like we could easily have the formatter always prefer multi-line strings if the string has at least one newline in it
I suspect this would be equally troublesome, just in the other direction. Don't have a good example though.
we could easily have the formatter always prefer multi-line strings if the string has at least one newline in it
How about at least one newline and at least one non-newline character? That way "\n"
and "\n\n"
are still formatted as a single-line strings.
Here is the relevant code BTW:
NB:
even then that's not the way that the prettyprinter library works (that dhall-format is based on). prettyprinter just tries to format the expression in single-line form regardless of what the user originally tried to do and if the single-line form exceeds 80 columns then it uses multi-line form.
I would go as far as claim that getting rid of prettyprinter
is a good thing.
The current behavior re/ line-wrapping is a bit too much of "spooky action at a distance". Here is a demonstration. Take a snippet that barely fits into 80 chars:
λ(page : Page/Type)
→ node "html"
⫽ attrs (toMap { lang = "en" })
⫽ content
[ node "head"
⫽ content
[ node "meta" ⫽ attrs (toMap { charset = "UTF-8" })
, Html.nodes
( List/map
Text
Html.Node
( λ(link : Text)
→ stylesheet
"non-breakable line barely fitting into 80 characters"
)
page.extraStyles
)
, Html.nodes page.extraHead
]
, node "body"
]
Adding one more character to the non-fitting line causes many other lines to be rewrapped for no good reason:
λ(page : Page/Type)
→ node
"html"
⫽ attrs (toMap { lang = "en" })
⫽ content
[ node
"head"
⫽ content
[ node
"meta"
⫽ attrs (toMap { charset = "UTF-8" })
, Html.nodes
( List/map
Text
Html.Node
( λ ( link
: Text
)
→ stylesheet
"non-breakable line barely fitting into 80 characters!"
)
page.extraStyles
)
, Html.nodes page.extraHead
]
, node "body"
]
I, for one, would (probably!) not miss line-wrapping if it was gone, and getting rid of it would mean (I think?) that we can get by without prettyprinter
.
@neongreen The extra line rewrapping is due to the same bug you discovered in #1400. With my patch from https://github.com/quchen/prettyprinter/pull/89, your example is formatted like this:
λ(page : Page/Type)
→ node "html"
⫽ attrs (toMap { lang = "en" })
⫽ content
[ node "head"
⫽ content
[ node "meta" ⫽ attrs (toMap { charset = "UTF-8" })
, Html.nodes
( List/map
Text
Html.Node
( λ(link : Text)
→ stylesheet
"non-breakable line barely fitting into 80 characters!"
)
page.extraStyles
)
, Html.nodes page.extraHead
]
, node "body"
]
@sjakobi oh! This is good to know.
I'll throw another vote in for dropping Wadler-style in exchange for Vonderhaar-style.
Vonderhaar-style is where a construct that is written over a single line stays a single line and a construct that is written over multiple lines stays multiple lines. It was popularized by the creator of elm-format
, influenced purty
, and now ormolu
. It's a nice way to format computer languages nowadays.
I suspect this would be equally troublesome, just in the other direction. Don't have a good example though.
I agree 100%. Dealing with dhall format
and Text
literals is incredibly annoying. Changing the algorithm to prefer one way of reformatting Text
literals over the other won't fix the problem.
I'm a huge fan of the Vonderhaar style, thanks for teaching me the name of it!
I want to add another reason to possibly get rid of prettyprinter
: it's currently becoming a performance bottleneck now that we've made a lot of progress optimizing on other fronts
I want to add another reason to possibly get rid of
prettyprinter
: it's currently becoming a performance bottleneck now that we've made a lot of progress optimizing on other fronts
Oh! I thought I had seen it take quite a bit of time formatting the normalized pkg-set.dhall
from cpkg
but I thought that that was a rather extreme case and an unusual thing to do. Do you have "real-world" expressions that take too long to format?
@sjakobi: Here's an example:
If you add a trivial type error to that module and try to type-check it with dhall-1.27.0
it spends quite a lot of time just rendering the (truncated) context due to pretty-printing being the bottleneck. This is what motivated the following change:
Wow! That's quite bad indeed. Incidentally quchen/prettyprinter#89 would speed this up a bit, maybe 15-20%.
What really impressed me was how long the first type-checking took. I didn't time it, but I'm pretty sure it was more than 10 minutes. I suspect that most of that time was spent on import resolution, possibly downloading one file of dhall-kubernetes
after another?! Has there been a discussion of possible speedups there before?
@sjakobi: It depends on the specific bottleneck but if the root cause is indeed downloading too many files then one solution would be to create a globally available caching forward proxy that can pre-resolve and pre-encode imports (plus language support for importing from a binary content type). Then users could add http_proxy=https://proxy.dhall-lang.org
as an environment variable to take advantage of this shared proxy
@Gabriel439 I've made a new issue for this: https://github.com/dhall-lang/dhall-haskell/issues/1511.
Another example, illustrating the problem better. The formatter makes this snippet look much messier than it could be:
⫽ content
[ Html.node "li"
⫽ content
[ text "Language questions ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content
[ text "Ecosystem questions ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content [ text "Tooling support ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content
[ text "Code review ", rightSideQuantity "20 hours" ]
, Html.node "li"
⫽ content
[ node "strong" ⫽ content [ text "+ Architecture review" ]
]
, Html.node "li"
⫽ content
[ node "strong" ⫽ content [ text "+ Performance review" ]
]
, Html.node "li"
⫽ content
[ node "strong" ⫽ content [ text "+ Security review" ] ]
]
Original:
⫽ content
[ Html.node "li"
⫽ content [ text "Language questions ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content [ text "Ecosystem questions ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content [ text "Tooling support ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content [ text "Code review ", rightSideQuantity "20 hours" ]
, Html.node "li"
⫽ content [ node "strong" ⫽ content [ text "+ Architecture review" ] ]
, Html.node "li"
⫽ content [ node "strong" ⫽ content [ text "+ Performance review" ] ]
, Html.node "li"
⫽ content [ node "strong" ⫽ content [ text "+ Security review" ] ]
]
When working with moderately nested structures, formatting consistency takes a huge hit as it approaches the 80-column boundary.
@neongreen That because some lines in the original are wider than 80 characters, right?!
Here's a reproducible version:
[ [ [ [ [ [ x
⫽ content
[ Html.node "li"
⫽ content
[ text "Language questions ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content
[ text "Ecosystem questions ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content [ text "Tooling support ", rightSideQuantity "∞" ]
, Html.node "li"
⫽ content
[ text "Code review ", rightSideQuantity "20 hours" ]
, Html.node "li"
⫽ content
[ node "strong" ⫽ content [ text "+ Architecture review" ]
]
, Html.node "li"
⫽ content
[ node "strong" ⫽ content [ text "+ Performance review" ]
]
, Html.node "li"
⫽ content
[ node "strong" ⫽ content [ text "+ Security review" ] ]
]
]
]
]
]
]
]
We could make the page width configurable if that would be useful for you. So you could use dhall format --page-width 100
to format with a page-width of 100 columns instead of 80.
BTW is that HTML DSL open source? Looks neat! :)
100 chars will make this example better at the expense of making other things worse (stuff that I want to stay multi-line will become single-line).
My point is that there's no --page-width
I could choose that wouldn't lead to inconsistent formatting.
The HTML DSL is here but it needs a renderer: https://gist.github.com/neongreen/9b0845789c62301e0cc978019556cbb2. I want to release it at some point but currently don't have time for it.
(Node
could be a recursive type but I cheated and used JSON instead because type checking was slow.)
@neongreen Regarding rendering: In https://github.com/dhall-lang/dhall-lang/issues/200 there's some discussion toward a generic pretty-printing library that could be useful for your renderer. Not sure how usable @matheus23's dhall-blocks
is today…
(
Node
could be a recursive type but I cheated and used JSON instead because type checking was slow.)
I'm very interested in improving dhall
's performance – so please do report such issues if you can! :)
Regarding the apparent demand for a Vonderhaar style pretty-printer: I agree that it could do a better job at formatting source code than the current pretty-printer. But I suspect that it might be difficult to get it to format generated code, such as the output of dhall type
, well. That means we might end up with two pretty-printers, and I worry that that would take too much effort to maintain.
I should clarify: Building a Vonderhaar-style formatter for Dhall might still be a worthwhile effort. But if it should replace the existing pretty-printer, it needs to work well for generated code too. So maybe it makes sense to develop it as a separate tool.
Good objection.
Do you have examples of generated code it has to work with?
Something as simple as a flag to "always use multi-line formatting for records and strings with newlines" might do the trick, but I don't know.
Do you have examples of generated code it has to work with?
AFAIK nearly all code in https://github.com/dhall-lang/dhall-kubernetes is generated.
Regarding the apparent demand for a Vonderhaar style pretty-printer
Changed the issue title to make the demand more explicit 🙂
There are many small problems with dhall format
that could be solved without rewriting it, but making it a Vonderhaar style pretty-printer solves a problem that is hard to describe without literally saying "I want a Vonderhaar style pretty-printer". Something along the lines of: I want similar pieces of code to be formatted consistently, except that I can't precisely define 'similar', and also sometimes I want to opt out anyway.
The concern about generated code also applies to normalized source code, too, since the normal form might not correspond one-to-one with the original source code.
Looking at dhall-kubernetes, I think even a Vonderhaar style pretty-printer will work nicely there – they already generate well-formatted code, it just needs parentheses to be stripped.
I don't have a good answer for normalized code, though. Counting columns might indeed be the best approach there.
This has already been discussed a bit at https://github.com/dhall-lang/dhall-haskell/issues/244, but I want to bring this up again.
I'd like it if
dhall format
preserved my intent to lay out an expression one a single line / on several lines.I am making a small HTML templating system with Dhall – it's supposed to power monadfix.com, which is currently done with Mustache. The syntax is quite a bit more verbose than Mustache, but I can live with it if I get Dhall's benefits in return.
Both this time – and pretty much every time I tried Dhall before that – I kept wishing that
dhall format
would not transform multiline strings/lists into single-line strings/lists depending on how short/long they are.Dhall-ised HTML is already putting a strain on me, so I want trees to at least stay trees when I write them as trees. Same with multiline strings – I growl every time I write a multiline string and it turns into something like
\n\nLorem ipsum\n\n
because it's not long enough.