chris-martin / bricks

Bricks is a lazy functional language based on Nix.
https://hackage.haskell.org/package/bricks

Prose strings #19

Open chris-martin opened 6 years ago

chris-martin commented 6 years ago

This is a language feature I don't believe I've ever seen or heard of before, but I really want it: A kind of string literal specifically designed for paragraphs of prose.

The rules are:

This means that for strings containing content such as paragraphs of markdown, the formatter would be free to wrap lines and normalize whitespace.
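To make "normalize whitespace" concrete, here is a minimal sketch in Python (used only for illustration; this is not how Bricks implements anything). It assumes a prose string's value treats every run of whitespace, including line breaks, as a single space and trims both ends. The names `normalize_prose` and `rewrap` are hypothetical.

```python
import textwrap


def normalize_prose(text: str) -> str:
    """Collapse each run of whitespace (newlines included) to one space and trim."""
    return " ".join(text.split())


def rewrap(text: str, width: int = 72) -> str:
    """Re-wrap prose at `width` columns, as a formatter might."""
    return textwrap.fill(normalize_prose(text), width=width)


# Two differently wrapped and indented sources denote the same value.
a = "Normally you'd launch the REPL\n    using the `scala` script."
b = "Normally you'd   launch the REPL using\nthe `scala` script."
assert normalize_prose(a) == normalize_prose(b)
```

Under that assumption, re-wrapping never changes the value, because wrapping only swaps some spaces for line breaks.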

chris-martin commented 6 years ago

At this point, rather than building more kinds of strings into the core language, it might be a better idea to include some kind of extensible system of string semantics. I have been thinking about something along those lines anyway because I think customizable delimiters/escapes would be useful - since one of the purposes of this language is to be able to easily embed any kind of string content without worrying much about escaping.

chris-martin commented 6 years ago

I believe I have a simpler compromise: If an ''-quoted string has content that starts on the same line as the opening quote, then treat it as prose - otherwise treat it as code.

I think the result is aesthetically pleasing:

    [
      (h2 ''Launching the REPL from the scala-compiler jar'')

      (p ''Normally you’d launch the REPL using the `scala` script provided by 
        the standard Scala installation. But I’d prefer to let Gradle download 
        the appropriate version of Scala for the project rather than requiring
        developers to install it themselves. Gradle can help us with this 
        because the artifact `org.scala-lang:scala-compiler` in the Maven 
        Central repo contains the Scala REPL.'')

      (p ''The `main` method that launches the REPL belongs to a class with the
        (rather non-obvious) name `scala.tools.nsc.MainGenericRunner`. Thus we 
        need to run'')

      (bash ''
        java -Dscala.usejavacp=true \
             -classpath "$CLASSPATH" \
             scala.tools.nsc.MainGenericRunner
      '')

      (p ''where `$CLASSPATH` includes the `scala-compiler` jar.'')
    ]

chris-martin commented 6 years ago

We can do the same thing with block comments to autowrap those as well.

chris-martin commented 6 years ago

So now I'm looking at a complete overhaul of string syntax.

Normie strings

I still want to keep quotes like "hello" and "line one\nline two", because they are familiar.

In addition to the few standard escape sequences these strings already support, I also want to add escapes for typing any unicode character - following Haskell's example seems sensible. For example, "\955" is interpreted as "λ".

Haskell also has an empty escape sequence \& so that, for example, "\955\&5" can be interpreted as "λ5" and not "\9555". I don't think we need that, because it would be more clear to antiquote the character ("${"\955"}5") instead. It might also be reasonable to permit ${} as a no-op within strings.
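A small sketch of why the greedy digit rule matters, written in Python for illustration only. It assumes a decimal escape is a backslash followed by as many ASCII digits as possible, and it deliberately ignores the other standard escapes (including an escaped backslash). `decode_decimal_escapes` is a hypothetical name.

```python
import re


def decode_decimal_escapes(s: str) -> str:
    """Replace each backslash + decimal digits with the character at that code point."""
    # The digits are consumed greedily, which is exactly why "\9555" is one
    # character rather than "\955" followed by "5".
    return re.sub(r"\\([0-9]+)", lambda m: chr(int(m.group(1))), s)


assert decode_decimal_escapes(r"\955") == "λ"
# Without a separator, the trailing 5 is absorbed into the escape.
assert decode_decimal_escapes(r"\9555") == chr(9555)
# Decoding the escape on its own and appending "5" afterwards gives the
# intended result, which is what antiquoting the character achieves.
assert decode_decimal_escapes(r"\955") + "5" == "λ5"
```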

I do not want normie strings to be able to contain line breaks, as this strikes me as quite poor style.

Normie strings will continue to support antiquotation.

Prose strings

Prose strings begin with two or more ' characters, and end with the same number of ' characters. The idea is that you use as many as you need to avoid conflicting with the content of the string.

Examples:

Note how in the last example we had to put a space between the '' string content and the ''' delimiter. This does not affect the string value, since this kind of string has its whitespace trimmed anyway.
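The "use as many as you need" rule can be sketched as follows. This is Python for illustration, not Bricks code, and it assumes that a delimiter of n apostrophes is only ended by a run of at least n apostrophes. `quote_prose` is a hypothetical helper.

```python
import re


def quote_prose(content: str) -> str:
    """Wrap content in the shortest apostrophe delimiter that cannot clash with it."""
    runs = re.findall(r"'+", content)
    longest = max((len(run) for run in runs), default=0)
    # Never fewer than two, and always one more than the longest run inside.
    delim = "'" * max(2, longest + 1)
    # Pad with a space where the content itself starts or ends with an
    # apostrophe, so content and delimiter do not run together. The padding
    # is harmless because this kind of string has its whitespace trimmed.
    left = " " if content.startswith("'") else ""
    right = " " if content.endswith("'") else ""
    return f"{delim}{left}{content}{right}{delim}"


assert quote_prose("it's fine") == "''it's fine''"
assert quote_prose("ends with ''") == "'''ends with '' '''"
```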

We could relax the syntax to allow quoting with only a single ' character, but given the precedents for the meanings of " and ' in other languages (I'm thinking of Bash, Python, and Haskell), I think this would be misleading. I think the rule of adding more apostrophes is easier to explain if you start from two.

This string form will also support antiquotation, but no escape sequences whatsoever. All backslashes are interpreted as literal backslashes. If you want to express the literal ${ in a prose string, you have to do it within antiquotation:

In the interests of avoiding backslash escapes in the inner normie string, we may express the antiquoted value as a box string instead, which we will discuss in the next section.

Box strings

Box strings are for code. They are designed to be as unobtrusive as possible, while still permitting antiquotation. In particular we want to avoid escape sequences, because the code itself is likely to also contain escape sequences, and multiply-escaped code quickly becomes impossible to decipher without careful reading and a detailed understanding of all the languages involved.

This goal presents somewhat of a challenge, because any "uncommon" characters we might pick to use for antiquotation will be used by some programming language. We can simplify the problem statement by stating this goal: We would like to design a string literal with which we can even express Bricks code without escapes, and thus any special characters that we choose for our syntax will appear in our strings.

Furthermore, like indented strings in Nix, we need to be able to indent our box-string expressions based on the surrounding context without affecting the indentation of the box-string's contents.

Finally, we aspire to a clean aesthetic. In particular we wish to do better than styles such as Haskell string gaps which require each line in a multi-line string to have some terminating character.

A starting character on each line, however, we find to be an acceptable visual improvement rather than a clutter. It seems particularly clear if we use box-drawing characters to fence off our code blocks:

  ┌───
  │filter :: (a -> Bool) -> [a] -> [a]
  │filter _pred []    = []
  │filter pred (x:xs)
  │  | pred x         = x : filter pred xs
  │  | otherwise      = filter pred xs
  └───

This doesn't look quite as nice on GitHub, but in both IntelliJ and Atom the vertical lines connect to create the appearance of a single line.

For ease of typing, we will also permit an ascii variation of this syntax with a similar appearance.

  +----
  |filter :: (a -> Bool) -> [a] -> [a]
  |filter _pred []    = []
  |filter pred (x:xs)
  |  | pred x         = x : filter pred xs
  |  | otherwise      = filter pred xs
  +----

And we can recommend using the autoformatter to convert this to the fancier style.

Requiring a line-starting character removes the problem of needing to escape the closing string delimiter, and it also cleanly eliminates the indentation issue, as we can simply ignore any whitespace to the left of the line-starting character.
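As a rough illustration of that last point, here is a Python sketch (not the Bricks parser) that recovers a box string's value. It assumes every content line starts, after optional indentation, with `│` or `|`, and that the border lines contribute nothing. `box_string_value` is a hypothetical name.

```python
def box_string_value(source: str) -> str:
    """Return the text of a box string, ignoring whitespace left of the bar."""
    lines = []
    for raw in source.splitlines():
        stripped = raw.lstrip()
        # Only the first bar is syntax; any later bar belongs to the content.
        if stripped[:1] in ("│", "|"):
            lines.append(stripped[1:])
        # Border lines (starting with ┌, └ or +) and blank lines are skipped.
    return "\n".join(lines)


fancy = "  ┌───\n  │filter pred (x:xs)\n  │  | pred x = x : filter pred xs\n  └───"
plain = "        +----\n        |filter pred (x:xs)\n        |  | pred x = x : filter pred xs\n        +----"
# Same value regardless of border style or how far the box is indented.
assert box_string_value(fancy) == box_string_value(plain)
```

Note that the guard `| pred x` inside the content needs no escaping, since nothing after the first bar is inspected.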

The only issue that remains is antiquotation. As I am unwilling to introduce any escape sequences into box strings, and also unwilling to settle on a particular set of antiquotation delimiters which would inevitably conflict with the contents of the string, the only option I see remaining is to support a number of delimiter styles and allow the Bricks user to select one that is appropriate given the contents.

If we were to allow an entirely unrestricted choice of antiquote delimiter, I'm concerned that it could become too difficult to express a context-free grammar for the language, and that odd choices would lead to particularly unusual-looking Bricks code that makes a bad impression on new readers. So I think the best option would be to support a handful of options.

For example, these two strings would be equivalent:

  ┌─── <>
  │mkdir -p <cfg.outputDirectory>
  │chown <cfg.user>:<cfg.group> <cfg.outputDirectory> -R
  │rm -rf <cfg.cacheDirectory>/theme
  │mkdir -p <cfg.cacheDirectory>/theme
  │cp -R <cfg.outputTheme>/* <cfg.cacheDirectory>/theme
  │chown <cfg.user>:<cfg.group> <cfg.cacheDirectory> -R
  └───
  ┌─── ${}
  │mkdir -p ${cfg.outputDirectory}
  │chown ${cfg.user}:${cfg.group} ${cfg.outputDirectory} -R
  │rm -rf ${cfg.cacheDirectory}/theme
  │mkdir -p ${cfg.cacheDirectory}/theme
  │cp -R ${cfg.outputTheme}/* ${cfg.cacheDirectory}/theme
  │chown ${cfg.user}:${cfg.group} ${cfg.cacheDirectory} -R
  └───

I think it would be reasonable to support basically anything that is...

If you don't specify any antiquote delimiters, then you get no antiquotation at all, and thus pure verbatim strings.
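A sketch of how the delimiter choice could drive the splitting of one content line, again in Python purely for illustration. It assumes the delimiter spec is written opener-then-closer with a single-character closer (so `${}` means opener `${` and closer `}`), and it does not handle a closer appearing inside the antiquoted expression. `split_antiquotes` and the `"lit"`/`"anti"` tags are hypothetical.

```python
def split_antiquotes(line: str, delims: str = ""):
    """Split a content line into ("lit", text) and ("anti", expr) parts."""
    if not delims:
        # No delimiters declared: the line is verbatim.
        return [("lit", line)]
    if len(delims) < 2:
        raise ValueError("need an opener and a closer")
    opener, closer = delims[:-1], delims[-1]
    parts, pos = [], 0
    while True:
        start = line.find(opener, pos)
        if start == -1:
            break
        end = line.find(closer, start + len(opener))
        if end == -1:
            break
        if start > pos:
            parts.append(("lit", line[pos:start]))
        parts.append(("anti", line[start + len(opener):end]))
        pos = end + len(closer)
    if pos < len(line):
        parts.append(("lit", line[pos:]))
    return parts


# The two delimiter styles yield the same structure.
assert split_antiquotes("mkdir -p <cfg.outputDirectory>", "<>") == \
    split_antiquotes("mkdir -p ${cfg.outputDirectory}", "${}")
# With no delimiters, nothing is treated as an antiquote.
assert split_antiquotes("echo ${HOME}") == [("lit", "echo ${HOME}")]
```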

chris-martin commented 6 years ago

I believe comments should follow the same principle as strings:

chris-martin commented 6 years ago

I'm feeling good about box strings, but I'm still uneasy about prose strings. The use case that's bothering me is LaTeX. I'm really going to want to write LaTeX in paragraphs, and it is of course full of \ characters that I won't want to escape as \\. Furthermore, a lot of my math-heavy writing is even full of ${ sequences. To summarize, LaTeX is very problematic for just about every aspect of Nix-style strings. It's even the reason we needed to introduce the slightly ugly '''-quoted variant of prose strings.

In my experimental Bricks-formatter blog posts, the only thing I've used paragraph antiquotation for is hyperlinks:

(p ''Slashdot recently featured ["A Mathematician’s Lament"](${link})
  by Paul Lockhart.'')

But it's an important use case and I'm not willing to give it up. The alternative doesn't seem great:

(p [
  "Slashdot recently featured "
  (a { href = link; } "A Mathematician’s Lament")
  " by Paul Lockhart."
])

On one hand, less string concatenation means less potential for mistakes (like if link contains a character that would screw up the parsing of the markdown link). But we lose the paragraph flow, and it's quite easy to accidentally omit the whitespace in either string that abuts the link.

Now, in some use cases I have in mind, I don't actually want to be writing LaTeX directly in Bricks code at all, but rather using functions that can generate LaTeX or HTML or whatever.

{ p, emph, quote }:
(p ''${emph "This"} is our ${quote "concern"}, Dude.'')

That possibility, combined with the option to write LaTeX in box strings if you want (which is more suitable for things like large math functions anyway), is a source of some comfort.

chris-martin commented 6 years ago

Oh, on further thought, I remembered that prose strings don't have backslash escapes, so that's not a problem. The only real issue with LaTeX paragraphs, then, is when you want to type ${ -- which probably is a problem in other cases too, like if you want to type a bit of inline Bash in your paragraph.

Given that \( ... \) is actually the preferred way to write LaTeX math, I don't even think ${ ought to actually be a common LaTeX case.

But, I'm still interested in future-proofing. The goal of Bricks syntax is generality - I want to make as few assumptions as possible about the string contents.


Just to try out a crazy idea, what if we allowed you to specify the antiquote delimiters for prose strings, in a way similar to how we do it for box strings?

This looks weird, but just as a jumping-off point:

(p |${}| ''Slashdot recently featured ["A Mathematician’s Lament"](${link})
  by Paul Lockhart.'')
(p |<>| ''Slashdot recently featured ["A Mathematician’s Lament"](<link>)
  by Paul Lockhart.'')

I like the idea of antiquotation here being "opt-in" so that it's harder to get surprised by it. An unqualified '' string would have no source of interference whatsoever except for the closing delimiter (which is subtly customizable via the variable number of apostrophes).

chris-martin commented 6 years ago

I slept on it and settled on an answer:

Prose quote content has to start on a new line after the opening delimiter. This gives us a good place to specify antiquote delimiters, in the same position where they go for box quotes.

(p '' ${}
  Slashdot recently featured ["A Mathematician’s Lament"](${link})
  by Paul Lockhart.
'')
(p '' <>
  Slashdot recently featured ["A Mathematician’s Lament"](<link>)
  by Paul Lockhart.
'')

I'm not finding that I miss the same-line terseness much.
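For concreteness, a small Python sketch of reading this final form (illustration only, not Bricks's parser). It assumes the opening line holds the apostrophes plus an optional delimiter spec, the closing delimiter sits on its own line, and the content is whitespace-normalized. `parse_prose_string` is a hypothetical name, and antiquotes are left unexpanded.

```python
def parse_prose_string(source: str):
    """Return (antiquote delimiter spec, normalized text) for a prose string."""
    lines = source.strip().splitlines()
    fields = lines[0].split()
    opening = fields[0]
    if len(opening) < 2 or set(opening) != {"'"}:
        raise ValueError("expected two or more apostrophes")
    if lines[-1].strip() != opening:
        raise ValueError("closing delimiter must match the opening one")
    # Anything after the apostrophes on the first line is the delimiter spec.
    delims = fields[1] if len(fields) > 1 else ""
    # Content is everything between the two delimiter lines.
    text = " ".join(" ".join(lines[1:-1]).split())
    return delims, text


src = "'' <>\n  Slashdot recently featured\n  <link> by Paul Lockhart.\n''"
assert parse_prose_string(src) == ("<>", "Slashdot recently featured <link> by Paul Lockhart.")
```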

chris-martin commented 6 years ago

There is an issue with box quotes - If a box quote has a multiline antiquote, does it look like this:

  ┌─── ${}
  │mkdir -p ${
  │  cfg.outputDirectory
  │}
  └───

Or like this:

  ┌─── ${}
  │mkdir -p ${
     cfg.outputDirectory
   }
  └───

?

The latter option is ugly, but the first I suspect may not be context-free.

I'll go with the ugly option and an assumption that it doesn't matter much, because I don't think it's a good idea to put multi-line expressions in antiquotes. (And I don't think the renderer puts newlines in antiquotes now - at least not often.)