JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

julep: a plan for backticks #12139

Closed StefanKarpinski closed 7 years ago

StefanKarpinski commented 9 years ago

It has not infrequently seemed a shame to me that Markdown has trained us all to quote code by wrapping it in backticks. `a + b` is how we want to write the quoted expression a + b. That's considerably nicer than :(a + b) – the frownieface operator is just kind of weird – and the latter has some syntactic issues since the parens are actually part of the expression being quoted, not the quotation syntax; this has tripped quite a few people up.

Currently, backticks are used for quoting external commands using a convenient shell-like syntax. You don't want to use single or double quotes for this since it's quite common to want to use those quote characters in command expressions. But there's one bit of syntax we haven't exploited yet: backtick custom-literal strings. (This option only just occurred to me the other week.) So I would propose the following syntaxes:

  1. Use bare backticks to quasiquote Julia code: `a + b + $ex`. The dollar sign splices expressions into the quoted code as it does inside of :(...) currently.
  2. Use cmd-prefixed backticks to write commands: pipe(cmd`find -name *.$ext`, cmd`head -n$n`). The dollar sign splices values into commands as it does into backticks currently.
  3. Use colon for symbol literals, allowing double quotes to write symbols that aren't valid identifiers: e.g. :foo for symbol("foo"), :"foo bar" for symbol("foo bar") or :123 for symbol("123").
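
For reference, a minimal sketch of how the three forms above are spelled in today's syntax, assuming a recent Julia 1.x. The variable names (`ex`, `pattern`, `n`) are illustrative only, and `pipeline` is used here as the current name for what the proposal calls `pipe`. Nothing is executed; the commands are only constructed.

```julia
# 1. Quasiquotation today uses :( ... ) with $ for splicing.
ex = :(c * d)
quoted = :(a + b + $ex)
@assert quoted == :(a + b + c * d)

# 2. Commands today use bare backticks; $ splices values in as whole words.
ext = "jl"
n = 5
pattern = "*.$ext"                      # built as an ordinary string first
find_cmd = `find . -name $pattern`
head_cmd = `head -n$n`
both = pipeline(find_cmd, head_cmd)     # constructed, not run
@assert find_cmd.exec == ["find", ".", "-name", "*.jl"]
@assert head_cmd.exec == ["head", "-n5"]

# 3. Symbols that are not valid identifiers need the constructor today.
@assert :foo === Symbol("foo")
weird = Symbol("foo bar")
@assert string(weird) == "foo bar"
```

Under the proposal, the first form would become a bare backtick literal and the second a cmd-prefixed one; the code above only shows the current equivalents.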

Using backticks for quasiquoting has the advantage that it's what lisp does. Getting to this point without breaking everything will require a substantial deprecation process:

That's a long process, but I think it's a better use of backticks. It has the advantages of matching how we write quoted code in Markdown and of resembling most Lisps, which use backtick for quasiquotation – in Lisp style just at the front, of course, but still, I think it will be more familiar to Lispers.

jakebolewski commented 9 years ago

-1 from me. Although I like the proposal, I don't think what we gain here is worth the massive breakage.

simonster commented 9 years ago

If we're going to change the command syntax, why not have cmd be an ordinary custom string literal? It's not clear to me what we'd gain from backtick custom literals besides confusion and extra special cases in the code. We can always use cmd"""x""" if there are quotes, but I'm not sure there are many cases where you can't use single quotes in the command. (In fact I'm not sure there are many cases where you actually want quotes at all if you have interpolation.)

Ref #9945 for the proposed symbol changes.

ScottPJones commented 9 years ago

I agree with @simonster: cmd"""xxx""" or cmd"xxx" instead of backticks for commands. I don't think backticks are used for commands that frequently: in base, I found 9 files that used them, plus 8 files in the pkg directory, and a lot more places where backticks were part of documentation. In packages, it seems that most places where backticks are used as commands are literal arguments to run, so for those cases, why can't run also accept a normal string and treat it as if it were in backticks? Add the cmd"..." and cmd"""...""", along with run("...") and run("""..."""), deprecate backticks for commands at the same time, and then think about using backticks for other things.

tkelman commented 9 years ago

I'm not sure there are many cases where you can't use single quotes in the command.

The cmd shell on Windows doesn't handle single quotes the same way a posix shell does, as one case.

hayd commented 9 years ago

cmd".." is also easy to add to Compat, no need for parser changes.
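
A minimal sketch of what such an ordinary string-literal macro could look like, assuming a recent Julia 1.x. This is a hypothetical `cmd_str`, not something Base or Compat is known to provide, and it splits naively on whitespace, so it has none of the quoting or interpolation rules that backtick commands have.

```julia
# Hypothetical cmd"..." literal: split the raw text on whitespace
# and build a Cmd from the resulting words.
macro cmd_str(s)
    words = String.(split(s))
    return :(Cmd($words))
end

c = cmd"echo hello world"
@assert c isa Cmd
@assert c.exec == ["echo", "hello", "world"]
```

Because custom string literals receive their contents raw, a `$` inside cmd"..." would not interpolate in this sketch; that gap is part of what the rest of the thread is debating.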

-1 to run et al accepting plain strings due to the difference between interpolation (e.g. of arrays; see http://julialang.org/blog/2013/04/put-this-in-your-pipe/).

toivoh commented 9 years ago

+1 to this proposal, and the end goal. Beyond making quoting much more readable, it also introduces a distinction between quoted symbols and bare ones, which I seem to recall is needed to improve macro hygiene.

One thing that would be lost though: it would no longer be possible to nest quasiquotes lexically. But I think the needs of that would be infrequent enough that you could easily work around it.

ivarne commented 9 years ago

This seems like two independent issues.

  1. Enable backtick custom string literals, and how they should work.
  2. Reclaim unprefixed backtick literals for a more widely used purpose.

Previously we had @*_str and @*_mstr macros, but they were merged when the deindentation function for triple-quoted strings was moved to the parser. Should prefixed backtick quoting be just another string literal that calls @*_str, with different parser behavior with regard to escaping, or do we want a different concept?

2 will be a long process with deprecation periods to allow people to migrate to the new solution, so there is no hurry deciding what we will use the syntax for.

mauro3 commented 9 years ago

Would triple back-ticks replace quote-blocks?

ScottPJones commented 9 years ago

@hayd Pardon my ignorance, could you give an example of why run("string") could not be treated as the equivalent of run(`string`)? I read the link, but couldn't see just where that said or implied that that wouldn't work. Thanks.

@toivoh Example of nested quasiquotes please? Thanks.

MikeInnes commented 9 years ago

+1, this seems like a nice improvement. If nothing else I'm not going to be able to unsee the frownyface operator now.

@toivoh I don't think we'd necessarily lose that ability. We can already nest strings as e.g. "foo $("bar") baz", and the parser realises that " doesn't end the string because it's inside an expression. `foo(`bar`)` could work in the exact same way.
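
The string half of that analogy can be checked directly. A small sketch, assuming a recent Julia 1.x; the variable names are illustrative:

```julia
# The inner double quotes sit inside $( ... ), so the parser treats them as
# part of an expression rather than as the end of the outer string.
nested = "foo $("bar") baz"
@assert nested == "foo bar baz"

# Arbitrary expressions, including further string literals, can appear there.
deeper = "outer $(uppercase("in" * "ner")) done"
@assert deeper == "outer INNER done"
```

Whether bare backticks could nest the same way is the open question here; this only shows the existing behaviour for strings.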

pao commented 9 years ago

@ScottPJones Command interpolation works differently. For instance, assume we had a program nargs which returns argc, and let arg = "one two". run("nargs $arg") would return 3, but run(`nargs $arg`) would return 2.
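
The difference can be seen without running anything by inspecting the words a command is built from. A minimal sketch, assuming a recent Julia 1.x; `nargs` is the hypothetical program from the comment above, and `exec` is the field of `Cmd` that holds its argument words:

```julia
arg = "one two"

# String interpolation followed by whitespace splitting (what a shell would do)
# produces three words: the program name plus two arguments.
words = split("nargs $arg")
@assert words == ["nargs", "one", "two"]

# Backtick interpolation keeps the value as a single word.
cmd = `nargs $arg`
@assert cmd.exec == ["nargs", "one two"]

# An interpolated array becomes one word per element, spaces preserved.
files = ["a b.txt", "c.txt"]
ls_cmd = `ls $files`
@assert ls_cmd.exec == ["ls", "a b.txt", "c.txt"]
```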

tbreloff commented 9 years ago

+1 for the @StefanKarpinski proposals 1 and 3 (backticks become expressions, colons are symbols), and also the @mauro3 suggestion to use triple back-ticks to replace quote/end blocks. Also agree that cmd"..." and cmd"""...""" are sufficient and don't require special back-tick notation. I was burned yesterday by the subtlety of the frownyface notation, and I think there should be a clear distinction between expressions and symbols. As @ScottPJones pointed out, there are very few current uses of back-ticks so I say just go ahead and break stuff.

ScottPJones commented 9 years ago

@pao Thanks. That's a rather subtle difference I would think. With the backtick quoting, what would one do if they want things split up like in the first example (with " quotes)?

StefanKarpinski commented 9 years ago

I quite frequently write perl one-liners that use both single and double quotes nested and would require very confusing escaping otherwise. Examples just from the Julia repo:

Using cmd"""...""" would work, but is more awkward and to me less clear. If we don't want to introduce a second kind of non-standard string literal, we could also just make foo`...` an alternate syntax for foo"..." where the quote delimiter is ` instead of ".

StefanKarpinski commented 9 years ago

@ScottPJones, please read http://julialang.org/blog/2012/03/shelling-out-sucks/ and http://julialang.org/blog/2013/04/put-this-in-your-pipe/ for more background on why Julia's backticks exist, work the way they do, and are important for calling external programs reliably. I went through it there in a great bit of detail with lots of examples. No point in rehashing that unnecessarily.

StefanKarpinski commented 9 years ago

@one-more-minute wrote:

@toivoh I don't think we'd necessarily lose that ability. We can already nest strings as e.g. "foo $("bar") baz", and the parser realises that " doesn't end the string because it's inside an expression. `foo(`bar`)` could work in the exact same way.

Good point. Since you can parenthesize expressions you could always write `(`...`)` for nested quasiquotation. I also like the idea of ```...``` for quote end – again, it fits nicely with how triple backticks are used in Markdown.

ScottPJones commented 9 years ago

OK, I did read the second one, that @hayd mentioned, I'll read the other one. Thanks.

jakebolewski commented 9 years ago

It is not a very strong argument that just because the cmd syntax is not used often in Base, it can just be freely deprecated without too much impact. Of course it is not used in Base, you should make minimal assumptions about your environment if you want to be cross platform. Command syntax is used often in data processing pipelines. It is often faster to call unix functionality through shelling out than to use Julia code to munge your data as the unix utilities are currently much faster.

The real deprecation here is not with the cmd syntax but with expression quoting (which is used everywhere). I agree that the backtick is marginally nicer syntax, but is it _that_ much nicer to go through all this code churn? @tbreloff you say that the current syntax is subtle, could you give a concrete example?

At the current release rate, this proposal would have us adapting to deprecations and rewriting code for ~2 years. To go through and fix packages is a lot of effort for often little gain. I'm just raising the red flag that we should actually be gaining something tangible from this proposal (other than it is more aesthetically pleasing) before committing to it.

StefanKarpinski commented 9 years ago

@ivarne wrote:

Previously we had @*_str and @*_mstr macros, but they were merged when the deindentation function for triple-quoted strings was moved to the parser. Should prefixed backtick quoting be just another string literal that calls @*_str, with different parser behavior with regard to escaping, or do we want a different concept?

Ah, I see you beat me to this proposal.

StefanKarpinski commented 9 years ago

@jakebolewski, this is a good point, but I do think that aesthetics matter and this is something that will be in the language forever. I don't really want to live with the frowneyface operator forever, especially when there's this other much nicer syntax so tantalizingly close.

tbreloff commented 9 years ago

regarding subtlety... here's a few quick examples which are non-obvious with a quick glance (for non-expert users anyways):

julia> x = :(); typeof(x)
Expr

julia> x = :(x); typeof(x)
Symbol

julia> x = :(+); typeof(x)
Symbol

julia> x = :(+5); typeof(x)
Int64

julia> x = :(+(5)); typeof(x)
Expr

I feel like it would be much clearer to see:

``   # equivalent to :()
:x
:+
`+5`
`+(5)`

Aesthetics matter a ton. I want to be able to scan code in 1-2 seconds to understand what it's doing.. I don't want to spend my time looking for matching parens and reasoning about what something means in context. This is doubly valuable if I can add logic to my syntax highlighter that clearly identifies expressions in the code. I can't easily do that if symbols and expressions share syntax.

MikeInnes commented 9 years ago

:(x) == :x is particularly fiddly, because it means that you can't reliably return quasiquote syntax from macros, and instead have to use :($(Expr(:quote, x))) everywhere (linking back to the nested-quasiquotes issue). Making a distinction between symbols and quoted expressions makes a ton of sense.
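
A short sketch of the collapse being described, assuming a recent Julia 1.x; the names `q` and `node` are illustrative:

```julia
# Quoting a bare identifier yields the Symbol itself, not a wrapping Expr.
@assert :(x) === :x
@assert :(x) isa Symbol

# Anything with structure does come back as an Expr.
@assert :(x + 1) isa Expr

# To get an expression that evaluates to the symbol, rather than to the
# variable it names, the quote has to be built explicitly.
q = Expr(:quote, :x)
@assert q isa Expr && q.head === :quote
@assert eval(q) === :x

# QuoteNode is the other explicit wrapper for the same purpose.
node = QuoteNode(:x)
@assert node.value === :x
```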

jakebolewski commented 9 years ago

@tbreloff, @one-more-minute wouldn't using explicit quote ... end blocks solve most of the points you raise (except +5 which is transformed in the parser).

MikeInnes commented 9 years ago

Possibly, although wrapping things in a redundant Expr(:block) isn't always convenient either.

tbreloff commented 9 years ago

@jakebolewski yes you can obviously get around these problems, but quote ... end adds its own confusion and messiness.

Julia is still 0.4 (dev)... if there are good solutions to making the language easy to understand/read, we should do it.

jakebolewski commented 9 years ago

@tbreloff what is the confusion and messiness with quote ... end blocks? Block syntax is fundamental to Julia.

Users who are manipulating quoted expressions have entered "sufficiently advanced user territory". We don't even commit to having a stable Expr AST representation.

mbauman commented 9 years ago

I think there'd be a certain elegance to have `…` always wrap its contained expression in an Expr(:quote, …) Expr, akin to how quote … end is always Expr(:block, …) (and that could become ```…```). Then you'd no longer need to worry about potentially getting AST literals back, either.

(Edited, thanks @toivoh)

mbauman commented 9 years ago

Thinking about the commonalities between quasi-quotation and command line syntax, they're both some sort of executable string with syntax. Perhaps the custom foo`…` literals should be encouraged for writing DSLs or other interop like sql`…`. With that in mind, should there be any differences in the parsing or macro name between foo"…" and bar`…`?

quinnj commented 9 years ago

+1 to eventually using `...` as the default quoting syntax. I also think that having the shell> mode lessens some of the impact here since that's, at least for me, the most common use of shelling out in Julia.

@mbauman brings up a good point. Maybe the convention going forward is foo".." string literals return objects, while foo`...` backtick literals actually call a method of some kind, i.e. execution.

toivoh commented 9 years ago

quote … end is not Expr(:quote, Expr(:block, …)), just Expr(:block, …), and I think it should stay that way. When you are working with an AST, sometimes you want to quote it, but more often not I would say (and it's easier to add the quotation than to remove it).
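
A small sketch confirming the shape of the two forms, assuming a recent Julia 1.x; the names `blk` and `inline` are illustrative:

```julia
# A quote ... end block evaluates to an Expr whose head is :block.
blk = quote
    x + 1
end
@assert blk isa Expr
@assert blk.head === :block

# The block also carries line-number nodes; drop them to see the payload.
body = filter(a -> !(a isa LineNumberNode), blk.args)
@assert body == [:(x + 1)]

# The inline form yields the call expression directly, with no :block wrapper.
inline = :(x + 1)
@assert inline.head === :call
```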

mbauman commented 9 years ago

Of course, sorry for the misinformation and thanks for the correction. I was thinking one level too deep (:(:(…)) and :(quote end)). I had initially written that it wouldn't be possible to always return Expr without bigger changes, but in playing around at the REPL I was quoting quotes and got excited.

mbauman commented 9 years ago

With that in mind, should there be any differences in the parsing or macro name between foo"…" and bar`…`?

Maybe the backtick macros always get file and line number information in a second argument? (Cf. #9577 and #9579)

StefanKarpinski commented 9 years ago

@jakebolewski wrote:

Users who are manipulating quoted expressions have entered "sufficiently advanced user territory". We don't even commit to having a stable Expr AST representation.

That's no excuse. We will at some point have to commit to a stable AST representation. The fact that we have not at this point is merely an artifact of being pre-1.0 and AST manipulation being a relatively niche thing.

StefanKarpinski commented 9 years ago

@mbauman wrote:

Thinking about the commonalities between quasi-quotation and command line syntax, they're both some sort of executable string with syntax. Perhaps the custom foo`…` literals should be encouraged for writing DSLs or other interop like sql`…`. With that in mind, should there be any differences in the parsing or macro name between foo"…" and bar`…`?

Yes, this is precisely what I had in mind: backticks become a general way of quoting code.

Maybe the backtick macros always get file and line number information in a second argument? (Cf. #9577 and #9579)

That's an excellent idea. I quite like it.

jakebolewski commented 9 years ago

AST manipulation being a relatively niche thing.

That was my point. I just don't see the argument that the quote ... end syntax having a redundant Expr(:block) is really that big a deal for users who manipulate quoted syntax.

StefanKarpinski commented 9 years ago

It's certainly usable as it is, but it's not really that great. The unification of backticks as how one quotes code in general is a very nice generalization. We use it for that currently, but with the wrong default – the default kind of code should be Julia code, not external commands.

tkelman commented 9 years ago

We can already nest strings as e.g. "foo $("bar") baz", and the parser realises that " doesn't end the string because it's inside an expression. `foo(`bar`)` could work in the exact same way.

How so? The only reason the former nesting quotes inside interpolation works is because the parser has special handling for string interpolation. Are you proposing special parser handling of ( inside backticks?

There may be a better use for backticks than what we have now (though shell mode really isn't a substitute when you're writing non-interactive scripts), but I'm with @jakebolewski here - I'm not sure Julia Expr quoting is that much of a better use for them, it's a lot of churn, and there are downsides with nesting and making Cmd objects not work as well.

mlubin commented 9 years ago

I like the aesthetics of the backtick for quoting expressions.

yurivish commented 9 years ago

+1 from me. The frownyface operator has caused me sadness (and hours of debugging) in the past.

StefanKarpinski commented 9 years ago

@tkelman, it would work pretty much the same – if the quoted expression is complete when you encounter a ` then it closes the quoted expression; otherwise you try parsing it as opening an inner quoted expression. That means that `foo(`bar`)` would work since `foo(` is incomplete.

mbauman commented 9 years ago

The key is that the parsing of julia quasi-quotes and custom backtick quotes will be necessarily different.

I don't think that just changing the spelling of Julia quotes from :(…) to `…` is really that meaningful. Sure, it makes :… always return a symbol, which is nice, but $x will still sometimes return AST literals. And there will still be crazy edge cases in the disambiguation between Colon(), ranges, ternary ?: syntax, and symbols.

It's the standardization of interop code blocks that makes this worthwhile in my view. I really like the unification of quoted code and code blocks for cmd`…`, cxx`…`, sql`…`, etc. +1 from me.

MikeInnes commented 9 years ago

One could even write

```cxx
int foo() ...
```

as an equivalent to cxx`...` to really unify things with Markdown, although that's a different bikeshed.

special parser handling of ( inside backticks

You could call it special handling if you want, but it's really no different to the way that :( doesn't always stop at the first ), or quote doesn't stop at the first end.

tkelman commented 9 years ago

Well it's a pretty major change to propose applying the Julia parsing rules inside backticks, which we certainly don't do inside strings or cmd objects right now (other than inside interpolation). That seems like it could make backticks less useful than what's being proposed for other-format prefixed overloads.

StefanKarpinski commented 9 years ago

It would only apply to bare backticks which are specifically for quoting Julia code. This is exactly the same as how bare double quotes allow interpolation of expressions that use double quotes.

tkelman commented 9 years ago

Except that backtick "interpolation" would be automatic, silent, and default rather than set off by $(

StefanKarpinski commented 9 years ago

@tkelman, I don't really get your issue with this – it is exactly analogous to string interpolation.

tkelman commented 9 years ago

No, I still think that's an imprecise analogy - it works the way code inside an interpolation (or existing quoting) works, not the way interpolation inside strings works. Interpolation is its own parsed context within the string. You're proposing making backticks a parsed context, except when prefixed by a formatting macro? Seems maybe useful, but not a dramatic improvement. The funny lowering of custom string literals is already kind of hidden and confusing, now we're going to add another version of it?

tkelman commented 9 years ago

Or to take this another step, if we're going to do this, why not apply the exact same treatment to single quotes while we're at it. I'm sure there's a better use for them than chars, we can just use char'a' for that. (not sure if joking)

mschauer commented 9 years ago

That is quite nice. `1 + 1` is quasi-quoted (Julia) code, cmd`echo -e "\033[2J"` is code (a command with args in the execvp sense), and bash`echo -e "\E[2J"` is some specific shell code, etc.

hayd commented 9 years ago

You're proposing making backticks a parsed context, except when prefixed by a formatting macro?

Isn't the point that you could define parsing rules on prefixed backticks? e.g. cmd/cxx/sql.

The conventions between parsing and executing are a bit unclear: sql/cxx execute on construction (IIUC), cmd/:( don't and need to be run/eval'd.