ceylon / ceylon-spec

string interpolation and methods as infix operators #574

Closed gavinking closed 11 years ago

gavinking commented 11 years ago

This issue contains two proposals, one of which would be dependent upon the other. I have already implemented both proposals in a branch, to convince myself that they are viable. Both proposals have been entirely implemented in the typechecker without impact upon the backends.

Improved string interpolation syntax

It's fair to say that the current syntax for string interpolation has caused me lots of problems during development of the grammar. The current grammar includes a quite awful syntactic predicate that covers some special cases of interpolated expressions, but leaves those cases without authoring support in the IDE.

The following syntax would completely clean up the grammar for interpolated strings:

print("Hello, 'name', the time is 'time'.");

That is, interpolated expressions are quoted using single quotes inside the double quoted string. We would no longer have single-quoted strings, but it doesn't look like we really need them anymore anyway.

I have a funny feeling that some people are going to find this easier to read, and it certainly cleans up the worst wart in the grammar.
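
To make the grammar argument concrete, here is a minimal Python sketch (not the typechecker's actual implementation) that splits the body of a double-quoted literal into text fragments and single-quoted interpolated expressions in one left-to-right pass. The name `split_interpolated` and the `\'` escape handling are assumptions made for illustration, and the sketch assumes the interpolated expression itself contains no single quote.

```python
import re

# One alternation, scanned left to right: an escaped character, a quoted
# expression, or a run of plain text. No nesting or lookahead is needed.
_PIECE = re.compile(r"\\(.)|'([^']*)'|([^'\\]+)")


def split_interpolated(body):
    """Split a string body into ("text", s) and ("expr", s) parts."""
    parts = []
    pos = 0
    while pos < len(body):
        m = _PIECE.match(body, pos)
        if m is None:
            # An unpaired quote or a trailing backslash lands here.
            raise ValueError(f"cannot tokenize string body at offset {pos}")
        escaped, expr, text = m.groups()
        if expr is not None:
            parts.append(("expr", expr))
        else:
            chunk = escaped if escaped is not None else text
            # Merge adjacent text so an escape does not split a fragment.
            if parts and parts[-1][0] == "text":
                parts[-1] = ("text", parts[-1][1] + chunk)
            else:
                parts.append(("text", chunk))
        pos = m.end()
    return parts


print(split_interpolated("Hello, 'name', the time is 'time'."))
```

In this sketch an apostrophe in ordinary text must be written as `\'`, otherwise it is read as the start of an interpolated expression.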

Methods as infix operators

This is a feature that a lot of languages have these days, that I'm not necessarily a huge fan of, but that in light of the direction our syntax has gone, seems like it now fits in. The idea is to let you call a method with a single parameter according to the following syntax:

x op y

Instead of:

x.op(y)

For example:

value x = $101010 xor $111000;
print("hello" collect (Character c) => c.integer);

Instead of:

value x = $101010.xor($111000);
print("hello".collect((Character c) => c.integer));

Supporting this would relieve a lot of the pressure for new operators, for example, bitwise operators and vector operators, and allow us to continue with the current set of built-in operators. It also makes anonymous function arguments more readable, especially if the anonymous function has multiple statements.

Note that:

Dangers:

Thoughts?

gavinking commented 11 years ago

That is, interpolated expressions are quoted using single quotes inside the double quoted string.

Alternatively, and probably much better, they could be quoted by backticks, and we can reclaim single quotes for Character literals or some other purpose.

print("Hello, `name`, the time is `time`.");

That looks very good to me.

quintesse commented 11 years ago

The infix stuff I'm okay with; I don't feel too strongly about that either way. The interpolation I do have some qualms about, mostly because I think the choice of the single quote will make it necessary to escape it often because it's used so frequently. Single quotes are used in English text as well as an alternative to the double quotes in text (e.g. String s = "Tom said, 'really now?'";). I'd rather have a symbol that is much less used, and preferably something that makes it stand out visually as something that is not part of the text proper. By which I mean that I would prefer something like

print("Hello, {name}, the time is {time}.");

over

print("Hello, `name`, the time is `time`.");

quintesse commented 11 years ago

@gavinking I definitely think backticks are an improvement over single quotes.

I wonder, though, whether once we get rid of single quotes for strings (to reclaim them for characters, which I think is great) we could introduce some kind of alternative syntax for multi-line text that does no interpolation and needs almost no escaping (useful for doc elements, for example, which often contain entire code examples).

gavinking commented 11 years ago

I think the choice of the single quote will make it necessary to escape it often because it's used so frequently.

I guess the proposal to use backticks fixes that issue, no?

quintesse commented 11 years ago

I guess the proposal to use backticks fixes that issue, no?

Yes, see my second comment, much better. Although I'd prefer something even more obvious, but I'll accept any simple majority decision ;)

gavinking commented 11 years ago

we could introduce some kind of alternative syntax for multi-line text that does no interpolation and needs almost no escaping

Well, you guys could use your triple quotes for that, I guess:

"""This doesn't have any interpolation or \ escapes"""
'\n'
'\\'

Or perhaps we could use single quotes for this, and use a prefix to distinguish Character literals.

'This doesn't have any interpolation or \ escapes'
$"\n"
$'\'

quintesse commented 11 years ago

I definitely wouldn't have any problem with triple-quotes, it won't be seen as often as the normal string (so it doesn't matter it's a bit more ugly) while providing something useful without wasting a valuable symbol.

gavinking commented 11 years ago

Oh, shit. We can't just use backticks for string interpolation 'cos that's what Markdown uses for monospace. :-(

gavinking commented 11 years ago

P.S. @quintesse your suggestion of { and } doesn't work because { and } are already tokens in the language and can validly appear inside interpolated expressions.
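
A small Python sketch of why this is awkward for a regex-based tokenizer: a pattern that ends the interpolation at the first `}` cuts off any expression that contains braces of its own. The pattern, the function name `naive_expressions`, and the sample strings are all hypothetical, chosen only to illustrate the point.

```python
import re

# Naive rule: an interpolation runs from "{" to the first "}".
NAIVE = re.compile(r"\{([^}]*)\}")


def naive_expressions(body):
    """Return what the naive pattern takes to be the interpolated expressions."""
    return NAIVE.findall(body)


# Fine when the expression contains no braces of its own.
print(naive_expressions("Hello, {name}."))
# Wrong when it does: the match stops at the inner "}".
print(naive_expressions("Total: {sum({1, 2, 3})} items"))
```

The second call reports the truncated expression `sum({1, 2, 3`, and the leftover `)}` would be treated as ordinary string text.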

gavinking commented 11 years ago

So I spent a few hours experimenting with a bunch of possibilities, and really the only thing that will work, in light of the fact that we use backticks for monospace in Markdown, is my original idea of single quoted interpolated expressions. However, as compensation, we can still reclaim single quotes for character literals, if we choose to, freeing up backticks for some other or future use.

I'm inclined to agree with @quintesse that, if we do this, it would be nice to have a "literally literal" string format, with no interpolation and no escapes. Possibilities include:

"""Triple-doubles"""
'''Triple-singles'''
`Backticks`

However, I also think we could live without this. It's a pain to have to write \' but it's not the end of the world.

gavinking commented 11 years ago

P.S. This work can be seen in the infix branch.

quintesse commented 11 years ago

Still don\'t like it, it just doesn\'t look good when you\'re dealing with english text. I\'d rather go for something $'likeThis' then.

About the "literally literal" format I'd not use the backtick, really saving them for some future use. And if we do minor usability improvements like not needing to type doc in annotations I think this one is worth doing as well, it doesn't cost us anything and it will greatly help in those situations where having to escape ' , " and \ is a major pain in the backside (code examples in docs and regular expressions being two of them).

gavinking commented 11 years ago

@quintesse I considered, and even implemented this:

print("Hello, $'name', the time is $'time'.");

But honestly when you see it in the IDE with syntax highlighting it looks a bit strange. The $ looks like it's part of the string.

About the "literally literal" format I'd not use the backtick, really saving them for some future use.

Yeah, I think that might be wise.

I think this one is worth doing

What is worth doing? """Triple-doubles"""?

I mean, yeah, sure, it's totally trivial for me to implement that - will take like 5 mins, I think, unless I'm missing something. But is that really the syntax we want?

quintesse commented 11 years ago

in the IDE with syntax highlighting it looks a bit strange

Maybe, but I definitely prefer a minor (hopefully solvable at some point) IDE coloring problem than having to live the rest of Ceylon's existence with very annoying escapes.

But is that really the syntax we want?

Well, although I don't mind the syntax too much, I don't think it will win any beauty prizes either, so I'm open to alternatives. Thing is that I think that a repeating or complex combination (like for example with multi-line comments /* */) is probably preferable so we don't have to worry about escaping anything (another reason not to use backticks). Why not just try it out and see how it feels? We can always change it to some other combination of symbols if someone comes up with something good before 1.0.

jdpatterson commented 11 years ago

The infix syntax looks great and was one of the features of Kotlin that stood out to me. It's easy to imagine using it as a newbie browsing the tutorials.

At a glance it's like having optional parenthesis on a method call which is similar to another conceptual problem I had when first reading the ceylon spec: whether to use thing.size() or thing.size ? I see that the core libraries don't really have a consistent way of using one or the other. Is there some way to also unify these alternatives?

Having to escape ' would be a massive inconvenience.

jdpatterson commented 11 years ago

On 31/01/2013, at 1:31 PM, Gavin King notifications@github.com wrote:

print("Hello, $'name', the time is $'time'.");

that is barely more concise than: print("Hello, "+name+", the time is "+time+".");

and a lot less familiar

RossTate commented 11 years ago

At a glance it's like having optional parenthesis on a method call which is similar to another conceptual problem I had when first reading the ceylon spec: whether to use thing.size() or thing.size ? I see that the core libraries don't really have a consistent way of using one or the other. Is there some way to also unify these alternatives?

Just noting that, as I said, the same thing always drove me crazy with C#.

RossTate commented 11 years ago

Optimizing for Markdown instead of English seems like a mistake. Using {} looks much nicer to me. I feel like there's got to be a way around the problem you mentioned.

gavinking commented 11 years ago

At a glance it's like having optional parenthesis on a method call which is similar to another conceptual problem I had when first reading the ceylon spec: whether to use thing.size() or thing.size?

It's a great question. A non-answering answer would be: make it an attribute if you would call it getSize() in Java.

A better answer is: make it an attribute if repeated invocations, with no intermediate mutation of the receiver, return values that are "indistinguishable". As to what constitutes "distinguishable", well, I want to be a bit flexible about that, but as a guide:

Of course, if the operation changes the observable state of the receiver by side-effect, it should never be an attribute.

Thus, it's 1.float, since calling float repeatedly always gives you the "same" value in the sense that the resulting values are indistinguishable. But it's iterator.next(), since next() returns a different value each time it is called. It's collection.size and collection.sequence, because size or sequence will repeatedly return the same value (i.e. equal values) until the collection is explicitly mutated. It's collection.indexed because the Iterable returned by the operation indexed is backed by the underlying collection, and therefore two return values can't be "distinguished".

I see that the core libraries don't really have a consistent way of using one or the other.

That's a very fair observation and something we need to improve on. A classic example is that Iterable should have iterator() not iterator. (Iterators are stateful and can therefore be distinguished!) That's like that for historical reasons, since at one stage we tried having immutable iterators. But I'm sure you can find me some other examples which don't follow the rule of thumb above, and we should try to catch them and fix them.

gavinking commented 11 years ago

print("Hello, $'name', the time is $'time'.");

That is barely more concise than:

print("Hello, "+name+", the time is "+time+".");

@jdpatterson Yes, I totally agree with you, and that's where I think I also diverge from @quintesse. If the string interpolation syntax is almost as noisy as the concatenation operator, then I would say just drop it. Honestly the $ draws your eye away from what is being interpolated and forces your brain to come to grips with the extremely unnatural combination $'.

One point you're perhaps missing, however: the Ceylon string concatenation operator does not do implicit type conversions to String. (I always found that very fragile in Java because + is overloaded to also mean numeric addition.) So the syntax with concatenation would be:

print("Hello, "+name+", the time is "+time.string+".");

This makes the interpolation syntax significantly cleaner in cases where the interpolated expressions are not already strings:

print("Hello, 'name', the time is 'time'.");

gavinking commented 11 years ago

Optimizing for Markdown instead of English seems like a mistake.

I disagree. Monospace spans occur at least an order of magnitude more often in our Ceylon codebase than apostrophes do.

I mean, FWIW, I even actually avoid using apostrophes in API documentation and error messages since for my taste it's too familiar-sounding in that context.

Using {} looks much nicer to me. I feel like there's got to be a way around the problem you mentioned.

I'm sure it's technically possible, by calling our lexer recursively on its own tokens, but honestly I think it's absurd that you would not be able to tokenize the language in a single pass.

But anyway, having to escape { and } would also be an incredible pain since they occur so commonly in Ceylon code, which we very often embed in doc comments. So it would not be {...} it would be #{...} or ${...} or %{...} which, while perhaps easier on the eyes than $'...' are extremely noisy and shell-scripty.

gavinking commented 11 years ago

I mean, to be fair, I don't find the following objectionable from the point of view of readability:

print("Hello, %{name}, the time is %{time}.");

I just think it's crazy to not be able to tokenize your language in a single pass of the source text.

tombentley commented 11 years ago

What about using # as a delimiter on its own:

print("Hello, #name#, the time is #time#.");

gavinking commented 11 years ago

What about using # as a delimiter on its own:

Totally technically possible. Same for $.

print("Hello, $name$, the time is $time$.");

But I fucking hate it. Can't offer you any real rational reason why :)

gavinking commented 11 years ago

Actually, on second thoughts we can't use # or $ on their own because they currently delimit the start of binary and hex integer literals.

tombentley commented 11 years ago

OK, what about:

print("Hello, [[name]], the time is [[time]].");

This is at least using something which is normally a delimiter.

gavinking commented 11 years ago

@tombentley Again, [ and ] are tokens in the language and the combination [[ and ]] can naturally occur in the language, for example:

[[Float*]*]
x[n[i]]

Therefore the lexer can't assume that ]] is the beginning of a string literal fragment. It's the same problem we would have with { and }.

Now, OTOH, one thing we could do would be to fuck with the syntax of Markdown and use [My] or [[My]] for monospace, freeing up backticks for string interpolation. But I'm loath to do that, since then it's just not Markdown anymore.

thradec commented 11 years ago

print("Hello, [[name]], the time is [[time]].");

There is again collision with markdown, because they are used for links.

gavinking commented 11 years ago

A possibility is to triple-quote interpolated strings:

print("""Hello, "name", the time is "time".""");

A reasonable case can be made for this option, I suppose.

gavinking commented 11 years ago

There is again collision with markdown, because they are used for links.

@thradec

Here's a sentence with [brackets] and [[double brackets]] in it.

Renders as:

Here's a sentence with [brackets] and [[double brackets]] in it.

For it to be a link it has to be followed by a ( or [.

jdpatterson commented 11 years ago

On 31/01/2013, at 10:42 PM, Gavin King wrote:

I just think it's crazy to not be able to tokenize your language in a single pass of the source text.

Is that what these guys do? If so, does it really impact parse performance so much?

http://confluence.jetbrains.com/display/Kotlin/Strings#Strings-Templates

This to me looks perfect:

val i = 10
val s = "i = $i" // evaluates to "i = 10"

or an arbitrary expression in curly braces:

val s = "abc"
val str = "$s.length is ${s.length}" // evaluates to "abc.length is 3"

quintesse commented 11 years ago

A possibility is to triple-quote interpolated strings:

I'd say do it exactly the opposite way. First of all because I'm thinking we're letting ourselves get carried away with this markdown business, a normal string in Ceylon will far more likely carry non-markdown text than anything else. So when do we use markdown? In documentation. Documentation that is either very long (multiple lines) or at least will span the entire line (a single doc entry) which would make it much more suitable for the triple-quote because you'll almost never mix markdown with interpolation.

So, triple doublequotes, no interpolation, perfect for markdown in docs. Single doublequotes, interpolation, suitable for everything else

gavinking commented 11 years ago

Is that what these guys do?

I have no idea what they do. They may use a handcoded lexer, in which case I suppose you can make it stateful and keep a count of the braces. But ANTLR uses a regex-based tokenizer, which can't handle such constructs.

If so, does it really impact parse performance so much?

It's not primarily performance I'm concerned about. It's what happens when the input is not well-formed. For example, when the interpolated expression contains an unmatched { or }, it disrupts the parsing of the whole string and the whole surrounding file. This is a real issue for us, since we use so many braces in our expression language.
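
For comparison, a hand-coded scanner of the kind speculated about above can keep a count of the braces. The Python sketch below is only an assumption about how such a lexer might work, not a description of what Kotlin or Ceylon actually do; `scan_template` is a hypothetical name, and the sketch ignores string literals nested inside the expression. It also shows the failure mode described here: one missing `}` makes the scanner run to the end of its input looking for the close.

```python
def scan_template(body):
    """Split a `${...}` template body into ("text", s) / ("expr", s) parts,
    counting braces so that nested braces stay inside the expression."""
    parts = []
    text = []
    i = 0
    while i < len(body):
        if body.startswith("${", i):
            depth = 1
            j = i + 2
            while j < len(body) and depth > 0:
                if body[j] == "{":
                    depth += 1
                elif body[j] == "}":
                    depth -= 1
                j += 1
            if depth > 0:
                # The closing brace never arrived: everything up to the end
                # of the input was swallowed while looking for it.
                raise SyntaxError(f"unclosed interpolation starting at offset {i}")
            if text:
                parts.append(("text", "".join(text)))
                text = []
            # j points just past the matching "}", so drop that one character.
            parts.append(("expr", body[i + 2 : j - 1]))
            i = j
        else:
            text.append(body[i])
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts


print(scan_template("Total: ${sum({1, 2, 3})} items"))
```

With well-formed input the nested braces are kept inside the expression. With something like `"n = ${f({x)}"` the depth never returns to zero, and in a real source file the scan would continue past the closing double quote into the code that follows.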

gavinking commented 11 years ago

First of all because I'm thinking we're letting ourselves get carried away with this markdown business, a normal string in Ceylon will far more likely carry non-markdown text than anything else.

This is demonstrably false, at least as far as our own code goes. You can check our codebase to verify that.

So, triple doublequotes, no interpolation, perfect for markdown in docs. Single doublequotes, interpolation, suitable for everything else

I don't have a major objection to this.

thradec commented 11 years ago

For it to be a link it has to be followed by a ( or [.

@gavinking I didn't mean basic markdown syntax, but our extension "wiki style links syntax" (which allows links to declarations), for example doc "Link to [[Float]]" (see #408)

gavinking commented 11 years ago

@thradec OK, but surely we would be able to change our Markdown extensions if we wanted to go down that path, which I frankly don't think we do.

gavinking commented 11 years ago

I'm playing with the stuff I implemented last night in the IDE. After a little adjustment to the autoeditstrategy it honestly works really nicely. I don't see much of a downside. It does feel more robust than what we have today. And with syntax highlighting it reads nicely. But I am thinking of specially-coloring the single-quotes that delimit the interpolated expression, just to make it even easier on the eyes for people like Tako.

quintesse commented 11 years ago

This is demonstrably false, at least as far as our own code goes. You can check our codebase to verify that.

I don't need to check, I know you're wrong, because of my definition of "normal string" ;) For me a "normal string" is not the one passed to the doc element, which is the only place we use markdown; I'm pretty sure no other string in the code base has markdown in it.

Of course I do agree that in proper code there will be a lot of doc elements so we should have something that works nice, that's why I suggested to use triple doublequotes for that.

to make it even easier on the eyes for people like Tako

I don't think I ever said anything about syntax coloring, I only care about not having to escape single quotes, if we can have that you can put the most ugly coloring on it you want ;)

gavinking commented 11 years ago

I am thinking of specially-coloring the single-quotes that delimit the interpolated expression, just to make it even easier on the eyes for people like Tako.

Done. It's an improvement.

gavinking commented 11 years ago

you can put the most ugly coloring on it you want ;)

It's configurable, of course. So you can make the single quotes bright red if you like, and always know that something is getting interpolated.

gavinking commented 11 years ago

Of course I do agree that in proper code there will be a lot of doc elements so we should have something that works nice, that's why I suggested to use triple doublequotes for that.

I think it's a very reasonable approach. Especially now that the doc identifier is optional, typing the """ is a small price to pay for not having to escape ", ', and \ in your documentation. And AFAICT, given the current grammar, three double quotes can never legally occur in succession.
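
As a rough Python sketch of how a regex-based tokenizer could separate the two forms: the triple-quoted alternative is tried first and takes everything up to the next `"""` verbatim, while the ordinary literal still honours backslash escapes. The pattern and the name `first_string` are assumptions for illustration, not the Ceylon grammar.

```python
import re

STRING = re.compile(
    r'"""(?P<verbatim>.*?)"""'         # tried first: no escapes, no interpolation
    r'|"(?P<plain>(?:[^"\\]|\\.)*)"',  # ordinary literal with backslash escapes
    re.DOTALL,
)


def first_string(source):
    """Return (kind, raw_contents) for the first string literal in source."""
    m = STRING.search(source)
    if m is None:
        return None
    if m.group("verbatim") is not None:
        return ("verbatim", m.group("verbatim"))
    return ("plain", m.group("plain"))


print(first_string('"""This doesn\'t have any interpolation or \\ escapes"""'))
```

Because the verbatim form ends only at three consecutive double quotes, single `"`, `'`, and `\` characters can appear in it unescaped.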

quintesse commented 11 years ago

What syntax exactly did you decide upon in the end? Because after so much discussion I've lost track of what you actually implemented.

gavinking commented 11 years ago

What syntax exactly did you decide upon in the end?

print("Hello, 'name', the time is 'time'.");

As initially proposed, right at the top. At the very least it seems an improvement over what we have today. Furthermore, I changed the syntax for Character literals to 'X', just like in C and Java. And I've freed up backticks for some unknown future use.

I can add support for triple-quoted strings in 5 minutes later this afternoon.

quintesse commented 11 years ago

Aw man, come on, that syntax sucks when you have to deal with single quotes in text!

gavinking commented 11 years ago

Aw man, come on, that syntax sucks when you have to deal with single quotes in text!

WTF, isn't that what we're adding triple-quoted strings for?

quintesse commented 11 years ago

No, I said:

So, triple doublequotes, no interpolation, perfect for markdown in docs. Single doublequotes, interpolation, suitable for everything else

But of course I didn't specify the interpolation symbols to use, but I'd go for backticks, because even though it is what is used for markdown you'd normally never mix the two (no use writing markdown in a single doublequoted string with interpolation). It might still confuse some people but I prefer that over people raging that they have to escape every single quote in a text just because we thought it was nice to have markdown in doc elements.

gavinking commented 11 years ago

I'd go for backticks, because even though it is what is used for markdown you'd normally never mix the two (no use writing markdown in a single doublequoted string with interpolation).

OK, that's also a reasonable option. We would be essentially requiring people to almost always triple-quote their doc annotations but again that doesn't seem unreasonable.

However, note that if we go down that path we will not be able to use backticks for anything else in future. We would have already "used up" single, double, triple-double and backtick quotes.

Well, OTOH, we could go back to quoting character literals with backticks, and free up the single quote for some other future use. But I must admit I was kinda enjoying writing characters in single quotes again.

quintesse commented 11 years ago

Oh ok, I didn't know that using backticks inside strings would prevent their use in the rest of the code as well, but I'd still prefer that over escaping and I also prefer the normal single quotes for characters, so I guess that means we can't use backticks for future stuff, but to me that's okay. Although I'd like to know what others think of all this ;)

FroMage commented 11 years ago

So what's the latest proposal? Can I start saying I object to it now?

quintesse commented 11 years ago

No, you can't object, you can only come up with solutions :)