brick-lang / brick

The Brick language spec
University of Illinois/NCSA Open Source License

Multiline string literals #8

Closed weswigham closed 10 years ago

weswigham commented 10 years ago

Does the language even have them? Preferably without the 'continuation' backslash?

toroidal-code commented 10 years ago

I don't know. It's something I've been putting off. Ruby inserts newlines into strings, and uses here-docs for newline-escaped strings. Python does continuation, which is sane, but ugly. Common Lisp requires the '\n' to do a new line in any string, which is what I'm leaning towards.

Either way, we're either going to need to indicate continuation, or indicate newline.

let! s = "This is a multiline string \n
          we are trying to create sane string syntax. \n
          What do you think of this?"

toroidal-code commented 10 years ago

For reference

weswigham commented 10 years ago

Or an alternate syntax for defining multiline strings; after going through and trying to code some Brick (I have a 200-line test framework), something along the lines of

let! y = | "The first line of a multiline string"
         | "The second line of a multiline string"
         | "The third line of a multiline string"

Which, in a multilet, looks like

let | x = "A normal string"
    | y = | "The first line of a multiline string"
          | "The second line of a multiline string"
          | "The third line of a multiline string"
    | z = "Another normal string"

It looks especially good when describing a function (i.e., no let at all)

fn description
    | "This is the first line of the description this returns"
    | "And this is the second"

toroidal-code commented 10 years ago

I kinda like this. I'm going to want @zellio in on this decision. It looks like it might be a pain in the ass to do in the parser, but not impossible

For the last example, you should be using the doc comment. See #7

weswigham commented 10 years ago

No, no. I mean that in that last example, the function description RETURNS that string (since it's the last value in it).

toroidal-code commented 10 years ago

Ahhhhh, I see. Alright, that's okay then. I find the pipe on the first line of the string a little confusing.

toroidal-code commented 10 years ago

I know it's a bit against the rest of the language, but what if we removed the first string's leading pipe? It's not being 'joined' with anything, so I feel like it's unnecessary. It'll make parsing strings easier, since we won't need two non-terminals for strings in the grammar (pipe-string and just string).

string : STRING join_string* ;
join_string : PIPE STRING ;

vs

string : STRING join_string*
       | join_string+ ;
join_string : PIPE STRING ;

So then we would have:

fn description
    "This is the first line of the description this returns"
    | "And this is the second"

toroidal-code commented 10 years ago

The problem is there's no keyword on the left of the pipe, which is what makes all the other piped forms so distinguishable
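
As an illustration of the simpler grammar above (`string : STRING join_string*` with `join_string : PIPE STRING`), here is a minimal recursive-descent sketch in Python. It is not taken from the Brick compiler; the `(kind, text)` token shape and the `parse_string` name are assumptions made for the example.

```python
# Hypothetical token shape: (kind, text) pairs, where kind is "STRING" or "PIPE".
def parse_string(tokens, pos=0):
    """Parse `string : STRING join_string*` starting at tokens[pos].

    Returns (parts, next_pos), where parts is the list of string pieces.
    """
    if pos >= len(tokens) or tokens[pos][0] != "STRING":
        raise SyntaxError(f"expected STRING at token {pos}")
    parts = [tokens[pos][1]]
    pos += 1
    # join_string : PIPE STRING (zero or more times)
    while pos < len(tokens) and tokens[pos][0] == "PIPE":
        if pos + 1 >= len(tokens) or tokens[pos + 1][0] != "STRING":
            raise SyntaxError(f"expected STRING after PIPE at token {pos}")
        parts.append(tokens[pos + 1][1])
        pos += 2
    return parts, pos
```

With this rule a leading pipe is rejected outright, which is the single-non-terminal simplification the comment describes.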

weswigham commented 10 years ago

Lack of a keyword is a keyword, right (you would have just tossed an error before)? | is simply a context-sensitive pseudo-operator.

toroidal-code commented 10 years ago

We could go the C++ route and have the compiler compose adjacent string literals into a single string, which I have no problem with.

weswigham commented 10 years ago

...does this mean I'd have to explicitly write out my newline character? I'd prefer to not have to.

toroidal-code commented 10 years ago

I'd like to avoid that too. So let's do this:

Would this work well?

weswigham commented 10 years ago

So it would simply be

let! y = "The first line of a multiline string"
         "The second line of a multiline string"
         "The third line of a multiline string"

let | x = "A normal string"
    | y = "The first line of a multiline string"
          "The second line of a multiline string"
          "The third line of a multiline string"
    | z = "Another normal string"

fn description
    "This is the first line of the description this returns"
    "And this is the second"

toroidal-code commented 10 years ago

Yeah, that looks right.
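
A rough sketch of how the adjacent-literal form above might be handled after lexing, written in Python for illustration. It assumes adjacent string literals are joined with a newline, which the thread implies but never spells out; the token shape and the `merge_adjacent_strings` name are hypothetical.

```python
# Hypothetical token shape: (kind, text) pairs produced by a lexer.
def merge_adjacent_strings(tokens, separator="\n"):
    """Collapse each run of consecutive STRING tokens into one STRING token.

    The newline separator is an assumption; pass separator="" to get
    C++-style concatenation with no implicit newline.
    """
    merged = []
    for kind, text in tokens:
        if kind == "STRING" and merged and merged[-1][0] == "STRING":
            # Extend the previous STRING token instead of starting a new one.
            merged[-1] = ("STRING", merged[-1][1] + separator + text)
        else:
            merged.append((kind, text))
    return merged
```

Because the merge happens on the token stream, the parser only ever sees a single STRING where several were written.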

weswigham commented 10 years ago

I'm okay with this.

toroidal-code commented 10 years ago

Alright, let's document and close this then. We may need to revisit the issue later if we find trouble adapting the early stages of the compiler to this, but it doesn't look like it will be too bad.

weswigham commented 10 years ago

Though... in that last case I fear that what description returns is ambiguous, since the concatenation is implied.

toroidal-code commented 10 years ago

What do you mean by 'ambiguous'? The choice between the second line vs. the first + second line?

weswigham commented 10 years ago

Or rather the choice between the first line simply being discarded and only the second line being returned, yes.

toroidal-code commented 10 years ago

I think in that case, for stylistic reasons, it should be written like this, if only to remove the confusion. It's an interesting problem, and many languages have weird solutions to it.

fn description
     "This is the first line of the description this returns\n
      And this is the second"

toroidal-code commented 10 years ago

I really do want Zach's input on this before we close though.

toroidal-code commented 10 years ago

Changed my mind.

""" This is a
multiline string literal"""
"This is a string" + " concatenation."

weswigham commented 10 years ago

I totally still want to be able to use the pipe in generalized situations, instead of reserving it for a cond or let.

Python docstring syntax for multilines is totally fine, though.

Plus for string concatenation is a widely adopted standard; I see no major reason to be different (yet).

weswigham commented 10 years ago

Oh, important question. What's the plan for handling whitespace at the start/end of a multiline literal?

"""
A short word
"""

could be identical to

"""A short word"""

Also, is leading indentation going to be stripped, as one might expect?

let here be some code
    so I may tab in
        let str = """
            A short word
            """

toroidal-code commented 10 years ago

Newlines at the start/end are not handled; they'll still be inserted. Indents are not handled either. String postprocessing should handle that.

"""
       hello
       I am indented""".strip_indents()

toroidal-code commented 10 years ago

Basically, I don't want to touch my strings in the AST. That way, I can just say that they're collections of bytes (not just characters) and ignore the whole Unicode/ASCII problem.
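
For illustration, the `strip_indents()` postprocessing step mentioned above could behave roughly like Python's `textwrap.dedent`. This is only a sketch of one plausible behaviour, not Brick's actual definition: it removes the indentation common to all non-blank lines and leaves leading and trailing newlines untouched, matching the "not handled" rule.

```python
import textwrap

def strip_indents(s: str) -> str:
    """Remove the whitespace prefix shared by every non-blank line.

    Assumed semantics, borrowed from textwrap.dedent; newlines at the
    start or end of the literal are kept as-is.
    """
    return textwrap.dedent(s)

raw = """
       hello
       I am indented"""

cleaned = strip_indents(raw)  # "\nhello\nI am indented"
```

Keeping this as an ordinary string operation means the lexer can store the literal exactly as written.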

toroidal-code commented 10 years ago

Since string interpolation is going to be a thing, we are going to be touching strings in the AST (because of macro expansion) :( This further complicates things, and means that unless we use a lexer that can handle Unicode strings, we're in a bad place. This is a problem to tackle once we've got a basic stable release out, though. We will not be releasing a 1.0 version without Unicode support. Universal language support is important.
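
To make the bytes-versus-characters concern concrete, here is a small Python illustration (not Brick code; `split_is_valid` is a hypothetical helper). It shows that a byte offset and a character offset diverge as soon as a string contains non-ASCII text, which is why code that cuts strings apart, as interpolation would, needs to know the encoding.

```python
text = "na\u00efve"            # "naïve", using the precomposed U+00EF
raw = text.encode("utf-8")     # the same string as a sequence of bytes

char_count = len(text)         # counts code points
byte_count = len(raw)          # counts bytes; U+00EF takes two in UTF-8

def split_is_valid(data: bytes, index: int) -> bool:
    """Return True if cutting `data` at byte `index` leaves two valid UTF-8 halves."""
    try:
        data[:index].decode("utf-8")
        data[index:].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here `split_is_valid(raw, 2)` succeeds, while `split_is_valid(raw, 3)` fails because the cut falls inside the two-byte encoding of U+00EF.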

The current decisions for this issue are documented in Syntax-Literals.