Support for code blocks

mrcdk commented 7 years ago

I'm not happy with the naming of this feature but I hope we can discuss it here.

Rendered version here

Pauan commented 7 years ago

I like it, but it seems to me that using --macro makes it a lot clunkier to use (and creates the possibility of name conflicts).

I think it would be better if it used an API like this:

import haxe.macro.Expr;

class XMLBuilder {
    public static macro function xml(block: CodeBlock): Expr {
        // macro code here
    }
}

import XMLBuilder.xml;

class Foo {
    public static var foo = ```xml
      <XML CODE HERE>
    ```;
}

class Foo {
    public static var foo = ```XMLBuilder.xml
      <XML CODE HERE>
    ```;
}

Any macro which accepts a haxe.macro.Expr.CodeBlock as its first (and only) argument would be allowed as a tag in ```

This is similar to how build macros return Array<haxe.macro.Expr.Field>

This means that you don't need to register anything, instead the part immediately after ``` is parsed as an identifier, so it uses regular identifier lookup rules, so using XMLBuilder.xml works.

P.S. I think haxe.macro.Expr.TemplateString would be a better name than haxe.macro.Expr.CodeBlock

nadako commented 7 years ago

I'm not sure how I feel about this. This is basically a HEREDOC syntax, which I always find quite weird to be honest, so I'm more against this, I guess...

A nice feature here is that you don't have to do the escaping, this is a plus for doing DSLs, but OTOH, if we talk JSX use-case I'm not sure HEREDOC would look good here if we consider reentrancy, e.g.

```jsx <ul>{names.map(name -> ```jsx <li>{name}</li>```)}</ul>```

I don't think this looks much better than the current solution:

jsx("<ul>{names.map(name -> jsx("<li>{name}</li>"))}</ul>")

I have some concerns about the proposal too.

Firstly I don't really understand the "motivation" section:

Make the compiler aware about the content of the block, something that wouldn't be possible with a macro

Without macro processing, the proposed syntax is still an arbitrary string, from the compiler standpoint, so there's no difference.

an IDE can use this information for, if supported, add highlighting

Well, an IDE still has to know about the tag, so this wouldn't be much different from supporting jsx(...) calls, would it? Same with external linter.

auto completion

That can already be supported with macros if you do correct position bookkeeping, as shown by the gif Juraj linked.

Secondly, I agree with @Pauan that using module-local imported names would work better than a global registry. And if you don't want to do the imports everywhere, you can just use the import.hx feature.

Thirdly, the part about optional tag makes syntax ambiguous, I think: it's not clear how ```hi``` should be treated. By your proposal it's a string tagged with hi, but that's not what you'd expect here. The solution, I guess, would be to add a space after the third backtick, but that would look weird...

Pauan commented 7 years ago

@nadako

This is basically a HEREDOC syntax, which I always find quite weird to be honest

It serves the same purpose as HEREDOC syntax, but it's a lot nicer than HEREDOC syntax.

Python uses triple """ and ''' for strings, and it works really well in some situations (e.g. when writing documentation).

Firstly I don't really understand the "motivation" section:

I don't understand it either, since an IDE needs out-of-band information in both cases (either the global tag or the macro identifier).

Thirdly, the part about optional tag makes syntax ambiguous

That's a good point. Rather than allowing for no tags, instead there could be a built-in str tag, which returns the template string as-is:

class Foo {
    public static var foo = ```str
        raw string stuff here
    ```;
}

A nice feature here is that you don't have to do the escaping

Yes, because Haxe already supports multi-line strings + interpolation, the only benefit is avoiding the \ escapes.

So it's only useful in situations where " or ' are common (e.g. attributes in HTML / XML, inserting another program as a string, custom DSLs, etc.)

If the proposal is changed so that \ is ignored inside ``` then it would be even more useful, since it would allow you to insert \ inside the string without needing to use \\

Your example looks better if you spread it across multiple lines and use a helper method:

class Foo {
    private function renderItem(name: String) {
        return ```jsx
            <li>{name}</li>
        ```;
    }

    public function render() {
        return ```jsx
            <ul class="foo" style="background-color: green;">
                {names.map(renderItem)}
            </ul>
        ```;
    }
}

I don't think this proposal is intended for small simple strings, it's more intended for large multi-line strings, so I think the above is a more realistic example.

Another interesting use case would be to insert SQL tables/syntax into Haxe code:

class Foo {
    private function make() {
        return ```sql
            CREATE TABLE IF NOT EXISTS foo (
                id    serial primary key,
                name  text
            );
        ```;
    }

    public function query(name: String) {
        return ```sql
            SELECT name FROM foo WHERE name = ${name};
        ```;
    }
}

This is an example where ' and " are commonly used, so it's inconvenient having to escape them.

Another example is that Python uses triple """ for doc strings, and we can use ``` for the same purpose:

class Foo {
    @:doc(```str
        documentation goes here
    ```)
    public function foo() {}
}

Python also has a really cool feature where you can insert code examples into doc strings, and it will then automatically run them as unit tests:

class Foo {
    @:doc(```str
        blah blah blah documentation

        >>> var x = new Foo();
        ... x.foo();
        10

        blah blah blah more documentation
    ```)
    public function foo(): Int {
        return 10;
    }
}

You use >>> to insert code into the doc string, that code is then run as a unit test.

So in the above example it will run var x = new Foo(); x.foo(); and then verify that the return value is 10

This gives a standard way to give code examples in the documentation, and because they're run as unit tests it forces the programmer to keep the code examples up-to-date.

Obviously having to escape ' and " and \ would be really awful in this situation.
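For reference, the Python feature described above is the standard-library doctest module. The following is a minimal Python sketch of the same Foo example; the run_examples helper is a hypothetical name introduced here for illustration, and nothing in it is part of the Haxe proposal.

```python
import doctest


class Foo:
    def foo(self):
        """Return a fixed value.

        >>> x = Foo()
        >>> x.foo()
        10
        """
        return 10


def run_examples():
    """Run the examples in Foo.foo's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring, giving the examples access to Foo.
    test = parser.get_doctest(Foo.foo.__doc__, {"Foo": Foo}, "Foo.foo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted
```

Each >>> line counts as one example, so if the documented value and the real return value drift apart, the mismatch is reported as a failed example.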

Another example is RegExps:

class Foo {
    public function foo() {
        return ```re foo+\.([a-z]+)```.test("foo");
    }
}

Not needing to escape " or ' or \ would be very nice.

I know we already have ~/.../ syntax for RegExps, but I mean for a custom RegExp library, or for a library which adds new RegExp features that Haxe doesn't support, etc.
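As a point of comparison, Python's raw triple-quoted strings already behave this way: neither quotes nor backslashes need escaping inside them. A small Python sketch using the same pattern as above (the extension helper is a hypothetical name used only for illustration):

```python
import re

# Raw triple-quoted literal: \. reaches the regex engine unchanged,
# and a bare " or ' inside the literal would not need escaping either.
PATTERN = re.compile(r"""foo+\.([a-z]+)""")


def extension(text):
    """Return the captured suffix if text matches from its start, else None."""
    m = PATTERN.match(text)
    return m.group(1) if m else None
```

Note that the pattern requires a literal dot followed by at least one lowercase letter, so the bare string "foo" does not match it.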

Another example is embedding BNF into Haxe:

class Foo {
    public static var parser = ```bnf
         <postal-address> ::= <name-part> <street-address> <zip-part>

              <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> 
                            | <personal-part> <name-part>

          <personal-part> ::= <initial> "." | <first-name>

         <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

               <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

        <opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
            <opt-apt-num> ::= <apt-num> | ""
   ```;
}

Basically there are lots of different areas where it's useful to embed big chunks of strings without needing to escape ' or " or \.

Also, just FYI, I do like this proposal, but not a lot. I wouldn't mind having it in Haxe, but I can also do without it as well (since I don't use XML / JSX / SQL / BNF / etc.)

I can understand why other people want it though.

Pauan commented 7 years ago

Oh, and speaking of escaping, there will be situations where you want to include $ in the string.

So rather than using ``` I think it would be better if it used ''' and """:

""" would include everything as-is, so you never need to escape ", $, or \
''' would include everything as-is, except it allows for interpolation with $, so you never need to escape ' or \

ncannasse commented 7 years ago

For the most generic solution to embedding DSLs with custom parsing, you either want to be able to introduce new lexemes into Haxe syntax (which is hard to solve correctly) or have some delimiters that can then be parsed using macros (which is more the Haxe way of doing things).

However, as @nadako explained, you want to support mixing of your DSL with Haxe code, and then either you need to escape your delimiters (which is annoying) or you need to introduce a specific syntax for Haxe code embedding (such as ${} blocks in our interpolated strings) which goes against the principle of the DSL having its own custom syntax.

Another way of dealing with this is to let the macro do the parsing of Haxe parts, but then you need a way to make sure the whole code block is correctly parsed even if it contains other delimiters. But that would not work with delimiters such as triple backquotes because it would stop parsing at the first one within the reentrant part:

foo(```some code ${```reentrant part```})
                     |-> parsing stops here without explicit meaning for ${}

So the solution is to use opening/closing delimiters, and to make sure that you parse until all of them have been closed.

This can be one of the following pairs : () {} [] <>

For instance we could define it as <[ .... ]>:

foo(<[some code ${<[reentrant part]>}]>)

Here, the parser knows how to parse the whole string, and the foo syntax knows how to parse the sub haxe parts as well.

The only issue left is if the delimiter also appears as a lexeme in the DSL itself, in which case we need to be able to escape it.
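The difference between the two delimiter styles can be sketched in a few lines of Python. This is only an illustration of the depth-counting idea, assuming <[ and ]> are the chosen delimiters; naive_end and balanced_end are hypothetical names, and escaping a delimiter that appears inside the DSL is not handled.

```python
def naive_end(src, start, fence="```"):
    """Index just past the first closing fence after the opening one, or -1."""
    close = src.find(fence, start + len(fence))
    return -1 if close == -1 else close + len(fence)


def balanced_end(src, start, open_tok="<[", close_tok="]>"):
    """Index just past the close_tok that balances the open_tok at start, or -1."""
    if not src.startswith(open_tok, start):
        raise ValueError("expected an opening delimiter at start")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith(open_tok, i):
            depth += 1          # entering a (possibly nested) block
            i += len(open_tok)
        elif src.startswith(close_tok, i):
            depth -= 1          # leaving a block
            i += len(close_tok)
            if depth == 0:
                return i        # outermost block is closed
        else:
            i += 1
    return -1                   # ran out of input with blocks still open
```

On the backtick example the naive scan ends right after `${`, because the opening fence of the reentrant part looks identical to a closing fence. On the <[ ... ]> example the balanced scan returns the whole block, including the nested one.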

mrcdk commented 7 years ago

I agree that the motivation point is lacking. I didn't want to imply it was because of JSX or any other markup language, because that's not the full story. I actually got this idea while writing some GLSL and thinking how nice it would be to have some kind of code highlighting. That got me looking into whether it was possible, and I found this open vscode issue: https://github.com/Microsoft/vscode/issues/1751. While it's not currently possible in vscode, the last comments, which talk about how the language server should be the one in charge of knowing which languages are injected, gave me the "compiler awareness" idea. It could be done by the language server alone, but I thought the Haxe compiler would do a better job, so other IDEs could benefit from it. Of course, if the tag is something like my_xml the IDE wouldn't know what to do with it, so maybe we could come up with some standard when registering the tag:

--macro registerCodeBlock("my_xml", "text/xml", Macros.processXmlMacro)

The idea of using a --macro came while thinking about how it would work in a library. I wanted to make it as transparent as possible for the user, and libraries can export extra parameters with extraParams.hxml. I feel that if we choose the option of importing it manually the user will get confused about how to use it: the user imports a function that doesn't work as a function. With that method we also lose any possible extensions like the last point. Now, importing it manually could work if we choose to change the tokens. I could see it working if we choose to use ES6 tagged template literals:

import Macros.jsx;

class Test {
  public static function main() {
    var a = jsx`...`;
  }
}

Which will also look nicer in one-liners:

jsx`<ul>${names.map(name -> jsx`<li>{name}</li>`)}</ul>`;

And we avoid the problem of how ```hi``` would get parsed.

I proposed this pattern as an alternative in the alternatives section with the only drawback that, coming from the JavaScript world, people may think that it works at runtime too and may create confusion.

I'm not sure how this would work with my first point but maybe someone has a better idea.

I also agree that this isn't a must-have feature. It would be really nice to have tho.

Off-topic: @Pauan there is a library for doc-tests in Haxe http://lib.haxe.org/p/haxe-doctest/

back2dos commented 7 years ago

I agree with @Pauan that the interpretation of syntax should be driven by the normal scoped identifier resolution rules, rather than global handlers. You can still use import.hx to apply it to a whole code base.

The main issue I see with the proposed syntax is that it can't be nested in itself, which is to some extent what Nicolas is getting at with his comments on delimiters having to be escaped. The simplest example of why it goes wrong:

jsx`<ul>${names.map(name -> jsx`<li>{name}</li>`)}</ul>`;

Will fail with "unexpected <" after:

jsx`<ul>${names.map(name -> jsx`

(it will do this whether you use one back tick or three or 42 or whatever)

As mentioned on twitter, I've been cooking up a little something. You'll find that approach not to exhibit the aforementioned limitations (I'm sure it has its own - feel free to comment on them there). We've had some discussions about it so far, in particular about making a reentrant JSX parser, and why this will become perfectly valid Haxe syntax:

<ul>{names.map(name -> <li>{name}</li>)}</ul>

In conjunction with https://github.com/HaxeFoundation/haxe/issues/6479 parsing the above becomes a relatively straightforward endeavor - the remaining challenge is parsing a suitable dialect of XML and generating some expressions from this (which has been done already at least once ;)).

porfirioribeiro commented 7 years ago

Why not follow the same convention as in ES6 for Tagged Template Literals?

https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Template_literals#Tagged_template_literals

back2dos commented 7 years ago

Why not follow the same convention as in ES6 for Tagged Template Literals?

I literally explained that in the comment above yours ;)

mrcdk commented 7 years ago

Closing this PR in favour of @back2dos's PR: https://github.com/HaxeFoundation/haxe-evolution/pull/26

HaxeFoundation / haxe-evolution

Support for code blocks #25