facebook / jsx

The JSX specification is a XML-like syntax extension to ECMAScript.
http://facebook.github.io/jsx/
1.96k stars 133 forks source link

Non-quoted multi-line JS string literal parsing instead of JSX text #28

Open syranide opened 9 years ago

syranide commented 9 years ago

I propose the following: Trim leading spaces and tabs after newlines (indentation) and then apply regular string literal parsing (but over multiple lines and without quotes). It may be advisable to trim tailing spaces and tabs too to avoid source code incompatibilities with editors auto-trimming on save.

There are lots of benefits to this:

  1. (X)HTML-entities are now a thing of the past. JSX attribute strings (HTML) and JSX expression container strings (JS) are now parsed the same way.
  2. You can write complex strings without breaking texts up into separate children (foo{'}'}bar) or having to resort to using unintuitive HTML-entities (foo}bar), just escape with \.
  3. We can intuitively and easily control whitespace by simply escaping \<space> (\<eol> gives you a space too).
  4. A single-line JSX text behaves identically to its counter-part JS string literal.

The two practical issues I see are:

  1. &nbsp; is common and \xA0 is not an overly obvious escape code. This applies if you're dealing with JS anyway, but most are probably only used to typing it as a HTML-entity inside JSX. It's a trade-off (perhaps editors will catch-up and actually render a hint for literal nbsp chars with a little persuasion too).
  2. It makes the most technical sense to drop the current implicit space between newlines (and I'm all for it) but it could make it seem a bit alien to some users, implicit space can be kept and will then have to be omitted if the char before the newline is a white-space char.
<>  foo  </>
=> '  foo  '
<>
  foo
</>
=> 'foo'
<>
\  foo \
</>
=> '  foo  '
<>
\ \ foo\ \
</>
=> '  foo  '
<>
  foo
  bar
</>
=> 'foobar'
<>
  foo\
  bar
</>
=> 'foo bar'
<>
  foo\nbar
</>
=> 'foo\nbar'
<>
  foo\n
  bar
</>
=> 'foo\nbar' (actual newline)
<>
  foo\<\{
  bar
</>
=> 'foo<{bar'

What's not to like? PS. It should be noted that this is easily non-destructively codemod:able too.

cc @sebmarkbage @jeffmo @zpao @spicyj

sebmarkbage commented 9 years ago

Is it true that if we implement this, is it true that the parser would accept any HTML content (i.e. legacy content) as correct?

The opposite is not true because \< and \{ cannot be valid in HTML content.

I think that's the most pervasive arguments since it allows a simple syntax highlighter to deal with either model. Any HTML content transpiler would simply be a subset in acceptable input.

Of course, we should do the same for attributes then.

Is it possible to repurpose the same whitespace logic from multiline string literals in ES6 or would it not make sense?

syranide commented 9 years ago

Is it true that if we implement this, is it true that the parser would accept any HTML content (i.e. legacy content) as correct? The opposite is not true because \< and \{ cannot be valid in HTML content.

I think that's the most pervasive arguments since it allows a simple syntax highlighter to deal with either model. Any HTML content transpiler would simply be a subset in acceptable input.

I'm not 100% sure what you mean here.

I'm not aware of any JS/HTML/XML-parsers out there that play even remotely nice with JSX code (not that I have tested many). There's enough flexibility in JSX that they're all easily confused by many perfectly normal JSX code practices and incorrectly highlight the remainder of the source. Parsing JSX as HTML works for the most simplistic examples but nothing else, or am I missing something?

Of course, we should do the same for attributes then.

Yeah, that was implied :)

Is it possible to repurpose the same whitespace logic from multiline string literals in ES6 or would it not make sense?

AFAIK ES6 multiline string literals copy text content verbatim, including leading and trailing whitespace and even LR/CRLF as-is, which leads to bad practices like this for certain use-cases. The whitespace rules I defined above are the same used by JSX today (but managing whitespace is now easy and uninstrusive).

The problem as I see it is whitespace cannot be optimized away post-process. You also prevent the developer from freely indenting and structuring text content and you end up with the horrible HTML line-stuffing practices in scenarios where newline/indentation introduced unintended whitespace.

The rules I've defined above allow any string value to be easily represented at any intendentation and over any number of lines, which HTML nor ES6 multiline strings are unable to.

I would quote @jeffmo if I still had his message, following my initial whitespace PR for JSX he was surprised at just how much unintended whitespace would previously end up in the output.


Just to illustrate, as parsed by HTML (and ES6 multiline string literal):

<div>
  Foo

  Bar
</div>
=> '\n  Foo\n\n  Bar\n' (OR)
=> '\r\n  Foo\r\n\r\n  Bar\r\n' (OR)
=> '\n\tFoo\n\n\tBar\n' (OR)
=> '\r\n\tFoo\r\n\r\n\tBar\r\n'

That's probably not what you meant right? Again, it cannot be optimized away post-process.

syranide commented 9 years ago

I could put together a working PR quite easily if that would be of interest, to get a feel for how it would work in practice if it's at all interesting.

ghost commented 9 years ago

Thank you for reporting this issue and appreciate your patience. We've notified the core team for an update on this issue. We're looking for a response within the next 30 days or the issue may be closed.

mlunoe commented 8 years ago

Is there an update on this?