Document parsing of (X)HTML entities, or drop it even?

syranide commented 10 years ago

We should probably document how (X)HTML entities are parsed.

However, I can imagine dropping HTML entities instead and adopting the escaping used by JS strings, i.e. bla \< \{ \u1234 bla. To me it would make sense in many ways:

JSX is the JavaScript equivalent of HTML (it's not HTML), so using JavaScript syntax seems preferable.
JSX explicitly disallows inline HTML in favor of just JSXElements and JSXText, so HTML entities seem a bit misplaced in that context.
It's currently <a href="&amp;\" /> vs <a href={'&\\'} />, which is kind of awkward.
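To make the asymmetry concrete, here is a minimal sketch (not Babel's actual implementation) of how a compiler might decode a JSX attribute string today: entity references are decoded, while a backslash is just an ordinary character. The NAMED_ENTITIES table is a hypothetical subset; real compilers ship the full list of HTML named entities.

```javascript
// Hypothetical subset of HTML named entities, for illustration only.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00A0', copy: '\u00A9', mdash: '\u2014' };

// Decode named (&amp;), decimal (&#123;) and hex (&#x7B;) references in raw
// JSX attribute/text source. Backslashes get no special treatment.
function decodeJsxEntities(raw) {
  return raw.replace(/&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));/g,
    (match, hex, dec, name) => {
      if (name !== undefined) {
        // Unknown names are left as-is instead of raising an error.
        return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name) ? NAMED_ENTITIES[name] : match;
      }
      const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
      // Out-of-range numeric references are left untouched.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    });
}

// Raw source text between the quotes of the JSX attribute: &amp;\
const attrSource = '&amp;\\';
// The same value written as a JS string needs the backslash doubled.
const jsExpression = '&\\';

console.log(decodeJsxEntities(attrSource) === jsExpression); // true
```

Under these assumptions both spellings produce the same two-character string, &\, which is exactly why having two different escaping rules for it feels awkward.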

The downside of dropping HTML entities is obviously that you wouldn't be able to copy-paste HTML and it could be a mental disconnect for a lot of users. But I think it makes a lot of sense from a technical perspective.

I think it makes even more sense if you look beyond HTML. Why would you use HTML entities for non-HTML frontends like iOS, Qt, etc.?

ghost commented 9 years ago

Thank you for reporting this issue, and we appreciate your patience. We've notified the core team for an update on this issue. We're looking for a response within the next 30 days or the issue may be closed.

gajus commented 9 years ago

> The downside of dropping HTML entities is obviously that you wouldn't be able to copy-paste HTML and it could be a mental disconnect for a lot of users. But I think it makes a lot of sense from a technical perspective.

React has already chosen to deviate from HTML: https://github.com/facebook/react/issues/2781

and

> Third, our thinking is that JSX's primary advantage is the symmetry of matching closing tags which make code easier to read, not the direct resemblance to HTML or XML. It's convenient to copy/paste HTML directly, but other minor differences (in self-closing tags, for example) make this a losing battle and we have a HTML to JSX converter to help you anyway. Finally, to translate HTML to idiomatic React code, a fair amount of work is usually involved in breaking up the markup into components that make sense, so changing class to className is only a small part of that anyway.

from @spicyj's answer: https://www.quora.com/Why-do-I-have-to-use-className-instead-of-class-in-ReactJs-components-done-in-JSX/answer/Ben-Alpert

Therefore, I am in favour of dropping HTML entity support.

RReverser commented 9 years ago

> React has already chosen to deviate from HTML.

@gajus From HTML - yes, from XML - not so much (apart from JS injections).

gajus commented 9 years ago

Well, I am biased. I want JSX to allow template strings in JSXAttributeValue. The fate of that issue depends on whether HTML entity support is dropped or not. This is another consideration to have when deciding on this.

RReverser commented 9 years ago

Do those two braces around really mean that much to you to change two behaviors? :smile:

gajus commented 9 years ago

One is HTML entities. What's the second?

RReverser commented 9 years ago

Template strings without braces on their own.

gajus commented 9 years ago

I think that since JSX lives in JS and is in essence syntactic sugar for createElement, it should behave the same way, i.e.,

React.createElement(`div`, {className: `foo-${foo}`}, `bar-${bar}`);

should not be different from

<div className=`foo-${foo}`>`bar-${bar}`</div>

RReverser commented 9 years ago

Then we return to questions like numeric literals, object and array literals and so on.

gajus commented 9 years ago

@RReverser Explain?

If I understand correctly, then yes: objects, strings, null and numbers (that's all there is) should be valid attribute values.

<div foo=null />
<div foo=123 />
<div foo=() => {} />
<div foo=({}) />

Does this clash with anything in the spec?

RReverser commented 9 years ago

It doesn't clash, but it increases complexity for purely aesthetic reasons.

gajus commented 9 years ago

That is true. But consistency and conventions lower bug counts (sorry, I have no reference for this statistic). Assuming that is true, if the rest of the code base uses convention X (template strings in this case), it would make sense for JSX to support it too.

RReverser commented 9 years ago

That argument has two sides: on one hand, you're increasing consistency for those who write logic in JS, and on the other, you're simultaneously decreasing consistency and familiarity for those who develop views (HTML/XML coders).

sebmarkbage commented 9 years ago

I think that it probably only makes sense to do this if we also drop it from JSXText or drop JSXText completely, as described in #8 and #35.

syranide commented 9 years ago

@sebmarkbage I'd say #28 is a candidate for otherwise keeping JSX as it is and being able to drop XHTML entities.

> That argument has two sides: on one hand, you're increasing consistency for those who write logic in JS, and on the other, you're simultaneously decreasing consistency and familiarity for those who develop views (HTML/XML coders).

IMHO the problem is that it is inconsistent. It would be fine if <a href="&nbsp;" /> were the same as <a href={"&nbsp;"} />, which it isn't... to be honest, I'm quite sure that many don't even realize this difference exists.

RReverser commented 9 years ago

> to be honest I'm quite sure that many don't even realize this difference exists

Dunno, maybe, but I haven't met such people yet. Right now it's pretty balanced, in the sense that most people realize that {...} marks the boundaries of JavaScript: outside them everything works pretty much like XML, inside them like JS.

The biggest benefit of entities is that they're properly named and easy to remember. Most people know perfectly well how to write &nbsp; or &mdash; or &copy; to get what they want, while very few people know the corresponding hexadecimal codes, and googling them every time you want a special character, or using some external library that just provides a list of characters, is not a really pleasant experience.

syranide commented 9 years ago

> The biggest benefit of entities is that they're properly named and easy to remember. Most people know perfectly well how to write &nbsp; or &mdash; or &copy; to get what they want, while very few people know the corresponding hexadecimal codes, and googling them every time you want a special character, or using some external library that just provides a list of characters, is not a really pleasant experience.

\< \> \& \" seems easier to me than &lt; &gt; &amp; &quot;? Hexadecimal codes are a last resort.

P.S. If you want ©, then just write it; there's no reason to use the hex code or HTML entity.
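For comparison, a minimal sketch of the alternative proposed in this thread: JSXText that understands JS-style escapes instead of entities. The exact escape set is an assumption for illustration (\uXXXX, \u{...}, \n, \t, \r, and a backslash before any other character yielding that character); nothing in the thread pins it down.

```javascript
// Hypothetical decoder for JS-style escapes in JSXText, as proposed here.
// A few JS control escapes are mapped; any other escaped character
// (\< \> \{ \} \\ \" ...) is kept literally.
const CONTROL_ESCAPES = { n: '\n', t: '\t', r: '\r' };

function decodeJsStyleEscapes(raw) {
  let out = '';
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch !== '\\') {
      out += ch;
      continue;
    }
    const next = raw[i + 1];
    if (next === undefined) throw new SyntaxError('Trailing backslash in JSXText');
    if (next === 'u') {
      // Either \u{hex digits} or exactly four hex digits, as in JS strings.
      const m = raw.slice(i + 2).match(/^\{([0-9a-fA-F]+)\}|^[0-9a-fA-F]{4}/);
      if (!m) throw new SyntaxError('Malformed \\u escape in JSXText');
      out += String.fromCodePoint(parseInt(m[1] ?? m[0], 16));
      i += 1 + m[0].length; // skip the 'u' and the consumed digits
    } else {
      out += CONTROL_ESCAPES[next] ?? next;
      i += 1; // skip the escaped character
    }
  }
  return out;
}

console.log(decodeJsStyleEscapes('bla \\< \\{ \\u1234 bla'));
```

In this sketch malformed escapes throw a SyntaxError; whether a real JSX parser should reject them or pass them through is left open by the discussion.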

gajus commented 9 years ago

> \< \> \& \" seems easier to me than &lt; &gt; &amp; &quot;? Hexadecimal codes are a last resort.

Was just typing that. Why bother with HTML entities at all?

RReverser commented 9 years ago

> then just write it

You mean use a specific keyboard layout that allows them, or a character table application? Not all platforms and localizations have that ability out of the box.

gajus commented 9 years ago

Copy paste from https://en.wikipedia.org/wiki/List_of_Unicode_characters.

gajus commented 9 years ago

That's genuinely what I do when my keyboard does not have a character that I need. Since I very rarely need a character that's not on my keyboard, it does not bother me. I cannot imagine anyone being bothered by that either.

RReverser commented 9 years ago

Well, I do that as well, but it's not pleasant at all, and it's not as rare as it seems, especially for the examples above such as non-breaking spaces, em dashes and copyright signs. They are in fact much more common than < and > in regular text, and the other two mentioned (" and &) are already perfectly supported without any kind of escaping in JSX.

gajus commented 9 years ago

While not all platforms support character maps, I imagine that every IDE/text editor has a plugin for that (vim, Sublime, WebStorm, to name a few).

gajus commented 9 years ago

Not to mention that "regular text" is rarely typed in React code. It is something you load from a database of some sort.

syranide commented 9 years ago

> You mean use a specific keyboard layout that allows them, or a character table application? Not all platforms and localizations have that ability out of the box.

http://fsymbols.com/computer/copyright/

I'm pretty sure entities aren't meant to be human-friendly first and foremost, but simply a mechanism for escaping that is charset and implementation independent.

Regardless, I don't see how this is a problem JSX should try to solve (intentionally deviating from JS) when JS itself makes no such effort.

RReverser commented 9 years ago

> While not all platforms support character maps, I imagine that every IDE/text editor has a plugin for that (vim, Sublime, WebStorm, to name a few).

So in any case: remove the built-in, human-friendly way of escaping and instead force developers to google/use a charmap/plugin/whatever. Degrading DX is not nice.

> Not to mention that "regular text" is rarely typed in React code. It is something you load from a database of some sort.

Often it is. Text is exactly the thing that is rather rarely generated dynamically compared to the static parts of a page (user names, blog contents and numbers are, but those are a minority and have little to do with this issue and special characters). And if we accept your assumption, then this issue isn't worth discussing at all.

> I'm pretty sure entities aren't meant to be human-friendly first and foremost, but simply a mechanism for escaping that is charset and implementation independent.

In that case, they would be left as &#123;. I believe the names were designed specifically to be human-friendly and compatible with any locale, and they serve this purpose far better than escapes in JS.

> Regardless, I don't see how this is a problem JSX should try to solve (intentionally deviating from JS) when JS itself makes no such effort.

I see, this issue is turning into yet another discussion of whether JSX should be sugar that is as compatible as possible with XML/HTML syntax, or whether we should slowly reduce its coverage and move towards JS. I don't buy the second approach because it's no better than just using some kind of Hyperscript: if you want JS, you can write JS. JSX is beautiful exactly because it lets you escape some of JS's pain points when dealing with structure and content, such as non-obvious nesting and foreign-locale escapes.

syranide commented 9 years ago

> In that case, they would be left as &#123;. I believe the names were designed specifically to be human-friendly and compatible with any locale, and they serve this purpose far better than escapes in JS.

No, because &#123; is inherently meaningless without a specified charset, whereas HTML entities are independent of charset and translated later.

> I see, this issue is turning into yet another discussion of whether JSX should be sugar that is as compatible as possible with XML/HTML syntax, or whether we should slowly reduce its coverage and move towards JS. I don't buy the second approach because it's no better than just using some kind of Hyperscript: if you want JS, you can write JS. JSX is beautiful exactly because it lets you escape some of JS's pain points when dealing with structure and content, such as non-obvious nesting and foreign-locale escapes.

If you ask me, JSX should not expand to do more than is absolutely necessary, that is to introduce the concept of elements in a meaningful way. If we want to solve anything else then it should be considered independently and where possible proposed to ECMA instead so that everyone benefits and not just a partial subset of JSX content. "Foreign-locale escapes" sounds far more useful at the level of JS.

matthewwithanm commented 8 years ago

> @gajus From HTML - yes, from XML - not so much (apart from JS injections).

Or namespaces or CDATA sections or comments… IMO there are a bunch of ways in which it deviates.

I'm sympathetic to the DX argument, but IMO the best thing for DX is to keep the transformation as simple as possible. Also, the more similar JSX and XML are, the more confusing any deviation becomes.

> If you ask me, JSX should not expand to do more than is absolutely necessary, that is to introduce the concept of elements in a meaningful way. If we want to solve anything else then it should be considered independently and where possible proposed to ECMA instead so that everyone benefits and not just a partial subset of JSX content. "Foreign-locale escapes" sounds far more useful at the level of JS.

:+1:

sebmck commented 8 years ago

If the purpose of JSX is to be agnostic to a certain target (that's not always HTML) then does it really make sense to allow HTML entities?

sebmarkbage commented 8 years ago

If we get buy-in, will we have any problems making the switch? I.e., will we risk a long-lived fork? The codemod should be safe.

sebmck commented 8 years ago

Do we have any stats (or anecdotal evidence) on how widely used HTML entities in JSX are?

sebmarkbage commented 8 years ago

Or backslashes...

sebmck commented 8 years ago

Oh right. I've actually broken backslashes in JSX attributes before in Babel and it took over 7 days for someone to notice and file an issue: babel/babel#2114.

NekR commented 8 years ago

I believe that entities (or other specific things) should be handled by the renderer which transforms JSX-output to HTML DOM/HTML string, but not by the transformer which transforms JSX to JSX-output.

syranide commented 8 years ago

@NekR It would then apply to all strings equally, so even user input would be subject to HTML entity decoding (besides also being a runtime cost); you definitely do not want that.

NekR commented 8 years ago

@syranide What is user input in JSX? I did not say everything should be parsed for entities at runtime.

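class EntitiesString {
  constructor(str) {
    // Hypothetical helper: decode the entities once, at construction time.
    this.str = myLibraryDoesHTMLEntitiesParsingHere(str);
  }

  toString() {
    // Return the decoded value stored on the instance.
    return this.str;
  }
}

<div>{ new EntitiesString('&nbsp;') }</div>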
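A self-contained sketch of this opt-in idea, with the hypothetical helper filled in by a tiny entity table: only strings explicitly wrapped in EntitiesString are decoded, so ordinary strings such as user input are rendered exactly as typed. decodeEntities, ENTITIES and renderChildren are illustrative stand-ins, not part of any real library or renderer.

```javascript
// Hypothetical, deliberately tiny entity table.
const ENTITIES = { nbsp: '\u00A0', amp: '&', copy: '\u00A9' };

// Replace known named entities; unknown ones are left untouched.
function decodeEntities(str) {
  return str.replace(/&([A-Za-z]+);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(ENTITIES, name) ? ENTITIES[name] : match);
}

// Marker class: only instances of this are decoded, so ordinary strings
// (e.g. user input containing "&amp;") are rendered exactly as typed.
class EntitiesString {
  constructor(str) {
    this.str = decodeEntities(str);
  }

  toString() {
    return this.str;
  }
}

// Stand-in for a renderer that turns children into text.
function renderChildren(children) {
  return children.map(String).join('');
}

const userInput = 'Tom &amp; Jerry';
console.log(renderChildren([new EntitiesString('&copy; 2016'), ' ', userInput]));
// prints "© 2016 Tom &amp; Jerry"
```

This keeps decoding out of the compiler, at the price of doing it at runtime for every wrapped string, which is the cost discussed in the rest of this thread.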

syranide commented 8 years ago

> ...by the renderer which transforms JSX-output to HTML DOM/HTML string...

@NekR I interpreted that differently. IMHO what you are proposing is runtime decoding (which is for everyone to decide on their own) and outside this discussion about entities/escape codes in JSX source code. EDIT: That is to say, JSX needs to support escaping to some extent (like { and <), regardless of whether or not JSX will drop support for HTML entities.

NekR commented 8 years ago

> @NekR I interpreted that differently.

Yes, I meant that renderers are responsible for parsing entities. One renderer could support EntitiesString, others might not.

> IMHO what you are proposing is runtime decoding (which is for everyone to decide on their own) and outside this discussion about entities/escape codes in JSX source code.

Of course I am not proposing such a decoding method for JSX here; it's an implementation detail of JSX consumers. What I am saying is that entity parsing at the transpilation stage is not needed (because it can be done at runtime), and hence it's in scope of this discussion, right?

> EDIT: That is to say, JSX needs to support escaping to some extent (like { and <), regardless of whether or not JSX will drop support for HTML entities.

Hmm... <div>{ '{test}' } { '<div>' }</div> seems like it's escaped?

syranide commented 8 years ago

> What I am saying is that entity parsing at the transpilation stage is not needed (because it can be done at runtime), and hence it's in scope of this discussion, right?

IMHO no, entity parsing during transpilation and runtime decoding of entities are "complementary". Runtime decoding of static source code strings in this context is inefficient and cumbersome.

> Hmm... <div>{ '{test}' } { '<div>' }</div> seems like it's escaped?

Produces React.createElement('div', null, '{test}', '<div>') and yeah, it will visually render the same as it would if you had {'{test}<div>'}, but it's not the same. So yes, you can work around the problem that way (but you're inserting a JS string, not escaping in JSX). However, this all-or-nothing approach is really inconvenient if you don't want to affect runtime behavior, especially considering <div>{'&nbsp;'}</div> is very different from <div>&nbsp;</div> currently.
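To illustrate how JSXText and a JS string in an expression container can diverge, here is roughly what a Babel-style compiler emits for <div>&nbsp;</div> and <div>{'&nbsp;'}</div>, written out by hand under the assumption that JSXText entities are decoded at compile time while expression containers are passed through verbatim. createElement is a hypothetical stand-in that only records its arguments.

```javascript
// Hypothetical stand-in that records its arguments instead of rendering.
function createElement(type, props, ...children) {
  return { type, props, children };
}

// <div>&nbsp;</div>: the entity in JSXText is decoded by the compiler,
// so the runtime receives a real U+00A0 NO-BREAK SPACE.
const fromJsxText = createElement('div', null, '\u00A0');

// <div>{'&nbsp;'}</div>: the expression container is plain JS,
// so the runtime receives the six literal characters "&nbsp;".
const fromJsString = createElement('div', null, '&nbsp;');

console.log(fromJsxText.children[0] === fromJsString.children[0]); // false
console.log(fromJsString.children[0].length); // 6
```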

NekR commented 8 years ago

> IMHO no, entity parsing during transpilation and runtime decoding of entities are "complementary". Runtime decoding of static source code strings in this context is inefficient and cumbersome.

Sorry, but the topic is "Document parsing of (X)HTML entities, or drop it even?" and I am saying: drop it. How is it not related? Runtime parsing was suggested as a solution. Someone who does not want a runtime solution could write a plugin that pre-parses entities into JS escapes or something like that. But you are not even listening to me. What I am saying is that it makes sense for JSXText to be equal to a simple JS string (sugar). That is, these two should be equivalent: <div>&nbsp;</div> and <div>{'&nbsp;'}</div>.

> Runtime decoding of static source code strings in this context is inefficient and cumbersome.

This is only a problem for React, since it re-renders on every update. I use JSX in a different way and it's perfectly fine for me.

> So yes, you can work around the problem that way (but you're inserting a JS string, not escaping in JSX). However, this all-or-nothing approach is really inconvenient if you don't want to affect runtime behavior, especially considering <div>{'&nbsp;'}</div> is very different from <div>&nbsp;</div> currently.

Why do we need workarounds or escaping in JSX? Just have JS strings everywhere. I do not see any difference here, except that entity parsing at transpilation time benefits React.

P.S. It's interesting that you made this repository public and asked for feedback from non-React implementations, and when people come here with their opinions, you say: "This is not related". Just make it a private repository and then there's no problem with "not related".

syranide commented 8 years ago

> P.S. It's interesting that you made this repository public and asked for feedback from non-React implementations, and when people come here with their opinions, you say: "This is not related". Just make it a private repository and then there's no problem with "not related".

@NekR I'm only one collaborator of many, these are my opinions. Feel free to refute them, but there are many things to consider. If I didn't care about your opinion I wouldn't have responded.

> Sorry, but the topic is "Document parsing of (X)HTML entities, or drop it even?" and I am saying: drop it. How is it not related? Runtime parsing was suggested as a solution.

Decoding at compile-time (source code and static strings) and run-time (dynamic strings) can both co-exist and make sense. In the context of language design, run-time decoding being possible is not an argument against a syntax feature, nor vice versa. They are solutions to different problems.

Yes, we both agree that HTML entities should be dropped; that's not what I objected to. I definitely think that is the way forward, but the holes left behind by dropping HTML entities still need to be considered, and runtime decoding is not it.

NekR commented 8 years ago

> In the context of language design, run-time decoding being possible is not an argument against a syntax feature, nor vice versa.

I've seen many such arguments and decisions in TC39, but okay, if you do not accept this as an argument, then never mind.

> runtime decoding is not it.

Why? Where is the big performance problem with it, apart from React's constant re-rendering?

dantman commented 8 years ago

Personally, I don't think the DX argument is valid. And that is not because I expect everyone to use character maps, etc.

JSX is JavaScript, and it doesn't really make sense that, when writing JS+JSX, the solution to "I can't type © with my keyboard" is "You can use &copy; in JSX strings but you're SOL in every other part of the JS". Which of course leads to a mess like:

<Foo
    label="I can &copy; here"
    legal={__('This site \u00A9 2016 Acme Media Inc.')} />

Same code. But you can use &copy; in one part of the JSX and you can't in the other, because you have something – which doesn't have to be i18n, it can be collection processing or anything else – that requires one of the strings to be in JS space and not JSX space.

If this is a problem, it is a problem universal to JS and not one that should have a JSX-only fix.

Rather, I think the solution is to embrace the fact that we're writing JS and fix this with JS. Specifically, given #25, I think the solution to "I can't type © with my keyboard and don't want to use a character map, C&P, or some other tooling" is this:

var ent = require('character-entities');

<Foo
    label=`I can ${ent.copy} here`
    legal={__(`This site ${ent.copy} 2016 Acme Media Inc.`)} />
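A runnable approximation of this idea in plain JS, with two assumptions: ent is a small hand-written stand-in for a name-to-character table such as the one the character-entities package provides, and __ is a hypothetical identity translation function. The point is that the same template-literal mechanism works both inside and outside the i18n call, because both are ordinary JS strings.

```javascript
// Stand-in for a name -> character table (hypothetical subset).
const ent = { copy: '\u00A9', nbsp: '\u00A0', mdash: '\u2014' };

// Hypothetical i18n function; a real one would look up a translation.
const __ = (message) => message;

// The same escape mechanism works in both props, because both are JS strings.
const props = {
  label: `I can ${ent.copy} here`,
  legal: __(`This site ${ent.copy} 2016 Acme Media Inc.`),
};

console.log(props.label);
console.log(props.legal);
```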

facebook/jsx #4