facebook / jsx

The JSX specification is a XML-like syntax extension to ECMAScript.
http://facebook.github.io/jsx/
1.96k stars 132 forks source link

Add TemplateLiteral support to JSXAttribute. #132

Closed tolmasky closed 2 years ago

tolmasky commented 2 years ago

Given the comments in this Babel thread, here is my attempt at resolving the template string issue. This PR supports the "decode strings in the normal JavaScript manner", and my reasoning is below (there doesn't really seem to be a place to put the reasoning in the spec itself, although I have tried to clarify the encoding issues for future spec readers):

  1. As stated above, this spec change has template string attributes behave identically to their JavaScript counter-parts with regard to string encoding. In other words, <tag name = `[value]`/> is identical to writing <tag name = { `[value]` } />.
  2. The main reasoning here is that the primary purpose of having <tag name = "[value]"/> use the HTML-style string encoding is because there is a matching syntax production in HTML that we want to emulate. As a simple example, copying <tag name = "&amp;"> should not produce two surprisingly different results depending on whether it appears in HTML or JSX.
  3. However, there is no such matching production in HTML that we are attempting to support with our template strings. Short of having the template string incorporate the delimiting back-ticks and ignore any internal template elements, it will never match the expected behavior of HTML. (That is to say, in HTML <tag name = `[value]`> is like writing <tag name = "`[value]`">, with the additional complications since you don't have quote boundaries to clearly delimiter when the value ends, etc.).
  4. Template strings that provide a good escape hatch from the potentially surprising html entity encoding process to give you normal JS encoding. When used without any internal template elements, you essentially get JSX attributes that encode the same as JavaScript without the additional curly brace noise.

As @sebmarkbage mentioned in the above comment, it appears the sentiment is that we wish JSX simply used JS encoding across the board, so given that template strings are a new production that pose no backwards compatibility issues, it seems like a good opportunity to introduce a way to get this behavior without introducing breaking changes. If this is a direction we want JSX to go more generally, we also would have the opportunity to start telling people to use template strings across the board as a best practice, to clearly communicate in their code that they are using a JS version of strings and not an HTML one. Arguably, you could someday even deprecate double and single quoted attributes in favor of these and have a similar "JSX is not HTML" message when used to how it currently says "JSX is not XML" when namespaced tags are used. All these options are of course outside the scope of this one addition, but I bring them up to demonstrate that this simple addition opens up a lot of flexibility for creating smoother transitional paths to more breaking changes that have been discussed here for years.

NOTES

I've made a few other changes here to hopefully help people in the future:

  1. I've linked Syntax productions to their ECMAScript pages.
  2. I've tried to do the same with ESTree node elements (although outside the code block since you can't link there).
  3. I've attempted to explicitly note the HTML attribute encoding for JSXSingleStringCharacter and JSXDoubleStringCharacter as this appears to currently be absent from the spec (please correct me if I'm wrong here). This also through omission implies that the TemplateLiteral syntax production behaves identically to the ECMAScript one.
  4. I updated the @babel/parser link.

Closes #25 Closes #112

Huxpro commented 2 years ago

@Kingwl Just wanna share some technical fun facts.

Quoted strings in JSX behave differently with either JS or HTML: It supports multi-line (which JS string literal don't), but the line breaks are always trimmed to a single whitespace (which HTML attribute values don't). I haven't found the historical conversation yet.

(And In my memory, there's some minor different for escape sequence too. Sorry I'm not pretty sure.)

And from what I knew, the string literal and untagged template literal have the same escape sequence. Notably, all string literals and template literal refer to the same Escape Sequence back in ES2017. However, a template literal revision is later amended to ES2018 to relax the escape sequence restriction of only the tagged template literal to better support DSL usages.

Kingwl commented 2 years ago

@Huxpro Thanks for the sharing.

only the tagged template literal

Oh yes. That's it. Thanks.

wooorm commented 2 years ago

but the line breaks are always trimmed to a single whitespace (which HTML attribute values don't). I haven't found the historical conversation yet. — @Huxpro

I mentioned it here: https://github.com/facebook/jsx/pull/132#issuecomment-1028904740. Babel trims. TypeScript doesn’t. So It could be considered a bug in Babel that they trim.

As I noted in that comment though, it could be useful to specify this trimming: it would allow Prettier and other tools to improve the printing of attributes, because it can dedent and indent to make things look good while it doesn’t make a syntactical difference.


Addendum: Note that the behavior of character references and whitespace “originate” from JSXText:

<a b="c &amp; 
      d">
  e &amp; 
     f
</a>

Babel:

React.createElement("a", {b: "c &  d"}, "e & f")

TS:

React.createElement("a", { b: "c & \n      d" }, "e & f")

And that “braces can be used to enter JavaScript” if you want whitespace / character references:

<a>{`
  e &amp; 
     f
`}</a>

->

React.createElement("a", null, `
  e &amp; 
     f
`)

I think it makes a lot of sense that double/single quoted attributes work the same as text inside elements: that whitespace is trimmed and that character references work.

tolmasky commented 2 years ago

Everything will change if this pull request is merged.

That's not true, the amount of "confusion" will remain the same. In fact, I will argue that the amount of accidental surprising behavior will decrease. Why? Because you are less likely to manually convert <div value = `&amp;`> to <div value = "&amp;"> than you are to convert <div value = { `&amp;` }> to <div value = "&amp;">. There's no reason to convert `&amp;` to "&amp;". It's not like non-replacement template strings are "more expensive" in JS. It would be a purely a stylistic choice at that point, and if you had already been using replacements beforehand, there's a good reason to just keep it as a template string in case you want to use replacements in the future. You pay no cost. However, it does seem like a bunch of unnecessary extra text to see { `&amp` }, so you'd probably be even more inclined to make some change to it, leading to the dreaded "&amp;".

So, given the fact that the entity feature is completely undocumented, this PR would lead to equivalent "opportunity" for confusion to before, but there is a strong case for it resulting in *fewer& instances of that confusion.

So, Why not change the existed jsx string's?

This PR allows you to develop a linter rule that enforces always using template literals to get this behavior today without it being a breaking change. If you always use template literals you can completely escape the whole entity mess, and it's the only way to get this anytime soon. It is probably the only way to get this behavior in the next 5 years.

Please take a moment to read through the existing issues and PRs. The choice is not between this or a great JSX 2.0. The choice is this or nothing. I wish we could change all strings too! So does the original maintainer. We've been wishing it for 5 years. It's not going to happen. This repository can't even come together to make the completely uncontroversial decision to at least document the existing behavior to spare its poor users the pain of having to discover this feature on their own. The oldest open issue is 8 years old and just asks about documenting this existing behavior. But like clockwork, every issue or PR in this repository devolves into fantasies about what some new JSX might look like, and eventually everyone has to move on with their lives and forgets about the issue and it becomes abandoned. This is not an exaggeration, just look through the issues in this repository. It's OK to not like the template literals proposal, but don't have the reason be the mistaken belief that we're going to get "the real" fix instead. We're not.

magic-akari commented 2 years ago

The choice is not between this or a great JSX 2.0. The choice is this or nothing.

<div foo="&amp;" bar=`&amp;` baz={"&amp;"} />

Great JSX2.0(probably never arriving)

React.createElement("div", {
  foo: "&amp;",
  bar: `&amp;`,
  baz: "&amp;"
});

Nothing

React.createElement("div", {
  foo: "&",
  // no bar
  baz: "&amp;",
});

TemplateLiterial Patch

React.createElement("div", {
  foo: "&",
  bar: `&amp;`,
  baz: "&amp;"
});

Honestly speaking, if I can't choose Great JSX 2.0, I choose Nothing.

tolmasky commented 2 years ago

Since we agree we're not discussing JSX 2.0 @magic-akari (which for the record, even if we did get it, would not necessarily do anything with HTML entities, that in itself is also an assumption, so it's 2 assumptions), let's stop adding it to our comparisons. Let's stick to the two concrete examples we know, and not insert a third fantasy example:

Nothing

React.createElement("div", {
  foo: "&",
  // no bar
  baz: "&amp;",
});

TemplateLiterial Patch

React.createElement("div", {
  foo: "&",
  bar: `&amp;`,
  baz: "&amp;"
});

They are equally "inconsistent". You may prefer one over the other, but no one is forcing you to use template literals, so I don't see the problem. I can appreciate the abstract concern you have with it, but if we talk about the practical implications for users, it is simply not realistic that this introduces "more instances" of running into the HTML entity disparity. Users are not currently foregoing from using template literals because they can't inline them directly, and as such, whenever they switch from a template literal to a normal string they run into the exact scenario you are concerned with.

As I mentioned in my previous comment, the best way to get a "no entities purely consistent" behavior is to accept this PR and then use a linter to enforce only using template literals:

<div value = "hi"> // error, use <div value = `hi`> instead to avoid accidentally HTML entity handling

This is what I reference in my original summary of providing a "best practices" path today to start transitioning the community to a no-html-entities future. It would be far more likely to succeed than trying to convince everyone to wrap all their non-template-strings in curly braces, which seems like a lot of boiler plate for nothing.

Jack-Works commented 2 years ago

I support never converting HTML entity encoding in the property field for the following reason. Those reasons are based on my personal experience.

  1. I believe most people don't know this feature.
  2. I believe people don't copy-paste HTML into JSX.
  3. I believe most people think JSX is closer to JavaScript than HTML/XML.
  4. It is not impossible to make a breaking change and migrate.

[2]: Please be aware of htmlFor, className and self-closing HTML (<img src="">). They cannot be pasted into JSX without modifying.

[3]: <></> or <Namespace.Component /> is far away from HTML/XML. XML-styled JSX <namespace:component /> is rarely used. (babel throws by default when using XML-style namespace JSX IIRC).

And new programmers to the front-end are almost always starting from modern frameworks.

[4]: Since the following code will not have entity encoding, that means this breaking change is statically analyzable and auto-fixable. I believe it's easy to run a migrate script on the codebase and fix all use cases.

Also, there is no problem with the ecosystem because library authors ship compiled JSX, not the JSX source. Therefore upgrading the toolchain in the application won't affect the interpretation of the dependencies. (Thus this problem is different from define/set on the class fields).

return <img alt={"&" + props.alt} />

I think maybe it's OK to have HTML entities encoding in the JSXText.

tolmasky commented 2 years ago

I support never converting HTML entity encoding in the property field for the following reason. Those reasons are based on my personal experience.

Hi @Jack-Works, I am begging us to not derail this. EVERYONE agrees with your position. NO ONE LIKES the HTML entity thing. Not even that, no one even fully understands it, since it's not properly documented and is only a subset of HTML entities. The original maintainer wishes he had never added it. You are 100% correct that we should get rid of it -- but we can't do that without shipping a breaking version, and this project has a hard enough time shipping backwards compatible features as it is. Scratch that, this project can't even get consensus about shipping updates to the documentation that would at least document the current behavior.

If everyone agrees that HTML entities are bad, then the only question would be "why add them in more places?" That's what's trying to be decided here, should template literals continue this behavior few people know about and no one likes once they find out about, only now in a feature that can't even plausibly offer a reason to do it like the original one did? No. Of course not. Let's add this simple feature and make it follow the pattern JS instead of HTML, such that <div value = `template`> is consistent with <div value = { `template` } >, and such that we can offer a transitional path to getting rid of HTML entities everywhere someday.

tolmasky commented 2 years ago

Hi @Huxpro, @sebmarkbage, and @acdlite, this PR has now reached its "final form" of JSX PRs and GitHub issues. Within this very thread, we have had multiple people come in and say "I have a great idea, why not just remove HTML entities everywhere?" It's clear that it has reached sufficient length where no one is reading any of the preceding comments anymore, and I don't have the energy to repeatedly explain the same answer over and over. Scratch that, to explain the same question we are trying to answer over and over again. Please feel free to close the PR. I've been using my own private fork with template literals for a few months now, so this PR wouldn't really provide me with anything, I simply thought it might be nice to be able to offer this to people who don't have the time or expertise to manually edit Babel themselves. I'd close the PR myself, but given that there are other people that had some vested interest in it I consider it rude to do that, but don't mind (and encourage) you to do it. I think it might be most fitting for @sebmarkbage to come and close it actually. I wish he had just closed it to begin with. I know @acdlite said you guys are working on a proper response, but please don't feel pressured to do this for my sake anymore. This has all been a colossal waste of time, and I apologize for not heeding the warnings given to me originally, and I do want to sincerely apologize to you @Huxpro for having to deal with this as your initial introduction into Open Source. I do wish we would have gotten to meet under different circumstances, and I certainly harbor no ill will towards you at all and hope you also don't towards me. This repo is just cursed.

wooorm commented 2 years ago

As I mentioned in my previous comment, the best way to get a "no entities purely consistent" behavior is to accept this PR and then use a linter to enforce only using template literals:

I see this as bad. If such a rule would land in eslint-config-airbnb, eslint-plugin-react, etc, tons of newcomers would change what I see as readable JSX (double quotes) into backticks. Just to appease the tools. I don’t see it as “better”.


[2]: Please be aware of htmlFor, className [...]

This is a React-specific choice. They’re not dictated by JSX. Preact for one, but also others, don’t have this rule.

[4]: that means this breaking change is statically analyzable and auto-fixable.

How could tools differentiate between these: <a b="MIT &copy; Some Person" title="you can type &copy; in HTML to get a copyright sign" />). Should they both turn into a copyright sign?


If everyone agrees that HTML entities are bad

I personally don’t think they’re bad.

I’m :+1: on character references in double/single quoted attribute values. Because they also occur in JSXText. I’m open to removing them if they’d be removed from JSXText too. However, those are used a lot in the wild (e.g. Google “JSX nbsp”).

Removing them from one but not the other, IMO, makes it harder to explain JSX. JSX isn’t pretty, but consistency helps. I think removing character references from both is acceptable, but it makes it impossible to use both double and single quotes inside an attribute value, unless JS escapes (\") were added too. Allowing that, and in JSXText, would also break real content, unfortunately.

For whitespace in attribute values, I’d prefer that to follow Babel, which is the same as JSXText. Due to consistency and to help formatters.

Broadly, I hope that JSX remains a small language that has JS as an “escape hatch”. That it’s a small weird mix between HTML and JS, and with braces you get actual JS. I can see some value in a big JSX 2.0 rewrite. But it may not happen. I think I’m against adding backticked strings with JS semantics to JSX as currently though, as its inconsistent (I don’t have a vote though, this is a Facebook project, but I as the MDX maintainer see this as a net-negative, it’s just an opinion).

dantman commented 2 years ago

[4]: that means this breaking change is statically analyzable and auto-fixable.

How could tools differentiate between these: <a b="MIT &copy; Some Person" title="you can type &copy; in HTML to get a copyright sign" />). Should they both turn into a copyright sign?

Yes, both of them should turn into a copyright sign.

As you can see from the transpiled code, the html entity -> unicode character conversion is done by the JSX compiler. It's not done at runtime with any difference between types of properties. ALL of them turn into (escape sequences) unicode characters before any code is run. Babel playground TypeScript playground

Huxpro commented 2 years ago

First, I want to thank everyone that has been involved in this long conversation. I see that this change (and many other issues and PRs) had been debated for a long time and everyone feels more or less frustrated about the status quo of this repo. So my primary goal as the new repo maintainer is to make this repo healthy again ❤️. I’ll elaborate more on the “next step” section below.

This Proposal

The team understand this proposal is about adding template literal to JSX orthogonally with the original HTML-ish encoding of other JSX “strings” and this proposal alone is not a breaking change.

The team also shared many concerns that had been raised up by many others in this thread (THANK YOU!), such as the tension between HTML vs JavaScript encodings, and whether or not we should push the boundaries to other expressions so we can deliver a more cohesive mindset. We also observed that a consensus can't be easily reached just yet.

Our conclusion therefore is that we are hesitant to move forward on making this change. It’s not so much about this proposal itself, but its downstream effects. We are not convinced that it’s worth the churn for existing users. We will still be interested in incorporating this proposal as part of a larger and cohesive changes such as JSX 2.0.

Next Step

Repo Health

I want to thank @tolmasky for his valuable callouts on how we could improve from this regard. I’ll start with drafting a set of new guidelines which should clarify the requirements and expectations on different kinds of contributions (e.g bug reports; editorial and normative changes to the current JSX spec; and feature requests and discussion that may go to JSX 2.0).

I’ll also start with tagging and fixing stale issues and PRs so they are properly attributed.

Moving Forward

To re-gain a solid foundation to constructively move JSX forward, we plan to correctly document all the existing semantics in the spec to align with existing implementations such as Babel and TypeScript. Going forward, we want to minimize the inconsistencies between different implementations, and ensure we are in the same page whenever a change is needed to be made to the spec.

To get the ball rolling, I’ve made 2 PRs:

  1. https://github.com/facebook/jsx/pull/135 to revamp the spec. This is already merged. Check out http://facebook.github.io/jsx/ to see how the new spec website looks and feels!
  2. 136 . I encourage everyone that has been interested in the current HTML character reference behaviors to continue the discussion there.

Regarding other topics e.g. white spaces inconsistencies, please feel free to file another issue to continue the discussion there.

Thank you 💙

Kingwl commented 1 year ago

Any update after 1 year later?