[Compiler]: All comments removed from source code

swissspidy commented 1 month ago

What kind of issue is this?

[ ] React Compiler core (the JS output is incorrect, or your app works incorrectly after optimization)
[X] babel-plugin-react-compiler (build issue installing or using the Babel plugin)
[ ] eslint-plugin-react-compiler (build issue installing or using the eslint plugin)
[ ] react-compiler-healthcheck (build issue installing or using the healthcheck script)

Link to repro

https://playground.react.dev/#N4Igzg9grgTgxgUxALhAgHgBwjALgAgBMEAzAQygBsCSoA7OXASwjvwFkBPAQU0wAoAlPmAAdNvhgJcsNgB4AfOPwr8cgMIQAtloR1cYblIBKCLRABuCQvgD0SiSo3bd+w1IAKUsAhhWb9spq9gDc4gC+4uK0DMys+Jo6egZGCKbm-kIiQXYAVPi4MGR0YJRkuDhgyPgApFX4UD4wdGS6AHT4ACoAFkxg+HAuyfgA7kyUlPgARgiSZpbW+Lm2QYMlBLgYBAC8+AD6e-wA5AASCBMQtWBHgmF0Qba2Xb39g0n6o+OTZJSQ07NSDLWNrKIJSGTNNSEJgWBTATboXDhOS2aGwu6Re50GKMFhsRKuFKeby+TLCMQSUQgRqzOiXXTmKl3FQPfKFYqlcqVap1ao05qtBAdHp9AZDD5jCb4OkEGZzIE2ZarVhgDZbfC7A7HM4XK43O45R7PUVvQmfKU-P5ygDWCEwuBBlIk4NkUJhcIRSJRaIUGJA4SAA

Repro steps

I setup my Babel & webpack config to preserve comments, specifically comments for translators.

In systems like WordPress, i18n strings & comments are directly in the source code and must be preserved.

When React Compiler optimizes a component, all comments will be stripped.

When React Compiler is disabled, e.g. with use no memo or when it encounters a disabled lint rule, then the component won't be touched at all and comments are preserved as expected.

Unfortunately the Playground link doesn't show this, as it seems to strip all comments no matter what.

How often does this bug happen?

Every time

What version of React are you using?

17

What version of React Compiler are you using?

0.0.0-experimental-34d04b6-20241024

josephsavona commented 1 month ago

Thanks for posting. I can imagine it’s frustrating to have spent a lot of effort making your translations work via comments and to have the compiler break this.

The short answer is: we can't preserve comments in a way that works for all the schemes people have for them. This is because comments sort of visually correspond to a part of the program, but this is not precise enough for tooling. If you've ever seen a comment get moved from something like prettier, this is why.

To make this concrete, consider a slightly tweaked snippet from your example:

// translators: %s is username
const element = <div>{__s(“...%s...”)}</div>;

A human knows that this comment is really referring to the __s("...") call expression. But JavaScript tooling just sees that this comment precedes the variable declaration statement.

Because comments have no semantic meaning in JavaScript -- they do not affect runtime behavior of the program -- it's perfectly fine for a tool to rewrite this code as follows:

const t0 = __s(“...%s...”);
// translators: %s is username
const element = <div>{t0}</div>;

As you can see, it isn't enough to preserve comments. This particular comment scheme is meant to apply comment information to strings inside calls. So you might say, just move the comment up one line! But someone else might have a custom comment scheme that is meant to describe variable declarations, and then it would break if we moved the comment up! . There's literally no way for an arbitrary tool to know how to "correctly" preserve comments while rewriting code.

This is why JavaScript -- like effectively all modern languages -- makes comments semantically meaningless. This makes it safe to not preserve comments, or to make a best-effort. Many languages also typically carve out a safe space for semantic annotations. Rust has # syntax, JS has "use strict" style string directives as an unofficial pattern, etc.

So I hope this helps to put the existing behavior in context. We understand that it would be convenient if the compiler preserved comments, but it is impossible to do so in a general purpose way that everyone would be happy with. There isn't even a good-enough heuristic. As such, we do not and will not support preserving comments.

In contrast, type annotations provide semantic information in a structured form. We currently preserve some type annotations, and would consider supporting preserving them fully.

We would recommend exploring i18n approaches that do not depend on comments, and instead put the information about the meaning of the various interpolations ("%s") into the program itself. There are a number of popular libraries for this.

swissspidy commented 1 month ago

Thanks for the thorough response!

I feared as much. While I agree that in a JS world comments are not the best way for such annotations, changing the i18n approach is unfortunately easier said than done. I'm sure you can understand that doing so in an ecosystem as big as WordPress is an enormous undertaking. The React-based Gutenberg editor is currently exploring using React Compiler, and this is a regression from an i18n perspective. While we can probably find a workaround there, thousands of extensions (plugins) will have to do the same.

Would you be open to consider some middleground like preserving specific types of comments (e.g. /*! preserve me */) or if they are within a function call itself (e.g.__( / translators: %s: username / 'Hello %s' )` or something?

tyxla commented 1 month ago

I second @swissspidy here. Finding a workaround is not a challenge at all, but propagating it through hundreds of thousands of third-party projects (many of which don't even use React or will not adopt React 19 or the React Compiler) with as many third-party developers accumulating over the past ~20 years is doomed to failure.

An ideal solution would be to allow the Babel plugin to provide an environment configuration variable that enables skipping some comments. There's already a precedent for that with the @enableMemoizationComments flag:

https://github.com/facebook/react/blob/fe04dbcbc4185d7c9d7afebbe18589d2b681a88c/compiler/packages/babel-plugin-react-compiler/src/HIR/Environment.ts#L458C3-L458C28

Would it be feasible to consider providing a similar flag that lets us declare an array of strings or regexes, and the React Compiler would skip stripping comments that have a match?

guillaumebrunerie commented 1 month ago

How are these special comments used? Is there a tool that automatically extracts them into some more usable format for translators? If so, can’t this tool be run before the React compiler instead? Surely there are a bunch of other build steps (like minifiers and bundlers) that already remove comments.

josephsavona commented 1 month ago

An ideal solution would be to allow the Babel plugin to provide an environment configuration variable that enables skipping some comments. There's already a precedent for that with the @enableMemoizationComments flag

Just a quick note that this flag isn’t skipping comments, it tells the compiler to synthesize comments describing what it’s memorizing. You can see this for yourself by adding // @enableMemoizationComments as the first line in the playground.

The compiler transforms the Babel AST into our own internal representation (called HIR), runs an extensive number of passes, and then completely rebuilds the AST. Per my comment above, it is impossible for us to preserve comments in this process in a way that would work with every scheme you could come up with for comments. That is why the spec specifically makes comments semantically meaningless: otherwise it’s just crazy town with everyone inventing different incompatible comment schemes and tooling literally can’t touch your code.

If so, can’t this tool be run before the React compiler instead?

Yes! This is what I would explore.

swissspidy commented 3 weeks ago

Apologies for the late response.

How are these special comments used? Is there a tool that automatically extracts them into some more usable format for translators?

Yes, exactly, it's a string extraction tool that then transfers the found strings to a translation platform or in a gettext POT file.

If so, can’t this tool be run before the React compiler instead? Surely there are a bunch of other build steps (like minifiers and bundlers) that already remove comments.

We have set up our existing tooling (babel, webpack) so that translator comments are preserved. That works quite well so far.

In theory extracting the string from the source files (or finding any other solution) would be better for sure, but this is not feasible for the WordPress ecosystem, as pointed out by @tyxla. For WordPress core itself yes, there might be alternative approaches we could take in the short to medium term, as we are in full control there. But for the hundreds of thousands of plugins out there that's nearly impossible. And having two separate implementations just causes fragmentation.

nathanmarks commented 1 week ago

@josephsavona Is this why I'm running into some weird issues where sometimes istanbul is not instrumenting the code properly? I'm seeing statements, functions, etc just not even getting recognized full stop.

I spent a few hours diving into it. I first noticed that the statement/function/branch info instanbul injects into the file is completely missing things, (along with source location not being correct for some that did make it).

It looks like when the babel instanbul plugin runs, node visitors are simply not getting triggered for some things.

For example, given this super simple example:

export default function useSomething() {
  const isMobile = useIsMobile()

  return isMobile
}

istanbul is expecting these visitors (1, 2) to get invoked, and they never do when the react compiler babel plugin runs first. I also noticed that the AST is missing stuff like source location info for nodes etc.

Out of the box, jest appends istanbul to the rest of the babel plugins.

josephsavona commented 1 week ago

@swissspidy thanks for the extra context.

But for the hundreds of thousands of plugins out there that's nearly impossible. And having two separate implementations just causes fragmentation.

I think this gets at the crux of the issue. Wordpress ended up with a translation approach that uses comments in a way that is incompatible with the JavaScript specification. The main two options are for Wordpress to do the work to migrate to an approach that is compatible with the JS standard (at which point lots of other tooling considerations probably get easier too), or React Compiler could attempt to support Wordpress' bespoke system. Given the complexity and challenges of supporting this, that i've described above, we are unable to take on implementing this and also unable to commit to supporting a community contribution.

We understand this may be challenging, but we really would recommend evaluating what it would take to move to an approach that uses code rather than comments to encode critical information. We have some experience with designing translations (Meta developed fbt for example) and would be happy to discuss and provide guidance.

josephsavona commented 1 week ago

Is this why I'm running into some weird issues where sometimes istanbul is not instrumenting the code properly? I'm seeing statements, functions, etc just not even getting recognized full stop.

@nathanmarks Likely yes. We go to considerable effort to preserve source locations throughout our compilation process to facilitate debugging etc. However, we know that there are some gaps. The playground actually has a tab to visualize the source maps, and we are definitely open to PRs to improve the precision of our source locations to make source maps more accurate.

nathanmarks commented 6 days ago

@josephsavona I was surprised by this minimal case because the compiler isn't even adding anything.

What's the reason for the node visitors not even getting triggered in the istanbul code I linked above? is this because of information missing from the AST after the compiler runs? I didn't go much deeper but at a glance comparing outputs I did notice that the AST console.log output was missing some things that you normally see like the class names next to the objects etc.

josephsavona commented 6 days ago

@nathanmarks I don't know exactly how Istanbul works, but my guess is that it is using source location information. If we're missing some source span information, that could be throwing off Istanbul's evaluation of which code got executed or not.

nathanmarks commented 6 days ago

@nathanmarks I don't know exactly how Istanbul works, but my guess is that it is using source location information. If we're missing some source span information, that could be throwing off Istanbul's evaluation of which code got executed or not.

You're bang on -- that's exactly the issue, they're ignoring node visits if location information is missing: https://github.com/istanbuljs-archived-repos/istanbul-lib-instrument/blob/master/src/visitor.js#L37

And we're missing the path.node.loc off these.

What's the correct approach here... running istanbul before, or fixing the source location issues and running after the compiler? Or not running compiler at all on specs (slightly reluctant on this one, as we've caught issues in specs due to unwanted memoization, and want to test the code as close to how it behaves in prod as possible)?

josephsavona commented 6 days ago

The fix for Istanbul is to add missing source locations. We’re open to PRs that either add them or add fixtures demonstrating code where locations are missing.

What I would do for the latter is add a feature flag on Environment that enables a source locations validator. This would run after codegen and errors for any nodes that are missing a location. Then you could write fixtures which enable this flag and will fail (expected w an “error.” prefix) bc locations are missing.

nathanmarks commented 4 days ago

Thanks for the pointers! Let me have a look and see what I can do.

In parallel: correctness aside, as a temporary workaround on our app do you see any major issues with running istanbul first (otherwise we'll likely have to walk back our compiler rollout, as we're seeing devs get blocked on CI due to test suites suddenly reporting wonky coverage after unassuming changes). Everything seems to be fine on my side WRT the test suite behaving as expected.

facebook / react