bad transform of fake this param

patricklx commented 9 months ago

const { Preprocessor } = require('content-tag');

const p = new Preprocessor();

const res = p.process(`
f = function(this: Context, ...args) {
    function t(this: Context, ...args) {};
    <template></template>
}`);

console.log(res);

results to

 import { template } from "@ember/template-compiler";
f = function(this: Context, ...args) {
    function t(this1: Context, ...args1) {}
    ;
    template(``, {
        eval () {
            return eval(arguments[0]);
        }
    });
};

removing the template will make it work correctly

chancancode commented 9 months ago

(I looked into this with @zvkemp but the following conclusions are mine)

Refer to the commentary in #72 for a primer on how this works.

TL;DR My bottomline conclusion is that we are incorrectly using the hygiene code in SWC and need to do something else, probably roll our own much more narrower/targeted version

Bug in swc_ecma_transforms_base::hygiene?

Yes, I think you can say, the most direct cause for this issue is that there is a bug in swc_ecma_transforms_base::hygiene, where it incorrectly considers the TypeScript this type annotation as a regular, renameable identifier. Perhaps we can even file that bug and get it fixed. We'd then have to rebase our fork and ingest it.
They don't run the hygiene pass like this

I think part of the reason why it wasn't caught upstream by now, is that they don't run the hygiene pass this early. It seems like it generally does not expect TypeScript to be present at this stage, as it's typically used as a minifier, or at the equivalent time in dev builds but in non-mangling mode.
It's overly broad for what we need

When you run the hygiene pass on the code, it flattens out and renames all identifiers it consider to be a possible collision, causing other variables we didn't touch nor care about to be renamed. This results in a slightly awkward development experience. I have noticed things getting renamed foo1, etc, when debugging in the Chrome console, I just didn't make the connection who was doing that. It would be good to avoid that if possible, and I think it's quite unexpected when content-tag touches the other parts of the code it has no business changing.
We are using it wrong and we are not getting correct results in cases we actually care about

Consider this:
```
import { template } from "@ember/template-compilation";

export default function(template) {
 console.log(template); // the local variable
 return <template>hi</template>;
};

console.log(template); // the import
```
Here, you would expect that the hygiene code to do something like this:
```
import { template as template1 } from "@ember/template-compilation";

export default function(template) {
 console.log(template); // the local variable
 return template1(`X`, { eval() { return eval(arguments[0])} });
};

console.log(template1); // the import
```
Or perhaps:
```
import { template } from "@ember/template-compilation";

export default function(template1) {
 console.log(template1); // the local variable
 return template(`X`, { eval() { return eval(arguments[0])} });
};

console.log(template); // the import
```
But it actually does neither. Instead, you get the following incorrect code:
```
import { template } from "@ember/template-compilation";

export default function(template) {
 console.log(template); // the local variable
 return template(`X`, { eval() { return eval(arguments[0])} });
};

console.log(template); // the import
```
These are exactly the kind of cases that is hard to get right, and if the SWC hygiene code does this for us it would perhaps be worth the trouble with the other renames. But it just doesn't, and it's not really a bug, it's just that we are using it differently (incorrectly) than what it's intended for.

The way the SWC macro expansion hygiene system works is that it expects there to be user code (input source code) and synthesized code, and the "hygiene" in this case is that synthesized code exists in a parallel namespace universe that is doesn't interfere with user code and vice versa (i.e. hygienic macros). Code introduced by the macros won't accidentally shadow user code, and likewise user code won't shadow stuff form the synthesized code, so that it works reliably regardless the insertion position.

Specifically – the top_level_mark exists for precisely this reason. We are expected run the "clean" AST through the resolver before inserting any synthesized code. The resolver would tag all existing identifiers with a mark/syntax context that is derived from the top_level_mark. Any other code inserted thereafter and aren't derived from this mark are considered synthesized code. (By the way, we currently violate this assumption by calling resolve too late, which I fixed in #72.)

This is fine in the case when there is no preexisting import, and we use unique_identifier! to generate a hygienic identifier for the imported template function. However, in the case where there is a preexisting import, we are reusing the existing identifier, which is considered user code (descendant of the top_level_mark), violating the hygiene assumptions.

Why does it matter? One keep reason for the distinction between user code and synthesized code is eval(). When there is an eval() statement, SWC suppress the renaming for that scope and any of its parents. The logic is that the eval statement may need to see any possible upvars in their original name. However, SWC is free to (and will) rename any synthesized identifiers, because that is by definition non-breaking for the user eval code. (Check out https://github.com/swc-project/swc/pull/6818, which is also when top_level_mark was introduced.)

So that explains what is going on with the example I gave. We basically put the hygienic rename code in an impossible situation. Technically the example I gave isn't actually unsolvable, but you can easily construct ones that are, and the bottomline is that we violated the fundamental assumptions of their algorithm.

So, my conclusion is that I think the built-in hygiene stuff from SWC is a poor fit for what we want to do, and it both does too much and too little. I think we'll probably have to write our own visitors, loosely based on what rename/hygiene does, but also tweak it to fit our purpose. We can probably reuse some parts of those code, maybe the rename::Analyzer/rename::Scope stuff, but I'll have to play with it to know.

We tried the obvious/naive thing of just implementing a Renamer (basically what hygiene does), but the assumptions are baked in too deep and that didn't work out. The code really expect you to actually rename the identifiers, and if you try to return the same one in cases where it expects you to rename them, you'll end up being stuck in an infinite loop.

patricklx commented 9 months ago

Wouldn't it be enough to rename the identifiers that we introduced for template? Since we also control the exact swc version, we can make assumptions to what we can rename template. We are already visiting nodes, so we could also visit identifiers and check if there is a collision. If there is, we rename our identifiers. I guess it's only template? Probably not template1, but maybe _template_

chancancode commented 9 months ago

That's "more or less" what I meant by "write our own visitors, loosely based on..."

Yep, it's just all of those things, but done correctly, devil is in the details 😄

Yep, template is the only identifier we care about.

We just have to make sure we visit all the right types of identifiers to check for all the right types of scopes – blocks, functions, take into account var hoisting, eval, etc.

In the case there is no existing import, then yes we just have to pick a name that doesn't collide with anything in the places where there is a template tag and the replacement name has to be collision free in all of the intermediate scopes as well.

In the case where there is an existing import, it gets complicated real fast (but as I said, the existing SWC code also doesn't handle this correctly), especially when there are evals (which we basically guarantee there is going to be), because renaming the existing import name can break user code that relies on the template (or whatever they named it) being there in that exact name. So in the case where a <template> tag is trapped in a spot where the imported name is shadowed, we are kind of stuck, the easiest thing to do is probably to introduce a top-level alias (const template1 = tempalteOrOriginalSometimesShadowedName;).

95% of the time it's basically doing a bunch of deep analysis to discover we don't have to do anything. 50% of the challenge is to do that analysis correctly, and the other 50% of the challenge is to make sure we handle the edge cases where we do have to do something.

All said and done, we can maybe reuse 30% of the SWC code, but it's still helpful to have that code available as a guide, informing us what kind of nodes we need to care about for the purpose of the analysis, etc. All the potentially useful parts are already exported, so we don't really need to make changes in our fork. It's best to keep the code outside of the SWC tree as much as possible to make it easier to merge in upstream changes. Plus you don't really want to be messing around with changing something as core as this.

If we can find overlapping availability considering timezone differences, would also be happy to pair on it.

patricklx commented 9 months ago

You mean there could be an eval which expects the template identifier to be available which is introduced by content-tag? Or from an existing import? I think we should not touch any existing identifiers or imports. If there is an existing import, we should just leave it and we can add our own import with different local specifier?

patricklx commented 9 months ago

Mayb would be enough to rebase swc on latest:

https://github.com/swc-project/swc/issues/236#issuecomment-465011303

Edit, looks like content-tag is already using v1.3.92

ef4 commented 9 months ago

Haven’t had a chance to review in detail, but on the topic of it being unusual to leave the typescript in the output: yes, it’s clearly a less-tested use case and this is not the first bug I’ve found in that area.

But upstream swc definitely intends to support it and it’s probably a small fix to make the hygiene system treat the this parameter as a type and not a runtime identifier. That would be worth doing regardless of whether or not we also have a bug in how we’re calling it.

I can merge upstream into our swc fork to make it easier to check if the issue Patrick linked above definitely addressed our case.

On Sat, Mar 2, 2024 at 3:46 AM Patrick Pircher @.***> wrote:

Mayb would be enough to rebase swc on latest:

swc-project/swc#236 (comment) https://github.com/swc-project/swc/issues/236#issuecomment-465011303

— Reply to this email directly, view it on GitHub https://github.com/embroider-build/content-tag/issues/71#issuecomment-1974732915, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACN6MUI626HZ7QK5HNALRDYWGGVBAVCNFSM6AAAAABD7U4C56VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZUG4ZTEOJRGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

chancancode commented 9 months ago

@patricklx see the comment here https://github.com/embroider-build/content-tag/pull/72/files#diff-a1f71e354ebbbb94c7534364ba0b92853eb124d48da73573fc10581ed9b940d6R13-R35 and the existing tests here https://github.com/embroider-build/content-tag/blob/main/src/lib.rs#L299-L350 to get up to speed on the cases we claim/aim to support

SergeAstapov commented 3 months ago

for those who face the same issue, workaround can be found in https://github.com/cardstack/boxel/pull/1305/files#diff-3bde26507ee9a7a15f36481f9a8a61c93c006315047591bc9dab482a3b4b731dR156

embroider-build / content-tag

bad transform of fake this param #71