adamhaile / surplus

High performance JSX web views for S.js applications
639 stars 26 forks source link

Compiler chokes on regular expression literal #87

Closed fschopp closed 5 years ago

fschopp commented 5 years ago

Given file compiler-bug.js with content

let compiler = module.require('surplus/compiler');
compiler.compile('const rquickExpr = /^(?:\\s*(<[\\w\\W]+>)[^>]*|#([\\w-]+))$/');

the following fails:

$ node compiler-bug.js
node_modules/surplus/compiler/index.js:373
            throw new Error(msg + frag);
            ^

Error: bad element name at line 0 col 28: ``<[\w\W]+>)[^>]*|#([\w-]+))$/''
    at ERR (node_modules/surplus/compiler/index.js:373:19)
    at jsxElement (node_modules/surplus/compiler/index.js:121:17)
    at program (node_modules/surplus/compiler/index.js:91:35)
    at parse (node_modules/surplus/compiler/index.js:83:16)
    at Object.compile (node_modules/surplus/compiler/index.js:1512:49)
    at Object.<anonymous> (src/spec/compiler-bug.js:2:10)
    at Module._compile (internal/modules/cjs/loader.js:701:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:712:10)
    at Module.load (internal/modules/cjs/loader.js:600:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:539:12)

The problem is that the Surplus compiler does not skip regular expression literals.

Just for background: The line is taken from the jQuery source code. The reason I ran into the issue in the first place is that parcel-plugin-surplus runs the surplus compiler also on .js assets. A reason for that could be that Parcel does not distinguish between .tsx and .ts files, so they both get transformed into a .js asset.

adamhaile commented 5 years ago

Hi @fschopp - Thanks for the report and the repro case. I think this is the same issue as reported in #45 .

The underlying issue is that regular expressions are hard to parse in JS -- they're not context free, meaning the parser needs to distinguish whether a '/' is occurring in a context where a division operator is possible or where a regular expression is possible. The surplus parser isn't sophisticated enough to handle that. Originally, this was a conscious tradeoff -- the early versions of surplus often did runtime rather than build-time compilation, and this tradeoff allowed the parser to be smaller and run faster. Now that surplus is almost exclusively build-time (except for surplus-toys), that's not so much of a concern.

This could be fixed, as I mentioned in #45, by transitioning to a smarter parser, like acorn.js. The hold-up on that is that, well, outside of this issue, the current parser is working fairly well, and the code changes would be large. I'm sort of waiting for some other issue to come along and tip the pro/con balance in favor of that work, but until then, I haven't undertaken the rewrite.

Sorry to not have an immediate answer here. More reports like this definitely tip the pro/con list. I'll keep you posted. -- Adam

adamhaile commented 5 years ago

Ah, I think I just got the significance of what you had to say about Parcel and .js assets. So in the parcel pipleine, both .ts and .tsx get converted to .js. So if parcel-plugin-surplus didn't parse .js assets, you couldn't use surplus on .tsx files with it, as they first get converted to .js. If that's right, I see the issue. Is there no way in a parcel plugin to tell what the original asset type was? I wonder if the application to .js files could occur only if the original file was .tsx. I don't use parcel, unfortunately, and know even less about its plugin api.

fschopp commented 5 years ago

Hi Adam – thanks for the thorough explanation. For some reason I did not notice #45 – indeed, my report is a duplicate. I'll close this issue.

I agree that a smarter parser would be nice to have, but the failure with Parcel is probably better addressed in parcel-plugin-surplus and/or in parcel itself. One option would be that parcel-plugin-surplus only claims responsibility for .jsx and .tsx files. For .tsx it would then also have to invoke the TypeScript compiler. Another option would be to make parcel distinguish between .js and .jsx assets, and also between .ts and .tsx. Then parcel-plugin-surplus only needs to claim responsibility for .jsx assets.

I didn't have the time for such a pull request, so for the time being my project is using a little script that invokes the surplus compiler to transform .jsx into .js and also uses package source-map to coalesce the tsc and surplus-generated source maps (i.e., a .js to .tsx source map). Parcel then only ever sees .js assets in the HTML. The downside of this workaround is, of course, that parcel no longer rebuilds automatically if I modify the source files.