dgp1130 / rules_prerender

A Bazel rule set for prerendering HTML pages.

`includeScript()` strict deps #2

Open dgp1130 opened 3 years ago

dgp1130 commented 3 years ago

Currently, the includeScript() function does not validate that the included file is present in the scripts attribute of the associated prerender_component() rule. If you include a script which you don't depend on, it could throw a file not found error at the bundling step. Worse, if another component depends on that file, it could pass the build until that component changes in the future and removes the script. Simply put, prerender_component() should verify that all includeScript() statements have files which are direct dependencies of the scripts attribute.
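To make the failure mode concrete, here is a minimal sketch; the file layout and the stubbed `includeScript()` are hypothetical (the real implementation lives in the `rules_prerender` package):

```typescript
// Stub of `includeScript()` for illustration only. Today its parameter type is
// a plain `string`, so nothing stops a component from referencing a script
// that is missing from the `scripts` attribute of its `prerender_component()`.
function includeScript(script: string): string {
    return `<script src="/${script}"></script>`;
}

// Hypothetical prerender function for some component. This compiles even if
// `other_script` is not a direct dependency of the component; the error only
// surfaces at bundling time, or never, if some *other* component happens to
// depend on the file.
function render(): string {
    return `<div>Hello!</div>${includeScript('wksp/my_component/other_script')}`;
}
```

The goal of strict deps is to surface this mistake at compile time instead.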

How to do this could be quite tricky. Since we need to throw an error in a ts_library() compilation, the only way to do that is to make this state a TypeScript compile error. I can think of two ways to do that:

  1. Add a plugin to check strict deps. This is how ts_library() solves the problem today and we could do something similar. The downside here is that we need to somehow add this plugin. Users may have their own version of ts_library() which applies some of their own plugins, and we need to be compatible with that. Considering that right now we don't allow custom ts_library() implementations, this may not be too bad, but it is something we will have to deal with.
  2. Generate the includeScript() implementation with a more specific type. Instead of depending on @npm//rules_prerender to provide the includeScript() function, we could instead have prerender_component() read the scripts attribute and generate a ts_library() which provides includeScript(). This generated version could then accept as input a union of all the valid strict deps, rather than a simple string. This dodges the plugin problem but adds a lot of its own complexity.

We also have to consider what happens if a user calls includeScript() with a variable that can't be resolved to a specific string. There shouldn't be much use for that, but I'm sure users will try it. This should probably be an error in order to improve strictness, but we may need some means of ignoring such cases.

dgp1130 commented 3 years ago

I took a quick look at this in the ref/plugin tag. Unfortunately it seems a bit more complicated than I initially expected. The TypeScript compiler doesn't appear to really have a plugin API. Rather it just has a language service plugin API. Language service plugins are used by editors but are not invoked by the standard tsc process, meaning a bazel build would not trigger the plugin (see https://github.com/microsoft/TypeScript/issues/16607).

ts_library() gets around this via tsc_wrapped, a wrapped tsc binary with some extra Bazel logic tacked on. This is also how the Angular and Lit plugins get invoked. So they aren't really plugins, just dependencies of the Bazel-specific tsc implementation.

We could do the same thing here: make our own tsc_wrapped binary which invokes special rules_prerender logic that walks the AST and errors out on strict deps issues. Then we just use that as the compiler for ts_library() / ts_project() and it should work for the user.

The biggest problem with this is that Bazel's tsc_wrapped is not itself wrappable. It is an entire nodejs_binary(), unlike @npm//typescript, which exports a Node library that can easily be called from any tool. So if we made our own tsc_wrapped, we would be throwing away all the work that goes into Bazel's tsc_wrapped: file loading, normal strict deps, performance improvements like Bazel worker support, and probably a lot more. We could fork it and add our own plugin, but that means we need to maintain the fork and keep it updated with rules_nodejs. Using a custom tsc_wrapped also restricts users to that one implementation: if they had another custom plugin they wanted to use, they wouldn't be able to use it alongside rules_prerender.

I posted on the Bazel slack channel asking about this to see if I'm missing anything. I couldn't find any issues or references to custom plugins in the rules_nodejs repo, so I haven't seen any precedent beyond tsc_wrapped. Hopefully someone points out a better approach, because this one doesn't seem like it would be very fun.

dgp1130 commented 3 years ago

Had a brief discussion about this on Slack. Unfortunately there's no great answer here atm. TypeScript does not actually support compiler plugins, as I originally thought. ts_library() and ts_project() don't have any special support for this either, nor would they be inclined to add it for a non-standard TypeScript plugin model. This leaves us with two possible approaches:

First, we could fork tsc_wrapped and add a custom transformation/validation step for strict dependencies. This is basically how Angular and Lit Element have handled this problem, but they both have direct support in rules_nodejs. Without that luxury, we have to fork tsc_wrapped to make this change.

I'm not sure how hard it is to fork tsc_wrapped, but the changes to it would likely be pretty minimal. Most of the work could be included as a separate module and then a small patch could import and invoke the module without too much else. This would hopefully be reasonably resistant to merge conflicts. The bigger challenge would be actually forking tsc_wrapped itself as I'm not sure how self-contained it is.

The second option is to generate a stricter includeScript() and includeStyle() type definition. Currently these functions accept any string type, but we can leverage TypeScript's string literal and union types to be more specific. In a prerender_component() target, we could read all files in the scripts attribute, and then generate a .d.ts definition that explicitly lists all the allowed scripts that can be included. If the user had a typo or did not include a dependency, this would be a compile-time error.

```typescript
export declare function includeScript(
    script: 'rules_prerender/examples/strict_deps/foo.js'
        | 'rules_prerender/examples/strict_deps/bar.js',
): string;
```

This has the benefit of not requiring a custom compiler. However the generated type definitions can't know where a file is being included from, so we can't apply module resolution logic here, meaning relative paths are not possible.

Comparing the two approaches, the major unknowns are how self-contained tsc_wrapped is to fork (for the first option) and whether workspace-relative-only paths are acceptable (for the second).

Before making a decision here I want to resolve those two major unknowns. As of right now, I'm leaning a bit more toward the first option just because it feels a bit less hacky. With a little more information we can hopefully identify a path forward here.

dgp1130 commented 3 years ago

I did a quick prototype of option 2 to validate the approach. Apparently, we can get a list of direct source files in a dependent ts_library() rule via JsModuleInfo.direct_sources and this works as expected. However, there are other problems I had not realized.

I attempted to generate a simple TypeScript source which re-exported rules_prerender but modified the includeScript() type to be specific to the scripts attribute given. The problem with this is that there can only be one rules_prerender module in the compilation, and by doing this trick I now have two which conflict with each other. Even if I could fix the module name issue, all the various prerender components are included in the same Node binary, meaning they need a single definition of rules_prerender and the includeScript() type. However, part of the requirements for strict deps is that each component is restricted separately, one component may have access to ./foo.js while another has access to ./bar.js. There is no way to represent this difference when given a single rules_prerender definition.

I did try another strategy which leaves the rules_prerender module alone, but instead includes another dependency which augments the type like so:

```typescript
declare module 'rules_prerender' {
    export function includeScript(script: 'wksp/foo/bar/baz.js' | 'wksp/hello/world.js'): string;
}
```

This actually works around the one-definition issue because it does not create a new rules_prerender module, but rather augments the existing type. Not only that, but only the srcs in a given ts_library() are type checked against its own augmentation, so each library can have its own declare module block, distinct from the others, allowing each component to type check with its own strict deps!

There are two major issues with this strategy, however. First, there is no way to re-export all the existing rules_prerender types, as declare module 'rules_prerender' appears to replace the entire module. We would have to maintain a separate definition of the rules_prerender types, apart from its implementation. This would be annoying, and not easily automatable, but ultimately solvable. The second problem is worse: the augmentation deletes the value references exported by the real rules_prerender implementation. This means that using includeScript() results in:

```
ERROR: /home/dparker/Source/rules_prerender/examples/scripts/transitive/BUILD.bazel:4:1: Couldn't build file examples/scripts/transitive/transitive.js: Compiling TypeScript (devmode) //examples/scripts/transitive:transitive_prerender failed (Exit 1)
examples/scripts/transitive/transitive.ts:10:11 - error TS2693: 'includeScript' only refers to a type, but is being used as a value here.

10         ${includeScript('rules_prerender/examples/scripts/transitive/transitive_script')}
             ~~~~~~~~~~~~~
```

I don't see a viable solution to the removed value references. I think the core issue here is that modules are not really intended to be augmented in this fashion: TypeScript supports declaration merging for interfaces, but not this kind of wholesale replacement of a module's value exports. There might be some magic incantation of TypeScript syntax that gets something kind of like what I want, but this is a unique enough problem that I'm comfortable saying it just isn't supported by TypeScript.
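For contrast, a quick sketch of the kind of declaration merging TypeScript does handle well, interface merging, where multiple declarations contribute members to one type (the names here are illustrative):

```typescript
// Two declarations of the same interface merge into a single type with both
// members. This works because interfaces are purely type-level constructs;
// there are no value exports to clobber, unlike the `declare module` trick.
interface ComponentDeps { scripts: string[]; }
interface ComponentDeps { styles: string[]; }

// An object of this type must satisfy the merged shape.
const deps: ComponentDeps = {
    scripts: ['foo.js'],
    styles: ['foo.css'],
};
```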

The takeaway from all this is that I don't think the second approach of using generated type definitions is viable. I don't see a way this could work with TypeScript in its current state.

Prototype: https://github.com/dgp1130/rules_prerender/commit/c3bb7658ada72e5cf88465dda18b6f953252d77a

That means we either need to fork tsc_wrapped, or do nothing and wait for TypeScript and rules_nodejs to support proper compiler plugins. I want to do some more investigation of tsc_wrapped to verify the fork option before committing to an implementation here.

dgp1130 commented 3 years ago

I took a pass at forking tsc_wrapped by copying all the sources of rules_typescript into third_party/rules_typescript/. The tricky part is getting Bazel to resolve paths correctly. I added a local_repository() rule to the root WORKSPACE file for rules_prerender to link to rules_typescript. This sets up the external workspace, but immediately runs into package loading errors.

The first issue is loading other packages which use paths like @build_bazel_rules_nodejs//packages/jasmine:index.bzl. This works in the rules_nodejs repo, but fails here because the packaged version of rules_nodejs does not include Jasmine (and likely would not have a package there anyway). Rewriting such paths to @npm//@bazel/jasmine:index.bzl (and doing the same for other rules_nodejs packages) fixed this.

After that, I get errors about a missing @npm//tsickle dependency. I don't have that in rules_prerender, so the issue is that Bazel isn't installing the rules_typescript package.json file. Since rules_prerender already uses the workspace name @npm, I think nothing from the rules_typescript dependencies gets installed. I renamed the yarn_install() workspace rule in rules_typescript to npm_rules_typescript and updated dependencies accordingly. I believe this is enough to get NPM dependencies to resolve.

The next issue is that @io_bazel_rules_go does not resolve. I definitely don't use Go in rules_prerender, so this is another dependency I will need to add. After digging around rules_nodejs, I eventually found rules_typescript_dev_dependencies() as well as some additional Golang setup (here). Adding that to the root WORKSPACE file solved the Go issues.

Finally, I was missing files from node_modules/ because they were not being installed. A simple yarn execution on the vendored rules_typescript directory installed the required dependencies and I was finally able to bazel build @build_bazel_rules_typescript//internal:tsc_wrapped_bin!

It's a bit awkward for a number of reasons and I would like to make this simpler in a few ways:

  1. We should have a script to actually import the sources and perform the necessary replacements, rather than having to remember how to do this every time we need to update.
  2. rules_typescript uses Yarn, but rules_prerender uses the NPM CLI, which is awkward. Might be worth migrating to Yarn just for consistency. We should also investigate whether we can condense the two installs into a single one somehow. I think that would require merging the package.json dependencies, so maybe it's impractical, but worth investigating.
  3. It's unfortunate that we need to vendor all these sources (and lose history). I just realized that http_archive() has some non-trivial patching functionality. It might be worth investigating if we can do the same thing with that rule to avoid vendoring all this code. Ultimately we only need to patch a simple import and Bazel dependency, most of the work can probably be done in rules_prerender.

Ultimately the fork wasn't that difficult, just need to polish everything and see if we can avoid vendoring it. Currently working in the plugin branch.

dgp1130 commented 3 years ago

Tried doing the same thing but using http_archive() patching. This unfortunately ran into a variety of issues.

I first used strip_prefix to pull just rules_typescript and use that as a separate workspace. However, the yarn_install() rule would fail due to a version mismatch between @bazel/jasmine and @build_bazel_rules_nodejs. It seems that the rules_typescript package.json has not been updated in some time, and its dependencies are out of date. This is ok within the rules_nodejs repo because it builds each relevant @bazel/* package from HEAD, and other dependencies come from the root package.json. We also can't just update these values, because no version number would accurately reflect a HEAD build. Setting the correct version would need to be part of the release process, but that means rules_nodejs either needs to consistently update and check in new version numbers, or make a "release" of the entire source tree with such versions included, and then I would no longer actually be building from source. The takeaway is that I don't think I can depend on the source of just rules_typescript; I need to depend on the entirety of rules_nodejs from source.

After accepting this, I found a from_source example which seems to be mostly what I want. I copied that as best I could but quickly encountered an issue with missing NPM dependencies. It seems that loading another workspace via http_archive() does not run its WORKSPACE.bazel file, so I have to manually trigger yarn_install() to load the relevant NPM dependencies. The tricky part is that I now have two package.json files: one for rules_prerender and another for rules_nodejs. This is supported, and it seems all you have to do is give each npm_install() / yarn_install() command a different workspace name with its own managed directory. I did have a bit of trouble with this, and I suspect that using package_json = "@other_wksp//:package.json" has a bug in it, as some packages would not resolve internal //foo:bar references correctly (they resolved to @npm//foo:bar rather than @npm//some_pkg/foo:bar). Eventually this problem went away for reasons I don't fully understand. Ultimately, the final issue I landed on is that BUILD files within @npm//... seem to resolve correctly, but other files are missing:

```
$ bazel build @build_bazel_rules_typescript//internal:tsc_wrapped_bin
INFO: Invocation ID: 241a49dd-35d7-4814-ac84-a39828ca847f
INFO: Analyzed target @build_bazel_rules_typescript//internal:tsc_wrapped_bin (49 packages loaded, 1024 targets configured).
INFO: Found 1 target...
ERROR: /home/dparker/.cache/bazel/_bazel_dparker/2b371dbae62d236f30054bf9ea1b9fc9/external/build_bazel_rules_typescript/internal/BUILD.bazel:49:4: @build_bazel_rules_typescript//internal:tsc_wrapped: missing input file 'external/npm_rules_nodejs/node_modules/protobufjs/LICENSE', owner: '@npm_rules_nodejs//:node_modules/protobufjs/LICENSE'
Target @build_bazel_rules_typescript//internal:tsc_wrapped_bin failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/dparker/.cache/bazel/_bazel_dparker/2b371dbae62d236f30054bf9ea1b9fc9/external/build_bazel_rules_typescript/internal/BUILD.bazel:73:14 334 input file(s) do not exist
INFO: Elapsed time: 4.470s, Critical Path: 0.06s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
```

I suspect this is because I haven't run yarn on the package.json file from rules_nodejs, so no actual source files are populated. I tried copying the package.json from rules_nodejs to the relevant directory that contains its node_modules/ and then running yarn. Unfortunately this encounters a few errors due to local tool references in package.json. Deleting those and just pushing through, I was able to get a successful yarn execution, but the files are still missing from Bazel's perspective and I don't understand why.

I think the core problem is that I'm building rules_nodejs from source but failing to install its NPM dependencies correctly. The from_source example doesn't have this problem because it ignores the rules_typescript/package.json file since the rules_nodejs/package.json file is effectively a superset. I might have to file an issue / ask on Slack to the rules_nodejs community to see if there's something I'm missing here.

Fortunately, the dev experience with bazel build and patching changes works a lot better with this approach than the manually forked version. I would love to get this to work somehow, just a lot of extra complexity that's getting pulled in.

dgp1130 commented 3 years ago

Noticed this in the Bazel Slack today: https://bazelbuild.github.io/rules_nodejs/changing-rules.html

Might be worth exploring if we can patch @bazel/typescript directly, rather than switching to build from source.

dgp1130 commented 3 years ago

I started looking into patch-package as suggested by the previous comment. I was able to patch the local @bazel/typescript package. It imports a new package I made locally called rules_prerender_tsc_plugin which adds a simple diagnostic plugin that fails the build with a standard error. I then made a new target at @npm//@bazel/typescript/internal:tsc_wrapped_with_rules_prerender_plugin which adds the dependency on the new rules_prerender_tsc_plugin package, and then added it as a compiler to the ts_library() generated by prerender_component(). This is able to throw the error as expected, and shows that I have effectively patched the TypeScript compiler with a custom plugin.

A couple interesting aspects:

How should we publish this :tsc_wrapped_with_rules_prerender_plugin binary? Currently it just builds inside the rules_prerender repo for local development, but it includes a hard dependency on the typescript version used by rules_prerender. However, typescript is intended as a peer dep, so publishing this would leave the user's repository with two typescript versions: the dev dependency they have installed (used by their ts_library() targets) and the dev dependency from rules_prerender (used by prerender_component() targets). It also means that updating the rules_prerender dependency on typescript is actually a breaking change for users, particularly since TypeScript doesn't follow semver. I don't want to be pushing out breaking changes every 6 months, and I don't want to force users onto a particular TypeScript version.

Alternatively, I'm thinking I could patch the user's @bazel/typescript package instead of the rules_prerender dev dependency on @bazel/typescript. It would work something like this:

  1. rules_prerender publishes a patch-compiler script in the NPM package.
  2. Users add "postinstall": "patch-compiler" to their package.json.
  3. After npm install in user workspaces, the patch-compiler script will patch their @bazel/typescript install to include the prerender plugin (only if a particular option in tsconfig.json is set).
  4. prerender_component() sets the relevant tsconfig.json option to enable the plugin.
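A rough sketch of what step 3 might look like; the patched file path, the `diagnosticPlugins` marker it rewrites, and the plugin package name are all assumptions about @bazel/typescript internals, not a public API:

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical `patch-compiler` postinstall script. It rewrites the user's
// installed tsc_wrapped entry point to register the prerender diagnostic
// plugin. The file location and marker below are assumptions about the
// internal structure of `@bazel/typescript`.
function patchCompiler(nodeModulesDir: string): boolean {
    const target = path.join(
        nodeModulesDir, '@bazel', 'typescript',
        'internal', 'tsc_wrapped', 'tsc_wrapped.js');
    if (!fs.existsSync(target)) return false; // Nothing to patch.

    const original = fs.readFileSync(target, 'utf8');
    if (original.includes('rules_prerender_tsc_plugin')) return true; // Idempotent.

    // Naive string patch; a real implementation would anchor on a stable
    // marker and fail loudly if the file's shape has changed.
    const patched =
        `const { PrerenderPlugin } = require('rules_prerender_tsc_plugin');\n` +
        original.replace(
            /diagnosticPlugins\s*=\s*\[/,
            'diagnosticPlugins = [new PrerenderPlugin(), ');
    fs.writeFileSync(target, patched);
    return true;
}
```

Step 4's tsconfig.json check is omitted here; the sketch only shows the mechanical rewrite.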

This requires one extra setup step from users (the postinstall hook) but gives them full control over the typescript and @bazel/typescript versions they want to use. It does mean that rules_prerender needs to be compatible with many different versions of typescript and is somewhat dependent on the internal structure of @bazel/typescript. This doesn't follow public API contracts and could break at any time. The patch to @bazel/typescript should be fairly minimal, since it will just be:

```javascript
const { PrerenderPlugin } = require('rules_prerender_tsc_plugin');
// ...
diagnosticPlugins.add(new PrerenderPlugin());
```

And anytime typescript includes a breaking change it also has the potential to break this plugin. I think both of these could be addressed with a more precise peer dep on both packages if necessary to prevent users from depending on an untested version of either of these libraries. It is a bit awkward to patch the user's node_modules/, but I think the end result is better than trying to ship a different version of typescript with the rules_prerender plugin included.

I also discovered that I needed link_workspace_root = True on the patched :tsc_wrapped_with_rules_prerender_plugin nodejs_binary(). This is because of https://github.com/bazelbuild/rules_nodejs/issues/1121#issuecomment-530567676, where even relative imports in a TypeScript file get converted to workspace relative imports. link_workspace_root solves the problem locally, but I'm not sure if it will work within a user workspace. We may need to pre-bundle the rules_prerender_tsc_plugin package to avoid such imports and drop the need for link_workspace_root.

I definitely think going with patch-package is a lot easier than the other alternatives. Not to mention this strategy has the capability to patch the user's @bazel/typescript package rather than shipping a special compiler from rules_prerender with a specific typescript version, which I think is the better way forward. I'll need to play around a little more to see if I can get this working with a user project.

dgp1130 commented 3 years ago

Snapshot of current progress: https://github.com/dgp1130/rules_prerender/commit/27eb205a65351be4cac3d16b33ff49dddf9dcc50.

dgp1130 commented 3 years ago

It's also worth considering how this patching strategy would work with Yarn PnP (or the NPM equivalent whose name I'm forgetting). Writing to node_modules/ is a bit sketchy for a few reasons, and it definitely wouldn't work well in these environments.

As a similar strategy, I'm wondering if we could require.resolve() the tsc_wrapped.js file at runtime, read its contents, patch them, write the result to some temporary directory, and then execute the modified version in a similar context. I'm not sure how feasible that is, but it could allow for both:

  1. Supporting Yarn PnP / not writing to node_modules/.
  2. Using the @bazel/typescript / typescript peer deps installed by the user rather than provided by rules_prerender.
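A sketch of that idea, using only Node's standard fs/os/path APIs; the entry-point path and patch shown in the commented usage are hypothetical:

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Resolve-read-patch-execute: stage a patched copy of a resolved module in a
// temp directory instead of mutating node_modules/ in place.
function patchAndStage(entryPoint: string, patch: (src: string) => string): string {
    const source = fs.readFileSync(entryPoint, 'utf8');
    const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'patched-tsc-'));
    const outFile = path.join(outDir, path.basename(entryPoint));
    fs.writeFileSync(outFile, patch(source));
    return outFile; // Caller would `require()` or spawn this patched copy.
}

// Hypothetical usage:
// const entry = require.resolve('@bazel/typescript/internal/tsc_wrapped/tsc_wrapped.js');
// const patched = patchAndStage(entry, (src) =>
//     `require('rules_prerender_tsc_plugin').install();\n${src}`);
```

The open question remains that the patched copy must still resolve its own dependencies, which is exactly what a PnP-aware loader would have to provide.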
dgp1130 commented 2 years ago

This has been in the back of my mind for a while (honestly forgot I put this much effort into the investigation). Two ideas I want to write down quickly before I forget:

First, we don't necessarily need a real compiler plugin. TypeScript does support plugins, but only for editor intellisense, not actual compilations. Conversely, we can generate component-specific includeScript() modules, but the editor wouldn't know about them. I'm thinking we can do both of these together so each makes up for the other's shortcomings.

prerender_component() could generate a TS source file with an includeScript() unique to that component. This would be enough to pass a bazel build, but the editor wouldn't be able to resolve the import. Separately a TS language service plugin can teach the editor how to resolve these imports to something reasonable.
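A sketch of what such a generated file might contain; the script paths and the stubbed implementation are hypothetical:

```typescript
// Stub of the shared implementation; the generated file would instead import
// this from the real `rules_prerender` package.
const includeScriptImpl = (script: string) =>
    `<script src="/${script}"></script>`;

// Generated union of this component's `scripts` attribute.
type ComponentScript =
    | 'wksp/my_component/foo.js'
    | 'wksp/my_component/bar.js';

// The generated wrapper narrows the accepted scripts at compile time while
// delegating to the shared implementation at runtime.
function includeScript(script: ComponentScript): string {
    return includeScriptImpl(script);
}
```

Passing any path outside the union, or a plain `string` variable, is then a compile error, which is exactly the strict deps check we want.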

The second approach is to use import.meta. We can make this a second parameter to includeScript() which states where the given import is coming from. The definition can then resolve the given path relative to the calling file.

I'm not sure how well supported that is in NodeJS or if it requires ESM. This also pushes the problem to runtime and requires more tooling to generate a strict deps map (prerender file -> client script file) which gets validated at includeScript() to make sure the dependency is valid.
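A sketch of the import.meta variant; everything here is illustrative, and it does assume ESM, since import.meta is unavailable in CommonJS:

```typescript
// The caller passes its own `import.meta` so the implementation can resolve a
// relative script path against the calling module's location.
function includeScriptRelative(script: string, importMeta: { url: string }): string {
    // `URL` resolution handles "./" and "../" segments against the module URL.
    const resolved = new URL(script, importMeta.url).pathname;
    return `<script src="${resolved}"></script>`;
}

// Hypothetical call site, from within an ES module:
//   includeScriptRelative('./transitive_script.js', import.meta);
```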

I'll have to explore both of these options a little more to see what the best path forward is.