dgp1130 / rules_prerender

A Bazel rule set for prerendering HTML pages.

Direct Routing Support #76

Open dgp1130 opened 11 months ago

dgp1130 commented 11 months ago

So I was originally expecting that @rules_prerender wouldn't really need a routing solution because it would be generic enough to support any kind of site routing. The export default entry point is broad and powerful enough to do mostly whatever users could want. However, while working on the docs site, I've changed my mind for a couple of reasons.

  1. If a page renders a navigation UI which lists all the pages, then you need to know every page URL up front before you can render a single page. This logic becomes almost circular and can be tricky to manage.
  2. It is very useful for a rendered page to know where it lives in the route hierarchy, and expressing that requires a lot of duplicated information if it is not tightly integrated into the route configuration.

I'm working on a proof-of-concept directly in the docs site, but I think upstreaming a more comprehensive implementation would be very useful. Currently this looks something like:

```tsx
import { bootstrap } from 'rules_prerender';

// Proposed API sketch: `bootstrap` does not exist yet, and `renderToHtml` and
// `MyNav` are assumed to be defined elsewhere.
export default bootstrap([
  {
    label: 'Home',
    path: '/',
    render: (currentRoute, routes) => renderToHtml(<div>Hello, World!</div>),
  },
  {
    label: 'Tutorials',
    path: '/tutorials',
    // No render function, no page is created.
    children: [
      {
        label: 'Getting Started',
        path: 'getting-started',
        render: (currentRoute, routes) => renderToHtml(<MyNav currentRoute={currentRoute} routes={routes} />),
      },
    ],
  },
]);
```
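To make the first point concrete, here is a minimal sketch of how something like bootstrap() might resolve every URL in the tree before rendering any page, so a nav component can receive the full list. The RouteConfig and ResolvedRoute types and the resolveRoutes/renderAll helpers are hypothetical (nothing here is rules_prerender's API), and plain strings stand in for renderToHtml and JSX so it runs standalone.

```typescript
// Hypothetical types for the proposed route config; the real shape is TBD.
interface RouteConfig {
  label: string;
  path: string;
  // A route without `render` only groups its children; no page is created.
  render?: (current: ResolvedRoute, all: readonly ResolvedRoute[]) => string;
  children?: readonly RouteConfig[];
}

interface ResolvedRoute {
  label: string;
  url: string; // Full URL, e.g. "/tutorials/getting-started".
  config: RouteConfig;
}

// Child paths without a leading slash are relative to their parent's URL.
function joinUrl(parentUrl: string, path: string): string {
  return path.startsWith('/') ? path : `${parentUrl.replace(/\/$/, '')}/${path}`;
}

// Pass 1: resolve every URL in the tree before anything is rendered.
function resolveRoutes(configs: readonly RouteConfig[], parentUrl = '/'): ResolvedRoute[] {
  return configs.flatMap((config) => {
    const url = joinUrl(parentUrl, config.path);
    return [{ label: config.label, url, config }, ...resolveRoutes(config.children ?? [], url)];
  });
}

// Pass 2: render each page, handing it the complete route list for its nav.
function renderAll(configs: readonly RouteConfig[]): Map<string, string> {
  const all = resolveRoutes(configs);
  const pages = new Map<string, string>();
  for (const route of all) {
    const render = route.config.render;
    if (render) pages.set(route.url, render(route, all));
  }
  return pages;
}

const pages = renderAll([
  { label: 'Home', path: '/', render: () => '<div>Hello, World!</div>' },
  {
    label: 'Tutorials',
    path: '/tutorials',
    children: [{
      label: 'Getting Started',
      path: 'getting-started',
      render: (current, all) => `<nav>${all.map((r) => r.label).join(' | ')}</nav><h1>${current.label}</h1>`,
    }],
  },
]);
console.log([...pages.keys()]); // Only routes with a render function produce pages.
```

Splitting resolution from rendering is what breaks the circularity: the tree is walked once for URLs, and only then does any page get rendered.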

The exact shape of this is TBD, but some requirements to consider based on this use case:

  1. A means of hiding routes from the navigation (e.g. private pages or a "not found" page).
  2. A means of generating parameterized routes like /products/:id.
  3. Application-specific data attached to individual routes in a type-safe manner.
  4. Two-way pointers to navigate the route tree from the current route.
  5. Route composition between different targets.
  6. Some means of discovering routes automatically? Perhaps through file-based mechanisms?
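Requirements 1 through 4 could plausibly fit into one tree type. The sketch below is hypothetical (Route, RouteNode, buildTree, navUrls and breadcrumbs are illustrative names, not a proposed final API): a generic Data parameter gives type-safe per-route data, a hidden flag drops routes from navigation, a params callback expands /products/:id into concrete pages, and each node keeps both parent and children pointers.

```typescript
// Hypothetical sketch only; none of these names exist in rules_prerender.
interface Route<Data> {
  label: string;
  path: string; // May contain `:param` segments, e.g. "/products/:id".
  data: Data; // Requirement 3: app-specific data, type-checked via `Data`.
  hidden?: boolean; // Requirement 1: rendered, but left out of navigation.
  params?: () => Array<Record<string, string>>; // Requirement 2: one page per entry.
  children?: Array<Route<Data>>;
}

interface RouteNode<Data> {
  label: string;
  url: string;
  data: Data;
  hidden: boolean;
  parent?: RouteNode<Data>; // Requirement 4: pointer up the tree...
  children: Array<RouteNode<Data>>; // ...and back down.
}

// Replace each `:name` segment with its concrete value.
function fillParams(path: string, values: Record<string, string>): string {
  return path.replace(/:(\w+)/g, (_match, name: string) => {
    const value = values[name];
    if (value === undefined) throw new Error(`Missing param "${name}" in ${path}`);
    return value;
  });
}

function buildTree<Data>(routes: Array<Route<Data>>, parent?: RouteNode<Data>): Array<RouteNode<Data>> {
  return routes.flatMap((route) => {
    const prefix = parent ? parent.url.replace(/\/$/, '') : '';
    const template = route.path.startsWith('/') ? route.path : `${prefix}/${route.path}`;
    const urls = route.params ? route.params().map((p) => fillParams(template, p)) : [template];
    return urls.map((url) => {
      const node: RouteNode<Data> = {
        label: route.label, url, data: route.data, hidden: route.hidden ?? false, parent, children: [],
      };
      node.children = buildTree(route.children ?? [], node);
      return node;
    });
  });
}

// Navigation skips hidden routes.
function navUrls<Data>(nodes: Array<RouteNode<Data>>): string[] {
  return nodes.filter((n) => !n.hidden).map((n) => n.url);
}

// Breadcrumbs follow parent pointers up to the root.
function breadcrumbs<Data>(node: RouteNode<Data>): string[] {
  const trail: string[] = [];
  for (let n: RouteNode<Data> | undefined = node; n; n = n.parent) trail.unshift(n.label);
  return trail;
}

type PageData = { section: 'docs' | 'shop' };
const tree = buildTree<PageData>([
  {
    label: 'Products', path: '/products', data: { section: 'shop' },
    children: [{ label: 'Product', path: ':id', data: { section: 'shop' }, params: () => [{ id: '1' }, { id: '2' }] }],
  },
  { label: 'Not Found', path: '/404', data: { section: 'docs' }, hidden: true },
]);
```

Because Data is a type parameter of the whole tree, a typo like { section: 'blog' } fails to compile, which is one way to get the "type-safe" part of requirement 3.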
dgp1130 commented 10 months ago

One other slightly crazy idea I had was whether or not route rendering could be parallelized and cached at the Bazel layer.

It's relatively easy to parallelize async render calls of routes. I say "relatively" because it's actually a bit more complicated than you might think with generators. But ultimately, if you have one prerender_pages target which renders all the routes, you're still gonna wait a long time for apps with a large number of routes.
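One reason generators complicate in-process parallelism: calling an async generator function does no work until something iterates it, so a naive Promise.all over the returned generator objects resolves immediately without rendering anything. The sketch below assumes, hypothetically, that each route's render returns an async generator of output files (Rendered, RenderFn, drain and renderConcurrently are illustrative names, not rules_prerender's API). It drains several generators concurrently with a fixed concurrency limit.

```typescript
// Hypothetical: each route's render yields output files as it goes.
type Rendered = { path: string; contents: string };
type RenderFn = () => AsyncGenerator<Rendered, void, undefined>;

// Pull one generator to completion; nothing runs until it is iterated.
async function drain(gen: AsyncGenerator<Rendered, void, undefined>): Promise<Rendered[]> {
  const out: Rendered[] = [];
  for await (const item of gen) out.push(item);
  return out;
}

// Render all routes with at most `limit` generators in flight at once.
async function renderConcurrently(renders: RenderFn[], limit: number): Promise<Rendered[]> {
  const results: Rendered[][] = new Array(renders.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < renders.length) {
      const index = next++; // Claimed synchronously, so workers never collide.
      results[index] = await drain(renders[index]!());
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, renders.length) }, worker));
  return results.flat(); // Route order is preserved regardless of completion order.
}

// Example route whose rendering takes `delayMs` milliseconds.
async function* page(path: string, delayMs: number): AsyncGenerator<Rendered, void, undefined> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  yield { path, contents: `<h1>${path}</h1>` };
}
```

Even with this in place, everything still runs in a single process for a single target, which is the limitation the sharding idea targets.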

Instead, I was imagining "sharding" the prerender_pages target, just like test sharding. prerender_pages could generate multiple prerender_resources actions, each of which would invoke the same binary with an argument telling it to call the render function for only a subset of the routes. Then all the outputs could be merged together in a web_resources target. The split could be done by simply dividing the routes evenly among the shards, or by route path, such as:

```starlark
prerender_pages(
    name = "site",
    # ...
    route_shards = [
        [ "/blog/**" ], # Shard 1 renders only the blog posts.
        [ "/help/**" ], # Shard 2 renders only the help pages.
        [ "/**" ], # Shard 3 renders everything else.
    ],
)
```
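Both splitting strategies are simple to compute inside the binary once it knows which shard it is. This is a hypothetical sketch: routesForShard, globToRegExp and assignShards are illustrative names, shard indices are zero-based (as with Bazel's TEST_SHARD_INDEX for tests), and the glob dialect is an assumption. For route_shards, each route goes to the first shard with a matching pattern, which is what lets the trailing "/**" shard mean "everything else".

```typescript
// Strategy 1: split the route list into `count` contiguous, near-equal slices.
function routesForShard<T>(routes: readonly T[], index: number, count: number): T[] {
  const start = Math.floor((index * routes.length) / count);
  const end = Math.floor(((index + 1) * routes.length) / count);
  return routes.slice(start, end);
}

// Strategy 2: route_shards-style patterns. Assumed glob dialect: "/**" matches
// the directory itself and anything below it; "*" matches within one segment.
function globToRegExp(glob: string): RegExp {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const char = glob.charAt(i);
    if (glob.startsWith('/**', i)) {
      source += '(?:/.*)?';
      i += 2; // Skip the two '*' characters.
    } else if (char === '*') {
      source += '[^/]*';
    } else {
      source += char.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // Escape regex syntax.
    }
  }
  return new RegExp(`^${source}$`);
}

// Each route belongs to the first shard whose patterns match it.
function assignShards(paths: readonly string[], shards: readonly string[][]): string[][] {
  const matchers = shards.map((patterns) => patterns.map((p) => globToRegExp(p)));
  const result: string[][] = shards.map(() => []);
  for (const path of paths) {
    const shard = matchers.findIndex((regexes) => regexes.some((re) => re.test(path)));
    if (shard === -1) throw new Error(`No shard matches route ${path}`);
    result[shard]!.push(path);
  }
  return result;
}

const byPattern = assignShards(
  ['/', '/blog/post-1', '/help/faq', '/about'],
  [['/blog/**'], ['/help/**'], ['/**']],
);
console.log(byPattern);
```

Either way, every route lands in exactly one shard, so merging all shard outputs reproduces the full site.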

This would probably need to be in a new router-aware rule like prerender_routes instead of prerender_pages. While this parallelizes at the process level, it doesn't help much with caching because each shard needs to run the same binary. So if you edit the rendering of a help page, you still need to render all the blog posts. I suspect this could be improved if we moved forward with file-based routing.

If we instead define the API as:

```starlark
prerender_routes(
    name = "site",
    # ...
    routes = [
        "//path/to/site/www:home",
        "//path/to/site/www:about",
        "//path/to/site/www/blog",
        "//path/to/site/www/help",
    ],
)
```

Then, if each target in routes followed some standard convention, such as exposing a *.route.ts file whose default export is a route config:

```typescript
export default {
    label: 'Home',
    render: () => { /* ... */ },
};
```

Then we could actually make each sharded action depend only on the files it needs to render! The shard which renders the home page doesn't need a dependency on the :about target at all, so it can stay cached when the about page changes.
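On the binary side, a shard could be handed only its own route module specifiers and import nothing else. This is a hypothetical sketch: RouteModule, assertRouteModule and renderShard are illustrative, and the module loader is injected so the sketch runs without real files (in an actual binary it would most likely be (s) => import(s)). It also checks the assumed *.route.ts contract at runtime, since a dynamic import is untyped.

```typescript
// Assumed *.route.ts contract: default export is { label, render }.
interface RouteModule {
  default: { label: string; render: () => string | Promise<string> };
}

function assertRouteModule(specifier: string, mod: unknown): RouteModule {
  const config = (mod as { default?: unknown } | null | undefined)?.default as
    | Record<string, unknown>
    | undefined;
  if (typeof config?.label !== 'string' || typeof config?.render !== 'function') {
    throw new Error(`${specifier} must default-export { label: string, render: () => ... }`);
  }
  return mod as RouteModule;
}

// Loads and renders only the given route modules, keyed by specifier.
async function renderShard(
  specifiers: readonly string[],
  load: (specifier: string) => Promise<unknown>,
): Promise<Map<string, string>> {
  const pages = new Map<string, string>();
  for (const specifier of specifiers) {
    const { default: route } = assertRouteModule(specifier, await load(specifier));
    pages.set(specifier, await route.render());
  }
  return pages;
}

// Stand-ins for real *.route.ts modules, recording which ones get loaded.
const fakeModules: Record<string, unknown> = {
  './home.route': { default: { label: 'Home', render: () => '<h1>Home</h1>' } },
  './about.route': { default: { label: 'About', render: async () => '<h1>About</h1>' } },
};
const loaded: string[] = [];
const fakeLoad = async (s: string): Promise<unknown> => {
  loaded.push(s);
  return fakeModules[s];
};
```

The caching benefit itself comes from Bazel, not from this code: it only materializes if the shard's action inputs are limited to these modules and their transitive deps.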

One challenge with this is the requirement:

If a page attempts to render a navigation UI which lists all the pages, then you need to know all page URLs up front before you can render a single page.

This does mean that the home page still needs to know that the :about route exists, even if it doesn't render it. Either we need to design the *.route.ts convention so that this metadata doesn't depend on the rendered component, or we'd need to statically analyze the route to extract this information without building and running it. Static analysis might also be tricky, given that we can't transpile the route to JavaScript without building all of its dependencies. We'd need to either:

  1. Parse the TypeScript source directly.
    • The route might not be written in TypeScript, in which case this feature falls over.
    • Supporting just TS and JS might be good enough though.
    • Requires an extra dependency edge on the TS source, which might be awkward to express in Starlark.
  2. Perform single-file transpilation of TS to JS and ignore errors, then statically analyze the resulting JavaScript.
    • This could change semantics if the file is not compatible with single-file transpilation (e.g. const enums declared in other files).
    • Still only works for TS, so is this much better than parsing the TS source directly?
  3. Build the route and all of its dependencies.
    • Could be a large number of dependencies, and a change to any of them would invalidate the rendered cache.
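For the first alternative, designing the *.route.ts convention so route metadata doesn't depend on the rendered component, one possible data shape is sketched here. It is hypothetical: LazyRoute, navFor and renderOne are illustrative names, and the load() indirection only shows the data-shape side. Whether Bazel can actually drop the component dependency depends on how load is wired; a dynamic import() in the same file would still be a build dependency, so the rule would likely need to supply the render module separately per shard.

```typescript
// Metadata is plain data; the page implementation sits behind load().
interface NavEntry {
  label: string;
  path: string;
}

interface LazyRoute extends NavEntry {
  load: () => Promise<{ render: (nav: readonly NavEntry[]) => string }>;
}

// Navigation is computed from metadata alone; no load() call happens here.
function navFor(routes: readonly LazyRoute[]): NavEntry[] {
  return routes.map(({ label, path }) => ({ label, path }));
}

// Render one route (e.g. in the home page shard), loading only its own code.
async function renderOne(target: LazyRoute, all: readonly LazyRoute[]): Promise<string> {
  const { render } = await target.load();
  return render(navFor(all));
}

let loads = 0; // Counts how many page implementations were loaded.
const lazyRoutes: LazyRoute[] = [
  {
    label: 'Home', path: '/',
    load: async () => { loads++; return { render: (nav) => nav.map((n) => n.label).join('|') }; },
  },
  {
    label: 'About', path: '/about',
    load: async () => { loads++; return { render: () => '<h1>About</h1>' }; },
  },
];
```

Here the home page's nav still lists About, but rendering Home never touches About's implementation.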

I think it's a cool idea at least and one worth exploring after initial routing support.