asg017 / unofficial-observablehq-compiler

An unofficial compiler for Observable notebook syntax
https://www.npmjs.com/package/@alex.garcia/unofficial-observablehq-compiler
112 stars 23 forks

Tree Shaking #28

Open asg017 opened 3 years ago

asg017 commented 3 years ago

This has been an idea in the back of my head for a while, but this issue made me put it into better words.

Basically, with .moduleToESModule() and .module(), it would be nice to be able to specify a list of cells and have the compiled output define only the cells needed to compute them. For example, if the original source is:

a = 1; 
b = 2; 
c = a + b; 

x = 2; 
y = 4; 
z = x - y;

import {d3, chart} with {c as height} from "@d3/bar-chart"; 

Then we could specify the following to get different outputs.

c, z:

export default function define(runtime, observer) {
  const main = runtime.module();

  main.variable(observer("a")).define("a", function(){return(
1
)});
  main.variable(observer("b")).define("b", function(){return(
2
)});
  main.variable(observer("c")).define("c", ["a","b"], function(a,b){return(
a + b
)});
  main.variable(observer("x")).define("x", function(){return(
2
)});
  main.variable(observer("y")).define("y", function(){return(
4
)});
  main.variable(observer("z")).define("z", ["x","y"], function(x,y){return(
x - y
)});
  return main;
}

a, x:

export default function define(runtime, observer) {
  const main = runtime.module();

  main.variable(observer("a")).define("a", function(){return(
1
)});
  main.variable(observer("x")).define("x", function(){return(
2
)});
  return main;
}

chart:

import define1 from "https://api.observablehq.com/@d3/bar-chart.js?v=3";

export default function define(runtime, observer) {
  const main = runtime.module();

  main.variable(observer("a")).define("a", function(){return(
1
)});
  main.variable(observer("b")).define("b", function(){return(
2
)});
  main.variable(observer("c")).define("c", ["a","b"], function(a,b){return(
a + b
)});
  const child1 = runtime.module(define1).derive([{"name":"c","alias":"height"}], main);
  main.import("d3", "d3", child1);
  main.import("chart", "chart", child1);
  return main;
}

To do this, we would probably need to build the DAG of cells and determine which cells to include or exclude. We would also need to remove unnecessary import statements (while being careful with edge cases around "import with"), and maybe unnecessary FileAttachments too, but that may be harder...
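A minimal sketch of that DAG walk, assuming each cell has already been parsed into a record listing the names it defines and the names it references. The names/inputs record shape and the treeShake helper are hypothetical, not part of the compiler's API; an "import with" cell is modeled as depending on the cells it injects.

```javascript
// Hypothetical cell records: `names` is what the cell defines,
// `inputs` is what it references (its upstream dependencies).
const cells = [
  { names: ["a"], inputs: [] },
  { names: ["b"], inputs: [] },
  { names: ["c"], inputs: ["a", "b"] },
  { names: ["x"], inputs: [] },
  { names: ["y"], inputs: [] },
  { names: ["z"], inputs: ["x", "y"] },
  // import {d3, chart} with {c as height}: defines d3 and chart, depends on c
  { names: ["d3", "chart"], inputs: ["c"] },
];

// Keep only the cells reachable from `targets` by following inputs upstream.
function treeShake(cells, targets) {
  const byName = new Map();
  for (const cell of cells) {
    for (const name of cell.names) byName.set(name, cell);
  }
  const keep = new Set();
  const stack = [...targets];
  while (stack.length > 0) {
    const cell = byName.get(stack.pop());
    // Skip names not defined in this notebook (e.g. builtins) and visited cells.
    if (cell === undefined || keep.has(cell)) continue;
    keep.add(cell);
    stack.push(...cell.inputs);
  }
  // Filter the original list so the output keeps source order.
  return cells.filter((cell) => keep.has(cell));
}

const definedNames = (kept) => kept.flatMap((cell) => cell.names);

console.log(definedNames(treeShake(cells, ["c", "z"]))); // a, b, c, x, y, z
console.log(definedNames(treeShake(cells, ["a", "x"]))); // a, x
console.log(definedNames(treeShake(cells, ["chart"])));  // a, b, c, d3, chart
```

The three calls correspond to the three outputs above. Unused FileAttachments are not handled in this sketch.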

a10k commented 3 years ago

My understanding is that when an ES module compiled without tree shaking is loaded into the runtime, upstream cells that are not referenced are not executed. So is the primary objective of this feature purely to reduce the size of the compiled output?

There were only a couple of times I wanted this feature, mainly when I imported the old inputs bazaar notebook into several of my notebooks, usually for just one dropdown or slider. When I downloaded the tar package from Observable, it included the entire notebook and a huge GIF image just for a few lines of code, and I wished this feature existed in the export functionality there to make the downloads smaller. (But when I use these in a React component, I can still easily swap a dropdown for a select, etc., without having to go back and edit the notebook, re-download, and so on, so I was fine with the extra lines in my bundle.)

A few other ideas to consider: sometimes I end up writing simple non-Observable code in several of my cells, and it would be great if the ES module could export these (named) cells as direct JS variables, so I can use the ES module without the Observable runtime too. Why? In Observable I keep live documentation with several md cells for info, sample data, example outputs, tables, and a few bells and whistles, and on export I can simply use the JS code. I think this is a minority case, as most of the d3 examples depend on the Observable runtime, but I wanted to put it here for discussion. I think this would also help other bundlers do better tree shaking when using those plain JS exports...

import define1 from "https://api.observablehq.com/@d3/bar-chart.js?v=3";

export function a() {
  return 1;
}

export function b() {
  return 2;
}

export function c(a, b) {
  return a + b;
}

export default function define(runtime, observer) {
  const main = runtime.module();

  main.variable(observer("a")).define("a", a);
  main.variable(observer("b")).define("b", b);
  main.variable(observer("c")).define("c", ["a", "b"], c);
  const child1 = runtime
    .module(define1)
    .derive([{ name: "c", alias: "height" }], main);
  main.import("d3", "d3", child1);
  main.import("chart", "chart", child1);
  return main;
}

asg017 commented 3 years ago

@a10k yup, tree shaking in this case would be purely to build smaller bundles! The runtime already doesn't run variables that aren't referenced, so this would only affect bundle size. Also, one problem with our/Observable's current compiler is that static imports cause your browser to fetch all the JS files that are imported, even if they're not used. So if your notebook looks like:

cell = 1
import {unused} from "https://example.com/other.js"

With compiled output:

import define1 from "https://example.com/other.js";

export default function define(runtime, observer) {
  const main = runtime.module();
  main.variable(observer("cell")).define("cell", function(){return(
1
)});
  const child1 = runtime.module(define1);
  main.import("unused", "unused", child1);
  return main;
}

And you only import cell from that module, your browser will still fetch the https://example.com/other.js file, even though you don't use unused. Then, if https://example.com/other.js imports more JS files, like:

// https://example.com/other.js
import define1 from "https://example.com/more1.js";
import define2 from "https://example.com/more2.js";
import define3 from "https://example.com/more3.js";

Then those 3 JS files will also be fetched, along with any JS dependencies those files have. So, if we can just tree shake the unneeded https://example.com/other.js import out to begin with, none of this happens.
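As a rough illustration of dropping the static import, here is a sketch that emits an import line only when at least one of the names it provides is actually needed. The source/specifiers record shape and the importHeader helper are assumptions made for this example, not the compiler's real internals.

```javascript
// Hypothetical records for import cells: where they import from
// and which names they bring into the notebook.
const importCells = [
  { source: "https://example.com/other.js", specifiers: ["unused"] },
];

// Emit a static import line only for import cells that provide a needed name.
function importHeader(importCells, neededNames) {
  const needed = new Set(neededNames);
  return importCells
    .filter((cell) => cell.specifiers.some((name) => needed.has(name)))
    .map((cell, i) => `import define${i + 1} from ${JSON.stringify(cell.source)};`);
}

// Only `cell` is needed: no import line is emitted, so other.js is never fetched.
console.log(importHeader(importCells, ["cell"]));
// `unused` is needed: the import line is kept.
console.log(importHeader(importCells, ["unused"]));
```

Because the import line never appears in the compiled module, the browser has no reason to fetch other.js or anything it imports in turn.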

Also: I love the idea of exporting cell functions in the compiled ES modules! That would be the cleanest way to re-use Observable code in plain JS environments without needing the runtime. Will definitely take a look to see how we can incorporate that into our compiler, maybe with a new exportCellDefinitions:false parameter on compile.module
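For illustration, a sketch of what consuming such exports without the runtime might look like. It reuses the a, b, and c cell functions from the proposed output earlier in the thread; in a compiled module they would be exported and imported rather than declared inline as they are here.

```javascript
// Cell definitions as plain functions, mirroring the proposed compiled output.
function a() {
  return 1;
}

function b() {
  return 2;
}

function c(a, b) {
  return a + b;
}

// Without the runtime nothing resolves dependencies for you,
// so the caller passes each cell's inputs by hand.
const value = c(a(), b());
console.log(value); // 3
```

The trade-off is that this gives up the runtime's reactivity: values are computed once, in whatever order the caller chooses.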