asg017 / unofficial-observablehq-compiler

An unofficial compiler for Observable notebook syntax
https://www.npmjs.com/package/@alex.garcia/unofficial-observablehq-compiler
115 stars 22 forks source link

Compiler v1 - Current problems, potential solutions #26

Open asg017 opened 3 years ago

asg017 commented 3 years ago

So when I first made this library, I had one use-case in mind: atom-observable. That extension was pretty minimal: given Observable javascript code, render it into an HTML page. No on-the-fly redefining, only import external notebooks from observablehq.com, it didnt even support viewof and mutable that well: just straight up refresh the page when the source code changes.

But now, given that this project is the only one of its kind, and I'm starting to see a few more projects built on top of this project, I think it's time to re-think how this library is structured. @bryangingechen put it a lot of smart work to make this compiler better, but it was built on top of my ill-defined idea of how the project should be structured. I've spent some time using this library in another project, dataflow, and I've found many issues/confusing things in this library that I think we can improve on.

1) This library is an interpreter and compiler

My original take of this library was an interpreter, NOT a compiler. I named it "compiler" because that's what Observable called their closed-source library, but the original take of this library was like so:

const compile = new Compiler();
const main = await compile.module(`a =1; b = 2; c = a + b;`);
console.log(main.value('c')); // 3

The above doesn't compile observable javascript into "real" javascript, it interprets it and executes it on the fly. This library didnt have "compiler" features until @bryangingechen added it like so:

const compile = new Compiler();
const src = compile.moduleToESModule(`a = 1; b = 2; c = a + b`);
console.log(src); // "export default function define(runtime, observer) { ..."

I think we can do a better job at distinguishing this in the API. Maybe separate Interpreter and Compiler classes that you interact with. In dataflow, I found myself using the "Interpreter" functions (.cell, .module, createCellDefinition) when building the "live editor" features, and the "Compiler" functions (.moduleToESModule) when building "export" features. I would assume other users of this library would want something similar, and an easier-to-understand API would be very helpful.

2) define/redefine doesn't help, but raw Runtime variables do

Many of the current "interpreter" functions return explicit define/redefine functions like so:

{define, redefine} = await compile.cell(`a = 20`);
redefine(main);
await main.value("a"); // 20

But I didn't find this useful at all in dataflow. I found only define useful, and instead of redefine, I would just delete the original variable and define a new one right away, and I would see no problems. What's key here we need to return runtime variable references in interpreter commands, which #23 made me realize (thanks @a10k !). With those variables, people can delete/redefine the variable directly as they please, instead of dealing with these cryptic define/redefine methods.

3) I found pre-parsing input important

Sometimes, I would pre-parse source code with @observablehq/parser to check for specific references, like FileAttachment and Secret, or to keep track of source code -> runtime variable references to rearrange when code was re-arranged. I would also do this to find import statements and import remote notebooks when interpreting (our current interpreter assumes remote modules are pre-fetched, which is also NOT good).

I think we should either a) only allow people to pass in pre-parsed source code instead of source code directly (meaning we could make @observablehq/parser an optional peer-dependency and make user handle that on their own), b) offer some sort of utility functions that make this easier, or c) return more than just runtime variables / offer better hooks to customize behavior (e.g. return whether a cell is referencing FileAttachment/secrets, adding a "resolveNotebookPath" function in function calls, etc.).

I haven't thought too much about what this would look like, but I think this can be done last when most of the other changes are made.

Let me know what you think!

I plan to "release" dataflow in the very near future, and I want to eventually make a stable v1 release of this library sometime in the future. Not too urgent, I made a new branch of this compiler for my own needs for dataflow, but I would eventually like to use this library officially in that project.

bryangingechen commented 3 years ago

I haven't tried to use this project in anything serious yet so I can't say too much on the API design. Are there any other projects / repos that are using this right now? Maybe you could ask their maintainers what the pain points are.

I'm happy to help out with porting the code once a plan is in place.

a10k commented 3 years ago

+1 on all things you mention!

I used this for the poc canvas style editor my objective with this poc was to have a local offline version (similar to dataflow) but with fixed width/height for each cell absolutely positioned, then every few minutes it sends a json version of the code to the backend which stores them to individual files and runs git commit (so we can use git dif to track changes efficiently) and also uses a headless browser to save individual snapshots of every cell as png.. I had to do all the three changes you mention too to get this to work.

Additional notes:

  1. I had to export the parser too as you mention (needed it for prettier formatting, but the parser eats all the comments... I guess we need to make changes on the parser to change this behavior).
  2. The output of import statements, viewof and mutable statements has multiple variables, when displaying them in the ui of a cell, we want to show only one but the order of the required variable for import, viewof and mutable cells is different.
  3. Also, when I try to save a new notebook as compiled es module vs actual code as json (and including the compiler too to interpret at runtime) I did not notice any noticeable performance difference (but maybe splitting the interpreter and making it light weight and faster can be a priority on the next version? when exporting I keep the original source in a json in the .html file rather than compile it, as I dont want to lose the ability to make changes or maintain two files.

Also, just wanted to say: this library does most of the things we expect well and is flexible enough for most use cases; custom resolves, tests and your notebooks on observable helps a lot! Thanks to both of you for building this!

asg017 commented 3 years ago

Started these changes in #29!

RandomFractals commented 3 years ago

well, I am just waiting for vscode dev team to finish native notebooks api and will be updating my JS Notebook Inspector to use those interfaces with a few other things for observable notebooks and make them work more like Jupyter notebooks. Might use some of your lib parts when I get to it. Just an FYI.

You can view the work vscode dev team has done for native and Jupyter notebooks in their repo. I think they already cover many things you are trying to tackle with kernels, output renderers, etc. that are beyond what you have sketched up here:

https://github.com/microsoft/vscode/issues?q=native+notebooks