airbnb / hypernova

A service for server-side rendering your JavaScript views
MIT License

`createGetComponent` doesn't prevent leaking data between requests #22

Open alexindigo opened 8 years ago

alexindigo commented 8 years ago

As per the documentation: "The most common use-case would be to use a VM to keep each module sandboxed between requests. You can use createGetComponent from Hypernova to retrieve a getComponent function that does this."

I'm using createGetComponent to load the following component:

import React from 'react';
import { renderReact } from 'hypernova-react';

let meme = 25;

function SimpleComponent({ name }) {
  return <div onClick={() => alert('Click handlers work.')}>Hello, {meme++}, {name}!</div>;
}

export default renderReact('SimpleComponent', SimpleComponent);

with the following getComponent function:

  getComponent: createGetComponent({
    SimpleComponent: path.resolve('build/components/SimpleComponent/index.js'),
    Counter: path.resolve('build/components/Counter/index.js')
  }),

And the meme counter keeps ticking across requests:

(screenshot: the rendered value incrementing on each request)

alexindigo commented 8 years ago

@goatslacker It seems like more than a documentation bug; running things in a VM implies isolation. It looks like the cache sitting in front of the VM creates the issue.
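
For context, here's a minimal sketch (my own illustration, not Hypernova's actual implementation) of why a cache in front of the VM defeats the isolation: if the evaluated exports are cached per component, every request gets the same module instance, so module-level state like the meme counter above persists.

const vm = require('vm');

const cache = new Map(); // hypothetical cache keyed by component name

function getComponent(name, source) {
  if (!cache.has(name)) {
    // The module is evaluated once, in a fresh context...
    const sandbox = { module: { exports: {} } };
    vm.runInNewContext(source, sandbox);
    // ...but its evaluated exports are cached, so top-level state
    // (e.g. `let meme = 25`) is shared by every subsequent request.
    cache.set(name, sandbox.module.exports);
  }
  return cache.get(name);
}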

goatslacker commented 8 years ago

I marked it as a bug + documentation because it can be both. Either we fix the issue or we amend the documentation to make it explicit that using createVM and createGetComponent will leak.

It's a tough situation, because without a caching layer you're adding evaluation time to every server render. It's not cheap to eval code.

There are a few things you can do to make things faster AND get the benefits of isolation. You'd have to basically pack all the code into a single bundle (webpack/browserify works) and then you can re-eval that inside a VM. I don't have numbers with me but that shouldn't add too much latency.
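
A rough sketch of that idea (the bundle path and setup are assumptions, not Hypernova's actual code): compile the single bundle once with vm.Script at startup, then run it in a fresh context on every request so each render starts with clean module state.

const fs = require('fs');
const vm = require('vm');

// Read and compile the bundle once at startup; parsing is the expensive part.
const source = fs.readFileSync('build/components.bundle.js', 'utf8'); // hypothetical bundle
const script = new vm.Script(source, { filename: 'components.bundle.js' });

function getFreshComponents() {
  // Re-run the precompiled script in a new context per request,
  // so nothing at module level survives between renders.
  const sandbox = { module: { exports: {} }, console };
  script.runInNewContext(sandbox);
  return sandbox.module.exports;
}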

alexindigo commented 8 years ago

Yep, I'm planning to try different options too. Not sure how bundling "all the code" would help, though. My immediate candidates to try are: the regular vm approach, code re-evaluation, and the hacky combination of new Function with with :))) But I need to set up a sound load-testing environment first and get a baseline.
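
For reference, the new Function variant mentioned above is roughly what Node's own module wrapper does; a minimal sketch (omitting the with trick for shadowing globals), which trades the VM's isolation of globals for fast per-request re-initialization:

// Compile once: wrap the module source in a function, like Node's module
// wrapper, so top-level variables live in function scope rather than in a
// shared, cached module instance. (`bundleSource` is assumed to be the
// bundled component code read from disk.)
const compiled = new Function(
  'exports', 'require', 'module', '__filename', '__dirname',
  bundleSource
);

function loadFresh(filename, dirname) {
  const localModule = { exports: {} };
  // Fresh exports/module objects per call, but unlike a VM the code still
  // shares the host's globals, so isolation is weaker.
  compiled(localModule.exports, require, localModule, filename, dirname);
  return localModule.exports;
}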

goatslacker commented 8 years ago

Bundling all the code helps because a lot of the time spent creating a new VM goes into resolving require paths, crawling the file system, reading all the code, and evaluating it (which creates a separate new module). When you bundle everything there are fewer files to resolve and thus less code to iterate through; it's just a single bundle to evaluate in the VM.
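
As an illustration of the bundling step (the paths and entry point are assumptions, not a prescribed setup), a minimal webpack config that packs the server components into one file the VM can evaluate:

// webpack.server.js (hypothetical): bundle every component into a single
// file so the VM evaluates one module, with no require resolution or
// filesystem crawling at render time.
const path = require('path');

module.exports = {
  target: 'node',                     // don't bundle Node built-ins
  entry: './src/components/index.js', // assumed entry that exports all components
  output: {
    path: path.resolve('build'),
    filename: 'components.bundle.js',
    libraryTarget: 'commonjs2',       // expose the exports via module.exports
  },
};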

Don't take my word for it though, try it out yourself :)

alexindigo commented 8 years ago

I was going to preload all the files (parsed or not) at start time in any case; I/O during a request is not my thing :)

goatslacker commented 8 years ago

I've tried that before as well, caching all the files and then just re-evaluating the VM on each request. I forget what my findings were. It might be wise to revisit that idea.

Right now, having things shared between VM runs isn't that big of a deal given the tradeoffs. Rendering is still a synchronous affair, so any potential data that could be leaked is wiped at the start of each new request. This will be something to watch, though, as we (hopefully soon) move towards either streaming or async rendering.

alexindigo commented 8 years ago

I want to know more about async rendering :)

morganpackard commented 7 years ago

I'm going to take a crack at this. I think we can cache the function returned by vm.runInNewContext and just repeatedly call it to initialize a clean module once per request.

I did some profiling of module initialization, measuring the time to load, parse, and run react.js.

const { runInNewContext } = require('vm');

// parse the wrapped module code (`wrapped` is react.js's source wrapped in
// Node's module-wrapper function expression)
const executedWrapped = runInNewContext(wrapped);

// initialize the module by calling the wrapper with fresh locals
const initializedReact = executedWrapped(localExports, require, localModule, __localFilename, __localDirname);

I recorded the time to parse, and the time for the first module initialization, then the average time for 10k subsequent initializations. Once the module has been initialized once, subsequent initializations are much cheaper, but not free.

time to interpret the initial string (but not initialize the module): 35ms
time for initial module initialization: 59ms
average time for subsequent module initializations: 2.9ms

In this test, subsequent initializations (after the expensive initial one) take about 5% of the time of the initial round.
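
A sketch of the caching approach described above (names are illustrative, not the actual fork): evaluate the Node-style module wrapper once in a VM and cache the returned function, then call it with fresh module/exports objects on every request so only the cheap initialization step is repeated.

const vm = require('vm');

// Compile once per file: evaluate the wrapped source in a new context and
// cache the wrapper function it returns.
function compileWrapper(source, filename) {
  const wrapped = `(function (exports, require, module, __filename, __dirname) {\n${source}\n})`;
  return vm.runInNewContext(wrapped, {}, { filename });
}

// Call once per request: fresh module/exports objects mean module-level
// state (like the meme counter above) starts clean every time.
function initializeModule(executeWrapped, filename, dirname) {
  const localModule = { exports: {} };
  executeWrapped(localModule.exports, require, localModule, filename, dirname);
  return localModule.exports;
}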

morganpackard commented 7 years ago

I have a solution that's working. Here are some key points on what my fork does:

I'd love to move toward a PR for this code if you'd be up for taking a look. It still needs a little polishing, but I'd be happy to submit a work in progress PR if you're interested in seeing it sooner rather than later.