duojs / duo

A next-generation package manager for the front-end
3.42k stars 118 forks source link

Cache system overhaul #477

Closed dominicbarnes closed 9 years ago

dominicbarnes commented 9 years ago

This PR aims to begin the process of overhauling the underlying cache system by replacing it with a new module. This PR mainly drops in the new module, but there is more on the roadmap to improve duo's performance with more caching. (including extending that capability to plugins)

While working on profiling and optimizing duo, I came to discover that the current cache system is not friendly to parallel builds. This is partially due to how atomic-json works internally. It queues up multiple writes to the cache rather than truly allowing them in parallel.

When you get to have a fairly substantial cache (~5MB for us) this approach causes exponential growth in time for each file being processed in parallel. (as each new write request must wait for all the previous requests to finish first)

In our dev server, we use duo to serve up files dynamically. Most pages have around 4 assets (layout css/js + page css/js) and we would see >4s load times for that last asset. (when it could easily have finished <1s if the cache wasn't waiting to write)

I was searching for alternatives to a single JSON file, that would also allow writing truly in parallel. I settled on using LevelDB for this task, and it looks like it works perfectly for this use-case. I only needed to make minor changes here and there are no breaking changes in the Duo API.

If we decide to move forward with this as-is, I'll need something from @matthewmueller. I came across this old duo-cache module, which I didn't realize already existed. I'd like to deprecate that module (since it's not in use) and I'd like access to that package in npm so I can publish the new module as 2.0.0.

I look forward to your feedback @duojs/owners!

matthewmueller commented 9 years ago

Awesome work on this @dominicbarnes

So the reason for the existence of atomic-json is that it allowed multiple duo processes to work on the same components/ directory concurrency. I'm not all that sure that is important anymore (if ever), but if we want to keep support for it, I think we'd need something like atomic-json to exist and I doubt a leveldb cache would work since leveldb only supports a single process.

As for duo-cache, I've unpublished it :-)


Edit: I remember the usecase:

Duo(root)
  .entry('index.css')
  .run(fn)

Duo(root)
  .entry('index.js')
  .run(fn)

These are still using the same process though, so I think we could get away with it.

dominicbarnes commented 9 years ago

@matthewmueller Neither atomic-json or LevelDB support multiple processes.

What you pointed out is indeed the same process, but atomic-json suffers from the problem I outlined above, while LevelDB does not have the same issue. (which is why I've chosen it here)

matthewmueller commented 9 years ago

Ah okay sweet, I'm +1, as long as the example above still works correctly.

dominicbarnes commented 9 years ago

Yup, it still works, and it'll be faster (generally speaking) too

yields commented 9 years ago

Nice!

matthewmueller commented 9 years ago

<3 On Fri, Jul 3, 2015 at 12:17 Dominic Barnes notifications@github.com wrote:

Merged #477 https://github.com/duojs/duo/pull/477.

— Reply to this email directly or view it on GitHub https://github.com/duojs/duo/pull/477#event-347585232.