Closed: BryanCrotaz closed this issue 8 years ago.
> This could be done by storing a hash of the Brocfile in the tmp folder and checking it on startup.
this hash is much more complicated than just a hash of the Brocfile.
> directory is created on each run of broccoli
each run, or each rebuild? We expect persistentOutput to live only for the duration of a process's life, not beyond.
so how does caching work if you run the process twice?
for example when you want to do regular testing then do a deploy - our build takes around 6 minutes
> so how does caching work if you run the process twice?
The caching you see is for rebuilds (in the same process), not builds between different processes. The latter is quite tricky to get correct. Remember, cache invalidation is one of those gnarly problems.
For some plugins, like broccoli-persistent-filter, the inputs/outputs/options/dependencies are mostly known, so a more aggressive inter-process persistent cache can be created. It is, though, not without its own caveats (several pending issues must yet be resolved, and great care was originally taken).
In many cases, you should be able to do something like:
```sh
ember build --output-path somewhere --env production  # build it once
ember test --path somewhere  # if I recall, this (or something like it) allows an existing build to be used

# and if ember deploy is a well-behaved citizen, maybe it supports:
ember deploy --path somewhere
```
trouble is, I can create a persistent cache for archiver, but unless the incoming merge, funnel, replace etc also use one, the input files are going to change every time. And hashing every file is likely to be just as slow...
> And hashing every file is likely to be just as slow...
you should check; hashing is really fast (largely just limited by available I/O).
What you describe is a trade-off between stale reads and accurate reads. We err on the side of accurate reads, because the alternative leads to very bad things. Continuing to improve and evolve the caching + perf story is great, but we cannot sacrifice build accuracy or we will just cause much more grief.
An alternative to explore is what I described above (build once, and reuse for multiple commands).
no, obviously build accuracy has to come first, and cache invalidation should err on the side of pessimism. I'll investigate hashing and persistent caching.
@BryanCrotaz being able to build once, and then reuse for testing + deploying also seems like a big win (if it is not already possible).
But even then, you'd come back the next day and be hit by a 6 minute startup
@BryanCrotaz I also suspect, some of the slower steps. Like uglification may benefit from utilizing persistent-filter, as I suspect (for example) that your apps vendor file does not change very often..
and you'd need to leave your build process running so as to get in-process caching, so you're in danger of kicking off another build while deploy is still in progress
> and you'd need to leave your build process running so as to get in-process caching, so you're in danger of kicking off another build while deploy is still in progress
builds don't mix; what's the danger?
> But even then, you'd come back the next day and be hit by a 6 minute startup
development warm builds should not take 6 minutes.
Our largest app is about 500kloc of app code (not addons not vendor), and the warm dev (non-prod) boots are 12s.
to get caching, you're (I think) suggesting running two instances of broccoli with different output directories
@BryanCrotaz no
> development warm builds should not take 6 minutes.

I bet you're not zipping multiple copies of your resulting node_modules and injecting binaries like ffmpeg into them?
@BryanCrotaz no
Then you're surely going to get nothing better than what we have at the moment - build taking 6 minutes before a test or deploy
@BryanCrotaz it sounds like your custom steps require optimization. Most likely following the approach taken by persistent filter will lead you down a path of success.
> @BryanCrotaz no
>
> Then you're surely going to get nothing better than what we have at the moment - build taking 6 minutes before a test or deploy
@BryanCrotaz no, I believe you are confused. The idea is: build once, then subsequent commands should use the already built output (not build their own) to perform tests + deploy etc.
> The idea is, build once, then subsequent commands should use the already built output to perform tests + deploy etc.
That's what we're doing - but if tests show a bug (though of course we never have bugs!) then you have a 6 minute cycle to try again.
> it sounds like your custom steps require optimization
We build about 15 zip files - in 99% of cases only one of them includes new source changes, so if we could cache we'd only need to rebuild one of them
Our custom steps are just concatenating json, eg building AWS CloudFormation templates from lots of small files. Again, these change rarely, so caching would fix this problem.
@BryanCrotaz i was referring to:
> I bet you're not zipping multiple copies of your resulting node_modules and injecting binaries like ffmpeg into them?
> That's what we're doing - but if tests show a bug (though of course we never have bugs!) then you have a 6 minute cycle to try again.
It seems like visiting `/tests` while you develop (for fast iteration), or running `ember test --server` during development, would catch your test failures nice and quick.
Starting and stopping the process to run tests seems very strange.
Can't avoid that bit - we're building for AWS Lambda, so it requires a zip of the executable node files. I've built an npm -pull plugin which only brings in the modules we're actually using, but it's still a lot of files.
That would be great if it was an ember app - this is a broccoli problem, not an ember problem!
> this is a broccoli problem,
I'm dubious of this.
> Can't avoid that bit - we're building for AWS Lambda, so it requires a zip of the executable node files. I've built an npm -pull plugin which only brings in the modules we're actually using, but it's still a lot of files.
have you tried zipmerge? Seems like you could zip your node_modules, then append the zip of your app code.
> Seems like you could zip your node_modules, then append the zip of your app code
Surely this has the same caching problem?
it would be a great way to speed it up once persistent caching is working
@BryanCrotaz ya, but you're working on your app; you can manually be aware of whether you have made a change to your node_modules or not. Broccoli cannot infer this without extra work.
Can't broccoli treat node_modules as any other source folder?
> Can't broccoli treat node_modules as any other source folder?
It does
then why can't broccoli infer this?
> then why can't broccoli infer this?
It can; it just must diff the whole fs tree, which is why I said "extra work".
Well, to be accurate, it also would need to be aware of the Node resolution algorithm and walk in both directions (up and down), looking for any ancestor node_modules and also traversing them.
so basically, to get persistent caching, we need a fast way to diff a requires tree. Early exit on the first diff found would be fine (pessimistic).
and using file hashing rather than last modified so that other plugins that copy files wouldn't break downstream caches
being clever with symlink recognition would hopefully shortcut some steps
@BryanCrotaz well, it's more than just that; all inputs must be part of the cache. This is quite tricky, and also quite costly to perform.
I really think you should focus on maintaining the zip of your deps manually (just add a postinstall hook or something to invalidate it), and using zipmerge as part of the build.
> other plugins that copy files
do plugins still do this? Our builds are without such steps.
can you add a global postinstall hook? you'd need it to fire on every install
> do plugins still do this? Our builds are without such steps.
Yes - every time you rerun broccoli
> can you add a global postinstall hook? you'd need it to fire on every install
Sounds like something you should investigate. I am merely hinting at a simple solution.
> and using file hashing rather than last modified so that other plugins that copy files wouldn't break downstream caches
this is how broccoli-persistent-filter works.
so if it works for broccoli-persistent-filter, why can't it work for broccoli-caching-writer?
> so if it works for broccoli-persistent-filter, why can't it work for broccoli-caching-writer?
it could, but not in the general case, as input/output is typically entirely opaque for caching-writer. Whereas for broccoli-filter the input is known, as the filter is responsible for reading the input and passing it to processString.
> I really think you should focus on maintaining the zip of your deps manually (just add a postinstall hook or something to invalidate it), and using zipmerge as part of the build.
npm does have global scripts, however the dependencies are generated by walking the requires tree, so the deps are dependent on the source. Darn.
> npm does have global scripts, however the dependencies are generated by walking the requires tree, so the deps are dependent on the source. Darn.
Right; you should read the source for persistent-filter, as we have sorted out most of that issue.
is there scope here for a clone of persistent-filter which takes in a tree and outputs a new tree, rather than a 1:1 map? The whole output would be recalculated if any of the inputs have changed.
See https://github.com/ember-cli/broccoli-caching-writer/issues/70 for background.
I think the bug here might be a bug/design-flaw in broccoli. It appears (from my very much outsider view as a plugin writer) that a new `tmp/cache` and `tmp/output` directory is created on each run of broccoli. This isn't in line with the stated aim of the `persistentOutput` option - what's the point of keeping outputs around for a subsequent run and then creating a new folder and not reusing the previous output?

Would it be better if the `tmp/xxx` directory name was predictable, e.g. using classname + instance number or `annotation`, and was only cleared down if the Brocfile had changed? This could be done by storing a hash of the Brocfile in the tmp folder and checking it on startup. Plugins could then reuse their previous output, as per the broccoli design goal.
Thoughts? @joliss @rwjblue @stefanpenner