Closed: Joshua-Anderson closed this issue 8 years ago
I had thought about this approach before, but here we would need to upload the cache after a build and download it before the next build, which means extra work compared to downloading dependencies directly or building without the cache.
There are some situations where using the cache would be slower; in most situations, however, it is significantly faster.
Yes, there are definitely a few situations where it is better than not having a cache at all. It's just that we need to think through all the pros and cons before starting to work on this.
I can do some performance testing if you would like. Anecdotally, from my Travis CI experience, I've seen noticeable improvements with Node.js, Go, Python, and Ruby when caching is enabled.
In Rails apps a common step is compiling assets. In my case this takes 700-800s on Deis v1 after a reschedule of builder (lots of JavaScript libraries; I should probably take some time to optimize this, but that's another story). Downloading the cache from S3 would be significantly faster: a second build generally takes only around 10s if nothing has changed, so even if downloading the cache takes a minute, I gain a lot.
Having a per-app setting (or action) to (temporarily) disable the cache would probably be a good idea. Apparently Heroku only offers something like that as a plugin (https://github.com/heroku/heroku-repo).
@kmala @Joshua-Anderson In Rails this makes a very large difference, and although I can't speak for many other stacks, downloading dependencies happens almost everywhere and would benefit greatly from caching. It's not just assets, as @nathansamson mentioned, but also downloading gems. Deploying our current project to Heroku takes about 15-20 seconds, while the same deploy can take up to 15 minutes on Deis.
As for the implementation: the cache could be stored in GCS, but even storing it inside the cluster (at the risk of losing it when the pod restarts) would be better than what we have right now. Flynn does this too; they simply upload/download the files to/from their blobstore. We could do something similar (whether using S3/GCS or Minio) and gain a lot from it. Most buildpacks (Ruby, PHP, Node.js, Python, Go (in multiple places)) explicitly use this cache to store artifacts that can be reused, but since we don't persist it between deploys, we lose the performance it was designed to gain.
There are also other caches (currently disabled) that we could implement later, but those would only benefit new apps.
Heroku offers a $CACHE_DIR for buildpacks to store files that will be persisted across builds (reference).
Builder offers the directory for buildpack compatibility, but its contents are not persisted after the slugbuilder exits.
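To illustrate why persistence matters, here is a minimal sketch (not any real buildpack's code) of how a Heroku-style buildpack's `bin/compile` script, which is invoked as `bin/compile <build-dir> <cache-dir> <env-dir>`, restores and saves artifacts via the cache directory. The `vendor/bundle` path and the demo file names are illustrative:

```shell
#!/bin/sh
set -e

# compile <build-dir> <cache-dir>: sketch of a buildpack's cache usage.
compile() {
  BUILD_DIR="$1"; CACHE_DIR="$2"
  # Restore previously cached gems, if any.
  if [ -d "$CACHE_DIR/vendor/bundle" ]; then
    echo "-----> Restoring bundler cache"
    mkdir -p "$BUILD_DIR/vendor"
    cp -R "$CACHE_DIR/vendor/bundle" "$BUILD_DIR/vendor/bundle"
  fi
  # ... the real buildpack would install/update gems here ...
  # Save the (possibly updated) gems back for the next build.
  if [ -d "$BUILD_DIR/vendor/bundle" ]; then
    mkdir -p "$CACHE_DIR/vendor"
    rm -rf "$CACHE_DIR/vendor/bundle"
    cp -R "$BUILD_DIR/vendor/bundle" "$CACHE_DIR/vendor/bundle"
  fi
}

# Demo: the second "build" finds the gem the first build cached.
WORK=$(mktemp -d)
mkdir -p "$WORK/build1/vendor/bundle"
touch "$WORK/build1/vendor/bundle/rack-2.0.gem"
compile "$WORK/build1" "$WORK/cache"   # first build populates the cache
mkdir -p "$WORK/build2"
compile "$WORK/build2" "$WORK/cache"   # second build restores from it
```

If the cache directory is wiped between builds (as it is in builder today), the restore branch never fires and every build starts cold.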
Deis v1 used the same builder for all apps and stored the cache directory in a volume, so caching worked properly unless the builder was rescheduled.
For Deis Workflow, however, I recommend uploading the cache directory to the builder's object store bucket after a build and downloading it at the start of the next build.
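The proposed flow could look roughly like the sketch below. A local directory stands in for the builder's object store bucket so the example is self-contained; a real implementation would replace the plain `cp`/`tar`-to-directory steps with the Minio/S3 client, and the app name and paths here are purely illustrative:

```shell
#!/bin/sh
set -e
WORK=$(mktemp -d)
CACHE_DIR="$WORK/cache"    # the $CACHE_DIR the buildpack sees
BUCKET="$WORK/bucket"      # stand-in for the builder's object store bucket
APP=myapp                  # hypothetical app name
mkdir -p "$CACHE_DIR" "$BUCKET"

# Before the build: download and unpack the previous cache, if one exists.
[ -f "$BUCKET/$APP-cache.tgz" ] && tar -xzf "$BUCKET/$APP-cache.tgz" -C "$CACHE_DIR"

# ... the buildpack runs here and populates $CACHE_DIR ...
touch "$CACHE_DIR/example-artifact"   # stand-in for buildpack output

# After the build: pack the cache and "upload" it for the next build.
tar -czf "$BUCKET/$APP-cache.tgz" -C "$CACHE_DIR" .
```

Packing the directory into a single tarball keeps the transfer to one object per app, which also makes deleting or invalidating a cache trivial.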
Sometimes the cache becomes corrupted and needs to be deleted, and sometimes users simply do not want the cache directory cached. In that case, they could set the DEIS_BUILDPACK_CACHE environment variable to false, which would trigger the slugbuilder to delete the cache if it exists and skip uploading it at the end of the build.
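Honoring that variable in slugbuilder could be sketched like this; the DEIS_BUILDPACK_CACHE name comes from the proposal above, while the paths and the SKIP_CACHE_UPLOAD flag are illustrative:

```shell
#!/bin/sh
set -e
WORK=$(mktemp -d)
CACHE_DIR="$WORK/cache"
mkdir -p "$CACHE_DIR"
touch "$CACHE_DIR/stale-artifact"   # pretend a corrupted cache from a previous build

DEIS_BUILDPACK_CACHE=false          # the user has disabled caching for this app

if [ "$DEIS_BUILDPACK_CACHE" = "false" ]; then
  rm -rf "$CACHE_DIR"               # delete the existing cache ...
  mkdir -p "$CACHE_DIR"             # ... leaving an empty dir for the buildpack
  SKIP_CACHE_UPLOAD=1               # ... and remember to skip the upload
fi

# ... the build runs here ...

if [ -z "${SKIP_CACHE_UPLOAD:-}" ]; then
  echo "uploading cache"
else
  echo "skipping cache upload"
fi
```

Deleting the cache up front covers the corruption case, and skipping the upload means the next build with caching re-enabled starts from a clean slate.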