Bundle cache - Githubissues

sfcgeorge commented 5 years ago

Actions CI with Ruby is far slower than other CI products because it downloads and installs gems every time. For a tiny Rails site this adds about 3 minutes to the runtime; I imagine big sites would be far worse.

Is there a way to do what other CI tools do and cache the bundle folder so that on subsequent runs only updated / new gems are installed?

For example, CodeShip says it "automatically configures bundler to use the $HOME/cache/bundler directory, which we save between builds to optimize build performance". I assume they set a default bundler config:

https://bundler.io/v1.1/man/bundle-config.1.html

And then any user supplied bundle install command will use that default config including the cache.

https://bundler.io/bundle_install.html

It looks like --path might be what does it.
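
As a minimal sketch of that idea, a workflow step could pin the install directory so that all gems end up in one known folder. This assumes Bundler picks up the `BUNDLE_PATH` environment variable (the same setting that `bundle config path` controls); the step name and the `vendor/bundle` location are illustrative choices, not anything the setup-ruby action provides.

```yaml
# Sketch only: the step name and the vendor/bundle location are illustrative.
- name: Install gems into a fixed directory
  env:
    # Same setting that `bundle config path vendor/bundle` would write
    BUNDLE_PATH: vendor/bundle
  run: bundle install
```

Once the gems live in a single folder, that folder is what a cache would need to save between runs.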

damccorm commented 5 years ago

Right now we don't have this type of caching, but it's definitely on the feature backlog short list. We're tracking this in https://github.com/actions/toolkit/issues/47. Does the feature discussed there look like it meets your needs?

If so, can we close here and track in that issue?

sfcgeorge commented 5 years ago

Thanks for that, base level caching does seem to be a prerequisite. That issue seems to be asking more for user defined "workspaces" but we'll see how it evolves.

I think this issue should stay open as some work would need to be done here to build on top of the low level caching. You could argue the user should set up a bundle cache themselves, but you could argue they should set up Ruby themselves too. It would be lovely if when you bundle install via the setup-ruby action it's automatically configured to be cached and fast. People coming from other CIs will expect it, and those who don't might not think to do it themselves.

Perhaps add a "blocked" tag or similar until toolkit has caching? Some work could be done here in the meantime to automatically add a bundler config file pre-populated with best-practice defaults such as --jobs=4 (or however many cores these machines have, for faster parallel bundling).
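
A rough sketch of what such defaults might look like if expressed as environment variables rather than a config file, placed at the workflow or job level. It assumes Bundler reads `BUNDLE_JOBS` and `BUNDLE_RETRY` as the environment forms of its `jobs` and `retry` settings; the values are illustrative and not tuned to the hosted machines.

```yaml
# Sketch only: the values are illustrative, not measured for the hosted runners.
env:
  BUNDLE_JOBS: 4    # install gems in parallel, like `bundle install --jobs 4`
  BUNDLE_RETRY: 3   # retry failed network requests, like `bundle install --retry 3`
```

Any later `bundle install` in the same job would then pick these up without extra flags, and a user could still override them per step.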

damccorm commented 5 years ago

Thanks for the feedback ❤️ It's helpful and may help shape how we approach caching.

> I think this issue should stay open as some work would need to be done here to build on top of the low level caching. You could argue the user should set up a bundle cache themselves, but you could argue they should set up Ruby themselves too.

I'm not 100% convinced that it makes sense for this action to do that job, but also not 100% convinced that it shouldn't be this action. Let's definitely keep this open and readdress it once caching is implemented.

> Some work could be done here in the meantime to automatically add a bundler config file pre-populated with best-practice defaults such as --jobs=4

I'm interested, but not convinced we should do this. My instinct is to just follow the Ruby defaults (which in theory should be setting best practices). I'm curious as to why you think we should create our own here.

sfcgeorge commented 4 years ago

We found where Bundler caches things and how to make the new actions/cache work with that. The default bundle cache lives in ~/.bundle so just caching that folder should allow bundler to detect and reuse it. Or you can move it elsewhere with bundler's --path flag.

Below is our setup for Ruby. We needed to specify a Ruby version so couldn't use setup-ruby, but the cache bit should be translatable.

linters:
    name: Linters
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - uses: actions/cache@preview
        id: ruby-cache
        with:
          path: ~/local/rubies
          key: ruby-2.6.5
      - uses: clupprich/ruby-build-action@master
        name: Install of Ruby 2.6.5
        id: ruby
        with:
          ruby-version: 2.6.5
          cache-available: ${{ steps.ruby-cache.outputs.cache-hit == 'true' }}
      - name: Remove ruby-build script
        run: rm -rf ruby-build/script
      - name: Print Ruby version
        run: ${{ steps.ruby.outputs.ruby-path }} --version
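      - uses: actions/cache@preview
        id: bundle-cache
        with:
          path: ~/.bundle
          # Use a key distinct from the Ruby cache above; hashing Gemfile.lock
          # refreshes the cache whenever the gem set changes
          key: bundle-${{ hashFiles('**/Gemfile.lock') }}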
      - name: Build
        run: |
          gem install bundler
          bundle install --jobs 4 --retry 3

RE sensible defaults for Bundler: Bundler isn't automatically optimised for CI or production setups. I imagine that's partly because parallel installs wouldn't work well on a single-threaded Raspberry Pi type setup and they don't want to make any assumptions for you, and partly because it's really old and they don't want to introduce even potentially breaking changes. But on CI, which you want to be fast and reliable, I don't see any reason not to have it install in multiple threads and retry a couple of times on network glitches. Nothing is more annoying than a slow or unreliable CI.

So my preference would still be to have smart defaults in this action that optimise for speed and reliability. That'll make the out of box experience much better for the vast majority of people. Then anyone with large legacy setups can override the defaults; because they'll surely know their setup is weird and have the skills to configure for it already.

I guess it depends what GitHub's goals for default Actions are. If it's just to have a very basic bare minimum setup that can be built upon then sure don't do any of this. But I'd question the value of that since you can run custom commands anyway if you want a minimal setup. If the goal is sensible defaults to get people up and running quickly without having to customise a bunch of YAML then absolutely you should set these things.

Also note other CIs like CodeShip automatically cache the .bundle folder and install in parallel with retries. And so does Heroku even for deployment. If anything goes funny you just reset the cache. Since GitHub is one of the biggest Rails companies I certainly expected this action to be the most polished.

joshmgross commented 4 years ago

@sfcgeorge We have an example of using Ruby on the examples page of the actions/cache repository.

If you think this example is insufficient, or want to add another example, feel free to open a PR

> I guess it depends what GitHub's goals for default Actions are

Right now, the cache action is meant to allow enough explicit configuration to enable caching for any workflow, without making assumptions about the repo's workflows or contents. The downside is that most workflows need more configuration than they would with other CI providers that just enable caching by default. You can see more discussion in https://github.com/actions/cache/issues/94.

Are you having issues using the cache action specifically with the workflow you provided? I'd also recommend updating to v1 of the cache action for better performance; we've made a lot of improvements since preview.
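
For illustration, a bundle cache on v1 might look roughly like the sketch below. It assumes gems are installed into `vendor/bundle` (an illustrative location) and that the cache key is derived from `Gemfile.lock` via `hashFiles`; adjust the path and key to the project.

```yaml
# Sketch only: path and key naming are illustrative choices.
- uses: actions/cache@v1
  with:
    path: vendor/bundle
    # New key whenever Gemfile.lock changes, so the cache tracks the gem set
    key: ${{ runner.os }}-gems-${{ hashFiles('**/Gemfile.lock') }}
    # Fall back to the most recent gem cache when there is no exact match
    restore-keys: |
      ${{ runner.os }}-gems-
- name: Bundle install
  run: |
    bundle config path vendor/bundle
    bundle install --jobs 4 --retry 3
```

With the fallback key, a changed `Gemfile.lock` still starts from the previous gems, so only new or updated gems should need installing.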

sfcgeorge commented 4 years ago

I've added further comments at https://github.com/actions/cache/issues/94#issuecomment-571690134

I do have caching working, my issue is that it's not on by default when Bundler locally caches by default. Why should the default be different, and worse, in my fancy cloud CI?

bryanmacfarlane commented 4 years ago

Closing in favor of the cache action. I don't think there's any action for the setup-ruby action here. Correct me if I'm wrong.

@joshmgross - perhaps a link in this readme over to the cache example so it's discoverable?

actions / setup-ruby

Bundle cache #11