ddev / github-action-setup-ddev

Set up your GitHub Actions workflow with DDEV

Recommendations on using actions/cache to speed up the action with package caches? #15

Open mglaman opened 1 year ago

mglaman commented 1 year ago

I have no idea if this is possible, but:

The setup action takes about 1m 37s. I was wondering if it could be sped up by leveraging actions/cache. I think that could help speed up

sudo apt-get update && sudo apt-get install -y ddev && mkcert -install

But maybe not.
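
For what it's worth, here is the kind of thing I had in mind, purely as a sketch: cache apt's downloaded .deb files so the ddev package doesn't have to be re-fetched. The path, key and permissions trick are my assumptions, the runner image may clean the archive after installing, and since most of the time probably goes to resolving and unpacking rather than downloading, the savings could be negligible:

      - name: Make the apt archive cache writable for the cache action
        run: sudo chmod -R 777 /var/cache/apt/archives

      - name: Cache downloaded apt packages
        uses: actions/cache@v3
        with:
          path: /var/cache/apt/archives/*.deb
          key: apt-archives-ddev

      - name: Install DDEV (assumes the DDEV apt repository is already configured)
        run: sudo apt-get update && sudo apt-get install -y ddev && mkcert -install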

davereid commented 1 year ago

I also made an attempt at using actions/cache for the internal DDEV composer cache and didn't get very far, but I also didn't try very hard. I'd love to work on a guide for how to cache better and speed things up.
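
One simpler angle that might be worth a try (just a sketch, not something I've verified with this action): skip DDEV's internal composer cache entirely and cache the project's vendor/ directory on the host, keyed on composer.lock. Since the project directory is bind-mounted into the web container, ddev composer install should pick it up. The paths assume a standard project layout:

      - name: Cache Composer vendor directory
        uses: actions/cache@v3
        with:
          path: vendor
          key: composer-${{ hashFiles('composer.lock') }}
          restore-keys: |
            composer-

      - name: Install PHP dependencies inside the web container
        run: ddev composer install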

mandrasch commented 1 year ago

+1, this would be interesting; I've also wondered about it (but unfortunately I have no previous experience with GitHub's actions/cache).

mandrasch commented 1 year ago

As far as I understand it, this is not a fully cached solution, but the database is snapshotted, which could speed up the process a bit 🤔 Blog post by @mglaman:

https://mglaman.dev/blog/using-ddev-snapshots-speed-github-actions-workflows
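
If I read it correctly, the idea boils down to something like the following (a rough sketch; the snapshot directory and cache key are my assumptions): cache .ddev/db_snapshots and restore the latest snapshot instead of rebuilding the database from scratch on every run.

      - name: Cache DDEV database snapshots
        uses: actions/cache@v3
        with:
          path: .ddev/db_snapshots
          key: ddev-db-snapshot-${{ hashFiles('.ddev/config.yaml') }}

      - name: Restore the latest snapshot, if one was cached
        run: |
          # runs after `ddev start`; does nothing when the cache was empty
          if ls .ddev/db_snapshots/* >/dev/null 2>&1; then
            ddev snapshot restore --latest
          fi

On a cache miss, the expensive database setup would run instead and should end with ddev snapshot so the next run has something to restore.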

NamelessCoder commented 10 months ago

TL;DR: Caching the docker assets involved with a DDEV project either requires caching logic only achievable with a rather complex script, or makes the CI run take 10x longer than it would without the cache.

I've been experimenting with the GH "cache" action, trying to persist the locally stored images between CI runs with the goal of optimizing the step that takes longest: the pulling and initialization of docker images when DDEV is initialized. Unfortunately, my findings aren't exactly encouraging (read: downright depressing).

First of all: due to the way Docker stores images, it isn't possible to simply cache and restore a single directory containing all of the images. Docker fragments things into hundreds of pieces, with each installed image taking up dozens of individual files, each one adding a diff to the base image. While images can actually be exported to a single archive, it does not appear to be possible to import such an archive back into the images collection. Fragments are stored in one location, mixed in with fragments of containers and volumes, and the only way to know which exact fragments belong to a given image is to interrogate Docker itself, command by command.

Additionally, an image is inoperable without its metadata file (which is what is actually stored in the directory that appears to contain images in the Docker root directory). This means that in order to extract and cache a single image, one would need to run and parse the output of several docker commands, construct a manifest of the files, produce an archive of said files and cache that archive (plus do the reverse in the restore step).
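
To give an idea of what that digging would involve, here is an illustrative step (the image name is just an example, and the exact fields depend on the storage driver in use):

      - name: Show which on-disk pieces belong to a single image (illustration only)
        run: |
          # storage driver in use (typically overlay2 on GitHub runners)
          docker info --format '{{ .Driver }}'
          # layer directories under /var/lib/docker used by this image
          docker image inspect ddev/ddev-webserver --format '{{ json .GraphDriver.Data }}'
          # diff IDs that would still need to be mapped to files on disk
          docker image inspect ddev/ddev-webserver --format '{{ json .RootFS.Layers }}'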

Alternatively, the entire docker root directory would need to be cached. In my experiments, doing this takes at least 15 (!!) times as long as installing the images on every run - which obviously isn't feasible.

So we are left with a situation where you would either need to write a complex script to handle both the caching and restoring of docker images by running several commands and parsing their output (not to mention actually identifying which images to handle this way), or use a bulk strategy that is so time-consuming that it completely defeats the purpose of caching in the first place. In my setup (a TYPO3 instance with a Solr container attached) the bulk approach produces a 2.8 GB tar archive (note that GitHub Actions allows a total of 10 GB of cache, and one cache archive is likely required for each environment executed by the CI). In fact, even the cache-restore step takes longer to execute than installing the images on every run.

The situation is further complicated by the fact that docker runs as super-user and the CI steps do not.

Some numbers:

My conclusion is that even if the cache-save step could somehow be optimized or made selective so it doesn't need to execute every time, the cache takes so long to restore that it simply makes things worse than had there been no cache.

In case you are feeling masochistic enough to want to try this yourself - or believe that you can improve the strategy - here is what your GH action manifest would need to contain:

    steps:

[...]

      - name: Make docker data directory writable
        run: sudo chmod -R 777 /var/lib/docker

      - name: Restore DDEV images from cache
        uses: actions/cache/restore@v3
        with:
          path: /var/lib/docker
          key: ddev-images

      - name: Setup DDEV
        uses: ddev/github-action-setup-ddev@v1

[...]

      - name: Stop DDEV
        run: ddev poweroff

      - name: Make docker data directory readable
        run: sudo chmod -R 777 /var/lib/docker

      - name: Update cache containing DDEV images
        uses: actions/cache/save@v3
        if: always()
        with:
          path: /var/lib/docker
          key: ddev-images

Notes:

If you do decide to attempt this: good luck! And if you come up with a better solution, or decide to write the necessary complex script that would enable partial caching of docker assets to handle images alone, please do update this issue with your solution.

mandrasch commented 10 months ago

@NamelessCoder impressive research, wow! Thanks for investing the time! 😮 👏

rfay commented 10 months ago

My overall experience with docker images is that extracting them can take longer than downloading them on a well-connected server like GitHub's. So that's similar to the experience here. If you have to jump through hoops to extract the image, you don't end up with much value. And then if you have to save them away as well...