awalsh128 / cache-apt-pkgs-action

Cache APT packages in GitHub Actions

Cache is installed on first run, not available when restored #110

Open myrrc opened 1 year ago

myrrc commented 1 year ago

Hi. In https://github.com/yandex/ch-tools we use your action. The issue is as follows:

  1. Cache is successfully installed on first run. It's used (example run https://github.com/yandex/ch-tools/actions/runs/5554305954/jobs/10143948024, "prepare build deb" and "build deb" steps).
  2. On next run, cache is loaded, but entries in the cache are not available (example run https://github.com/yandex/ch-tools/actions/runs/5558780469/jobs/10154259919, same steps).
  3. We use the following action:
      uses: awalsh128/cache-apt-pkgs-action@v1.3.0
      with:
        packages: "python3-venv debhelper devscripts"
        version: 1
        execute_install_scripts: true

    Tried using "latest" without "version" and "execute_", same situation.

  4. In logs I see
    2023-07-14T22:31:09.1226091Z 22:31:09 - debhelper=12.10ubuntu1.tar
    2023-07-14T22:31:09.5310273Z 22:31:09 Reading from main requested packages manifest...
    2023-07-14T22:31:09.5362700Z 22:31:09 - debhelper=13.6ubuntu1~bpo20.04.1
    2023-07-14T22:31:09.5831281Z 22:31:09 - debhelper=12.10ubuntu1.tar restoring...
    2023-07-14T22:31:09.6188311Z 22:31:09   done

    So target package is restored but isn't available.

Issue can be reproduced on a test repository: https://github.com/myrrc/apt-cache-restore-test (see last 2 commits).

debhelper:
  Installed: (none)

(while restoring from cache)

pboling commented 1 year ago

Similar issue on Ubuntu:

Run awalsh128/cache-apt-pkgs-action@latest
  with:
    packages: dictionaries-common wamerican
    version: 1
    execute_install_scripts: false
    debug: false
  env:
    BUNDLE_GEMFILE: /home/runner/work/json_schemer-fuzz/json_schemer-fuzz/gemfiles/vanilla.gemfile
Run ${GITHUB_ACTION_PATH}/pre_cache_action.sh \
  ${GITHUB_ACTION_PATH}/pre_cache_action.sh \
    ~/cache-apt-pkgs \
    "$VERSION" \
    "$EXEC_INSTALL_SCRIPTS" \
    "$DEBUG" \
    "$PACKAGES"
  echo "CACHE_KEY=$(cat ~/cache-apt-pkgs/cache_key.md5)" >> $GITHUB_ENV
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    BUNDLE_GEMFILE: /home/runner/work/json_schemer-fuzz/json_schemer-fuzz/gemfiles/vanilla.gemfile
    VERSION: 1
    EXEC_INSTALL_SCRIPTS: false
    DEBUG: false
    PACKAGES: dictionaries-common wamerican
16:27:57 Validating action arguments (version='1', packages='dictionaries-common wamerican ')...
16:27:57 done

16:27:57 Verifying packages...
16:28:02 done

16:28:02 Creating cache key...
16:28:02 - Normalized package list is 'dictionaries-common=1.28.14 wamerican=2020.12.07-2 '.
16:28:02 - Value to hash is 'dictionaries-common=1.28.14 wamerican=2020.12.07-2  @ 1 1'.
16:28:02 - Value hashed as '63b54a3f7c283f95dd91f5be83c7dd9f'.
16:28:02 done
16:28:02 Hash value written to /home/runner/cache-apt-pkgs/cache_key.md5
Run actions/cache/restore@v3
  with:
    path: ~/cache-apt-pkgs
    key: cache-apt-pkgs_63b54a3f7c283f95dd91f5be83c7dd9f
    enableCrossOsArchive: false
    fail-on-cache-miss: false
    lookup-only: false
  env:
    BUNDLE_GEMFILE: /home/runner/work/json_schemer-fuzz/json_schemer-fuzz/gemfiles/vanilla.gemfile
    CACHE_KEY: 63b54a3f7c283f95dd91f5be83c7dd9f
Cache Size: ~0 MB (314377 B)
/usr/bin/tar -xf /home/runner/work/_temp/395b0ded-2847-4ef1-ad06-7f912b648184/cache.tzst -P -C /home/runner/work/json_schemer-fuzz/json_schemer-fuzz --use-compress-program unzstd
Cache restored successfully
Cache restored from key: cache-apt-pkgs_63b54a3f7c283f95dd91f5be83c7dd9f
Run ${GITHUB_ACTION_PATH}/post_cache_action.sh \
  ${GITHUB_ACTION_PATH}/post_cache_action.sh \
    ~/cache-apt-pkgs \
    / \
    "$CACHE_HIT" \
    "$EXEC_INSTALL_SCRIPTS" \
    "$DEBUG" \
    "$PACKAGES"
  function create_list { local list=$(cat ~/cache-apt-pkgs/manifest_${1}.log | tr '\n' ','); echo ${list:0:-1}; };
  echo "package-version-list=$(create_list main)" >> $GITHUB_OUTPUT
  echo "all-package-version-list=$(create_list all)" >> $GITHUB_OUTPUT
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    BUNDLE_GEMFILE: /home/runner/work/json_schemer-fuzz/json_schemer-fuzz/gemfiles/vanilla.gemfile
    CACHE_KEY: 63b54a3f7c283f95dd91f5be83c7dd9f
    CACHE_HIT: true
    EXEC_INSTALL_SCRIPTS: false
    DEBUG: false
    PACKAGES: dictionaries-common wamerican
16:28:04 Found 5 files in the cache.
16:28:04 - cache_key.md5
16:28:04 - install.log
16:28:04 - manifest_all.log
16:28:04 - manifest_main.log
16:28:04 - wamerican=2020.12.07-2.tar

16:28:04 Reading from main requested packages manifest...
16:28:04 - dictionaries-common=1.28.14
16:28:04 - wamerican=2020.12.07-2
16:28:04 done

16:28:04 Restoring 1 packages from cache...
16:28:04 - wamerican=2020.12.07-2.tar restoring...
16:28:04   done
16:28:04 done

Run rm -rf ~/cache-apt-pkgs
  rm -rf ~/cache-apt-pkgs
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    BUNDLE_GEMFILE: /home/runner/work/json_schemer-fuzz/json_schemer-fuzz/gemfiles/vanilla.gemfile
    CACHE_KEY: 63b54a3f7c283f95dd91f5be83c7dd9f

It reports the cache size as ~0 MB (314377 B), which seems wrong, since the American dictionary is not that small.

Ref: https://github.com/pboling/json_schemer-fuzz/actions/runs/6657019792/job/18090858293

awalsh128 commented 1 year ago

Enable debug. It will upload logs (via actions/upload-artifact) so I can look at the file dumps, which will be available as artifacts in the action results dropdown. Once the runs have finished, link to them and I can take a deeper look.

sebasfalcone commented 10 months ago

Hi! We are having a similar issue

The cache seems to have broken from run to run.

For some context, we are installing valgrind and one of its dependencies (libc6-dbg) on the same step:

This is the run:

On the run that uses the cache we see this:

ls: cannot access '/home/runner/cache-apt-pkgs/*.tar': No such file or directory
15:29:30.253 Restoring 0 packages from cache...
15:29:30.255 done

cat: /home/runner/cache-apt-pkgs/manifest_all.log: No such file or directory

Hint?

On another note, after removing the installation of libc6-dbg, the cache works as expected

The reason for installing the dependency explicitly was that this action was unable to install valgrind, saying that the package was not available.

bscott-zebra commented 3 months ago

While using this action to try to cache an install of lintian, I ran into a problem similar to what is described here. I spent a little time digging into it, and I believe the root cause is the use of xargs when using tar to archive the package files, here: https://github.com/awalsh128/cache-apt-pkgs-action/blob/f2fc6d1af4d6abf8a4dcd37fd74a9a15c2273b9f/install_and_cache_pkgs.sh#L97

xargs has a default --max-chars limit (per command it will run) of 128KiB (assuming the system ARG_MAX is significantly larger than this), so for packages that contain a large number of files, this limit is exceeded, and the argument list is split into multiple tar runs. Since the -c option is used to create a new tar archive each time, only the files passed into the last execution will wind up in the resulting tar file.
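The overwrite effect described above can be reproduced on a small scale. The following is only a minimal sketch, not the action's code: it uses `xargs -n 2` as a stand-in to force the argument list to be split (instead of relying on the 128KiB limit being exceeded), and the file and archive names are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with five dummy files (names are illustrative only).
workdir="$(mktemp -d)"
trap 'rm -rf "${workdir}"' EXIT
cd "${workdir}"
touch a.txt b.txt c.txt d.txt e.txt

# -n 2 makes xargs run tar three times: (a b), (c d), (e).
# Each run uses -c, so it recreates split.tar from scratch.
printf '%s\n' a.txt b.txt c.txt d.txt e.txt | xargs -n 2 tar -cf split.tar

# Without splitting, a single tar run receives all five files.
printf '%s\n' a.txt b.txt c.txt d.txt e.txt | xargs tar -cf whole.tar

split_count="$(tar -tf split.tar | wc -l)"
whole_count="$(tar -tf whole.tar | wc -l)"
echo "split.tar entries: ${split_count}"
echo "whole.tar entries: ${whole_count}"
```

Only the files from the last tar invocation survive in `split.tar`, which matches the symptom of a restored cache that is missing most of a package's files.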

Possible solutions are:

  1. Pass a large --max-chars value to xargs. Under Ubuntu 24.04 I see the real limit is nearly 2MiB (less environment size + 2048). If a larger value is specified, xargs will output a warning message and use the real maximum instead. There will still be an upper limit on the files that can be successfully archived, but it would be much higher than the one currently used.
  2. When running tar, use -r instead of -c so that subsequent tar runs append files to the archive rather than starting over. This is probably the minimal change to get things working, and there should be no limit on how many files could be included in the archive.
  3. Instead of passing the files to archive via the tar command line, use --verbatim-files-from in combination with --files-from <FILE-LIST> to pass the list of files to archive via a file. awk would no longer be needed to add single quotes to each file, and there should be no limit on how many files could be included in the archive. Process substitution (i.e. <(COMMAND...)) could be used to handle the FILE-LIST and avoid creating an actual file.

I'd go with option 3 myself, but option 2 should also work well enough. Option 1 would allow for correct operation with many large packages, but would still leave an upper limit that would cause problems with very large packages, so it doesn't seem like the way to go.
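For comparison, option 2 can be sketched in isolation as follows. This is an illustration under assumptions rather than a patch to the action: `xargs -n 2` is again used only to force several tar runs, and the file and archive names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with five dummy files (names are illustrative only).
workdir="$(mktemp -d)"
trap 'rm -rf "${workdir}"' EXIT
cd "${workdir}"
touch a.txt b.txt c.txt d.txt e.txt

# Append mode adds to whatever archive already exists,
# so make sure no stale archive is left over from an earlier run.
rm -f appended.tar

# xargs still splits the list into three tar runs, but -r appends
# to the archive on each run instead of recreating it.
printf '%s\n' a.txt b.txt c.txt d.txt e.txt | xargs -n 2 tar -rf appended.tar

appended_count="$(tar -tf appended.tar | wc -l)"
echo "appended.tar entries: ${appended_count}"
```

One design point to keep in mind: appending only works on uncompressed archives, and it needs the explicit cleanup step shown above, since a leftover archive would otherwise be extended rather than replaced.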

Complete change to the "Pipe all package files" bash script block for option 3:

    # Pipe all package files (no folders) and installation control data to Tar.
    tar -cf "${cache_filepath}" -C / --verbatim-files-from --files-from <( { dpkg -L "${package_name}" &&
      get_install_script_filepath "" "${package_name}" "preinst" &&
      get_install_script_filepath "" "${package_name}" "postinst" ;} |
      while IFS= read -r f; do test -f "${f}" -o -L "${f}" && get_tar_relpath "${f}"; done )
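The same technique can be tried outside the action with a self-contained sketch. Everything here is an assumption made for illustration: `list_files` is a hypothetical stand-in for the `dpkg -L` pipeline, the directory layout is invented, and GNU tar (1.29 or later, for `--verbatim-files-from`) and bash (for process substitution) are assumed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fake filesystem root with two files, one containing a space in its name.
workdir="$(mktemp -d)"
trap 'rm -rf "${workdir}"' EXIT
mkdir -p "${workdir}/root/usr/share/demo"
touch "${workdir}/root/usr/share/demo/plain.txt" \
      "${workdir}/root/usr/share/demo/with space.txt"

# Hypothetical stand-in for the package file listing:
# prints one path per line, relative to the fake root.
list_files() {
  (cd "${workdir}/root" && find . -type f)
}

# tar reads the file names from the process substitution, one per line,
# so the command line stays short no matter how many files are listed.
# --verbatim-files-from must come before --files-from to apply to it.
tar -cf "${workdir}/demo.tar" -C "${workdir}/root" \
  --verbatim-files-from --files-from <(list_files)

listed_count="$(tar -tf "${workdir}/demo.tar" | wc -l)"
echo "demo.tar entries: ${listed_count}"
```

Because the names never pass through xargs or the shell's word splitting, no quoting step is needed for names with spaces, and tar is invoked exactly once.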