awalsh128 / cache-apt-pkgs-action

Cache APT packages in GitHub Actions
Other
205 stars 35 forks source link

APT package restore for QEMU packages isn't working #44

Closed crdoconnor closed 2 years ago

crdoconnor commented 2 years ago

Example here:

Workflow: https://github.com/crdoconnor/apt-cache-bug/blob/main/.github/workflows/example.yml

Actions: https://github.com/crdoconnor/apt-cache-bug/actions

There are 3 runs of the github actions. First is an expected failure. Second is an expected pass (apt-get install fixes the issue).

Third is an unexpected failure - the cache restore should restore everything as before, but it doesn't quite seem to. It might be missing some files, I guess?

Anyway, very grateful for any help you could provide.

awalsh128 commented 2 years ago

Hi @crdoconnor, could you provide the run logs for the action?

crdoconnor commented 2 years ago

logs_1.zip logs_2.zip logs_3.zip

If you fork the repo and tweak the README 2 times you should also be able to see the same effect.

awalsh128 commented 2 years ago

Sorry I didn't realize you set up that repo just to demonstrate the issue. Very helpful.

It looks like the OS is complaining that the uname binary is for a different arch. I am guessing this may be because I am not capturing the arch with the cached package information. It is probably caching a different arch version which causes the exec format error when the container calls uname.

I am not as familiar with podman and QEMU so would appreciate your input here. My initial guess is that it is caching a non-ARM version of one of the QEMU components. I will debug against the repo you provided in the meantime.

awalsh128 commented 2 years ago

I've added the regression test to the CI (https://github.com/awalsh128/cache-apt-pkgs-action-ci/commit/b969f31366cfc3ef290bbe494ebc989ac7dd9ed9) just for dev branch as of now and can repro it.

https://github.com/awalsh128/cache-apt-pkgs-action-ci/runs/7707354683?check_suite_focus=true

You can delete your repo if you want.

awalsh128 commented 2 years ago

Below are the affected packages

awalsh128 commented 2 years ago

Iterative installation after install shows that qemu-user-static fixes issue as shown in the log 6_Run debug script after action execution..txt. Need to examine the file structure of the post install and post cache cases.

awalsh128 commented 2 years ago

This package allows for multi-arch binary support in an emulator container. The package files themselves seems to match the restored files. I am guessing the other unknown would be any make artifacts aside from the files themselves.

Not a Debian package dev expert by any means but may see if I can understand a bit better. In the meantime I would suggest installing qemu-user-static independently.

crdoconnor commented 2 years ago

Yeah, that makes perfect sense. This is probably an un-run script.

Is it possible to cache the post install scripts and run them when restoring the cache?

On Sat, 3 Sep 2022, 06:11 Andrew Walsh, @.***> wrote:

This package allows for multi-arch binary support in an emulator container. The package files https://github.com/awalsh128/cache-apt-pkgs-action/files/9481729/dpkg-files.txt themselves seems to match the restored files https://github.com/awalsh128/cache-apt-pkgs-action/files/9481727/6_Run.debug.script.after.action.execution.txt. I am guessing the other unknown would be any make artifacts aside from the files themselves.

Not a Debian package dev expert by any means but may see if I can understand a bit better. In the meantime I would suggest installing qemu-user-static independently.

— Reply to this email directly, view it on GitHub https://github.com/awalsh128/cache-apt-pkgs-action/issues/44#issuecomment-1236050635, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABOJKNMXXNWMI3CDKMRCLOTV4LMYVANCNFSM55XMXFCQ . You are receiving this because you authored the thread.Message ID: @.***>

awalsh128 commented 2 years ago

The Debian Policy Manual goes over how the control data is executed. postinst is probably what we want to look at. To extract the control data from a downloaded package.

dpkg -e /var/cache/apt/archives/<pkg>*.deb /tmp/<action>/<pkg>/control
cat /tmp/<action>/<pkg>/control/postinst

The one for this particular package is. Where the arg in this case is configure as noted in the policy manual.

#!/bin/sh
set -e
binfmt_update() {
  test configure = "$1" || return 0
  command -v update-binfmts >/dev/null || return 0
  # do not touch binfmts inside a container
  # should this be done by update-binfmts instead?
  if command -v systemd-detect-virt >/dev/null; then
     systemd-detect-virt --quiet --container && return 0
  fi
  grep -zqs ^container= /proc/1/environ && return 0
  for fmt in  alpha armeb cris hexagon hppa i386 m68k microblaze mips mipsel mipsn32 mipsn32el mips64 mips64el ppc ppc64 ppc64le riscv32 riscv64 s390x sh4 sh4eb sparc sparc32plus sparc64 x86_64 xtensa xtensaeb; do
    update-binfmts --import qemu-$fmt
  done
}
binfmt_update "$1"

I tried running this in a post restore test (log) but it encountered some errors. I am assuming there is something else I am missing because the post install didn't have errors on the initial install. It is interesting that it involves update-binfmts because this has interaction with the Linux kernel, and could hint at why a file restore isn't working.

I'm a bit uncomfortable with the idea of caching and running the post-install scripts because this would cut into the runtime performance of the action and its intended goal. I do want to solve this issue though and how we can most gracefully address it. I still need to replicate a fix though.

If you want to help, you can clone this entire repo, modify: https://github.com/awalsh128/cache-apt-pkgs-action-ci/blob/master/devtools/debug/restore_post_action.sh ... and then run the "Debug action" workflow.

When I have some more time I will keep digging.

awalsh128 commented 2 years ago

Actually I realized the script wouldn't be the same on my box because it is a different version.

Did this instead and it worked (run log).

#!/bin/bash -x
set -e
apt-get install --download-only qemu-user-static
mkdir /tmp/qus-ctrl
dpkg -e /var/cache/apt/archives/qemu-user-static_1%3a4.2-3ubuntu6.23_amd64.deb /tmp/qus-ctrl
sh -x /tmp/qus-ctrl/postinst configure
podman run --rm -it docker.io/arm64v8/alpine:3.14 uname -m

I need to give this issue some thought. Packages like these have some interaction with the kernel, which makes them a special case. I'd almost like to just put them on an explicit special handling list but we need something that scales well. Will think about this some more.

crdoconnor commented 2 years ago

My understanding from this is that postinst is just... skipped when restoring the packages? Is that correct?

If that's true then that would break a whole range of debian packages not just this one. It's pretty common to have a postinst, I think. It doesn't seem ungraceful to just try and run it every time for every package if there is one even if it does slow it down. It would probably enable a whole lot of packages to be useable with this action.

It raises the even thornier question of what to do about preinst.

Thank you for digging into this issue and for creating this action and debugging all of its thorny, dark corners :)

crdoconnor commented 2 years ago

If you want to help, you can clone this entire repo, modify

I'm assuming since you've managed to replicate a fix you no longer want this?

Let me know if I misinterpreted though, or there's another way I can help.

awalsh128 commented 2 years ago

That's right, you don't have to now that it has been replicated.

This action is a time saver but it is based on the assumption that restoring files is good enough, which as we know isn't totally true. A lot of the control data files work around this idea (chmod, other file generation, etc). So the foundation of the action is not entirely sound but is established assuming that any issues will be outliers. Some things to note:

Sorry, I didn't mean to imply running the postinst would be ungraceful. Only as a one off for this situation. I'd like to avoid preinst for now until it gets flagged as an issue, especially given the small usage and technical problems posed. As far as postinst, I'll run some benchmarks to see how it performs. Like all of my tests in this comment, they aren't exactly reliable given how fast and loose I have been playing with them. I'll need to hunt out some meatier packages with control data. In the meantime I'll see if I can introduce this as an experimental feature controlled with an input flag.

Happy to do the digging. Always like a good challenge and glad you find the action useful.

awalsh128 commented 2 years ago

Hacking on a draft in dev and may have something ready for preview. I am not sure how broken it is but you can always switch your action to @dev and specify the execute_postinst input as true to test drive (which may run off a cliff and explode). Jokes aside, nothing should cause an issue with the cache since this is all restore behavior being experimented with.

awalsh128 commented 2 years ago

After working with the experimental version, I think I have hit a hard spots. In my case I am testing the restore of xdot. It contains the following postinst script.

#!/bin/sh
set -e

# Automatically added by dh_python3:
if which py3compile >/dev/null 2>&1; then
        py3compile -p xdot 
fi
if which pypy3compile >/dev/null 2>&1; then
        pypy3compile -p xdot  || true
fi

# End automatically added section

Unfortunately py3compile performs a dpkg lookup, giving me the error (verbose enabled).

+ echo 21:42:20 '- Executing /tmp/cache-apt-pkgs-action/restore/var/lib/dpkg/info/xdot.postinst...'
21:42:20 - Executing /tmp/cache-apt-pkgs-action/restore/var/lib/dpkg/info/xdot.postinst...
+ sh -x /tmp/cache-apt-pkgs-action/restore/var/lib/dpkg/info/xdot.postinst
+ set -e
+ which py3compile
+ py3compile -p xdot
dpkg-query: package 'xdot' is not installed
Use dpkg --contents (= dpkg-deb --contents) to list archive files contents.
Traceback (most recent call last):
  File "/usr/bin/py3compile", line 290, in <module>
    main()
  File "/usr/bin/py3compile", line 269, in main
    compile(files, versions,
  File "/usr/bin/py3compile", line 154, in compile
    for fn, versions_to_compile in filter_files(files, e_patterns, versions):
  File "/usr/bin/py3compile", line 106, in filter_files
    for fn in files:
  File "/usr/share/python3/debpython/files.py", line 71, in filter_public
    for fn in files:
  File "/usr/share/python3/debpython/files.py", line 53, in from_package
    raise Exception("cannot get content of %s" % package_name)
Exception: cannot get content of xdot

Going to think about this some more. We could just ignore failures or try to reflect the package state in Debian's APT. I am very reluctant to do anything with the latter since we will be reverse engineering APT at that point.

crdoconnor commented 2 years ago

Sorry I haven't tried out the dev branch yet, I was busy. I will though!

We could just ignore failures or try to reflect the package state in Debian's APT. I am very reluctant to do anything with the latter since we will be reverse engineering APT at that point.

It kind of feels like that's a rabbit hole you dove into when you first started this project :)

Caching the APT database as well probably would fix this but I have a feeling once you've dealt with that one another package problem will rear its head.

I was thinking about this last night and it actually kind of occurred to me that the cleanest approach might be to simply just cache all the downloaded deb files themselves and install from scratch each time without needing a network. This would slow things down quite a bit but it would be guaranteed to work.

I actually originally came looking for you when I had a CI pipeline I wanted to finish in a hurry and the apt-get step took 4 minutes because downloads were slow that day rather than the usual 45 seconds. If it was consistently 25-30 seconds I'd still call that a win.

awalsh128 commented 2 years ago

Yeah the whole endeavor is based an assumption that largely holds but not always as we have seen. apt-fast does a good job with download and installation if you are looking to optimize that process. As far as this action goes, I am going to lean towards keeping it at the cost of these cases like yours but we may still be able to find an approximation that could work. Understand if you want to just move forward.

crdoconnor commented 2 years ago

Hmm, @dev still seems to be netting me an exec format error when I tried it on a real workflow.

crdoconnor commented 2 years ago

apt-fast looks like something that will download packages faster by opening multiple connections, which is not quite what I was looking for. Moreover it would require an additional download step to install apt-fast.

I was more thinking of something that would "apt-get install [package] -d", which would download to /var/cache/apt/archives, then cache the /var/cache/apt/archives folder and then on restore, would do apt-get install, bypassing the download step.

crdoconnor commented 2 years ago

Ah, sorry there is some background to this I didnt give you. The packages are necessary for running multiarch containers. When they are installed it works when they are not it doesnt.

On Sat, 6 Aug 2022, 08:25 Andrew Walsh, @.***> wrote:

Sorry I didn't realize you set up that repo just to demonstrate the issue. Very helpful.

It looks like the OS is complaining that the uname binary is for a different arch. I am guessing this may be because I am not capturing the arch with the cached package information. It is probably caching a different arch version which causes the exec format error when the container calls uname.

I am not as familiar with podman and QEMU so would appreciate your input here. My initial guess is that it is caching a non-ARM version of one of the QEMU components. I will debug against the repo you provided in the meantime.

— Reply to this email directly, view it on GitHub https://github.com/awalsh128/cache-apt-pkgs-action/issues/44#issuecomment-1207166207, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABOJKNIXJSEIPFXD3OTLQO3VXYHNTANCNFSM55XMXFCQ . You are receiving this because you authored the thread.Message ID: @.***>

awalsh128 commented 2 years ago

Hey, sorry I haven't responded for awhile. I should be able to take a look this week. Thanks for sharing your results.

awalsh128 commented 2 years ago

It doesn't seem to be picking up on the presence of the postinst script in the package when in the restore phase. Probably an incorrect evaluation. Will need to look further.

awalsh128 commented 2 years ago

I was able to debug some of the install script behavior to work correctly but the regression is still failing. The execute script feature is in dev. Let me know if you see the same results on your side.

uses: awalsh128/cache-apt-pkgs-action@dev
with:
    packages: yourpackage
    version: 1
    execute_install_scripts: true

WARNING: The outputs are broken on the action. This shouldn't matter though if you don't use them.

awalsh128 commented 2 years ago

Here are my latest findings. TL;DR is that we may not have a solution from this action.

https://github.com/awalsh128/cache-apt-pkgs-action/issues/57#issuecomment-1321024283

awalsh128 commented 2 years ago

Published the latest version of the action. Going to close this out.

https://github.com/awalsh128/cache-apt-pkgs-action/releases/tag/v1.2.0