chainguard-dev / melange

build APKs from source code
Apache License 2.0
bad packages/x86_64/APKINDEX can cause hang #1645

Open smoser opened 1 week ago

smoser commented 1 week ago

This is quite possibly in the realm of user error or "don't do that".

I got my wolfi-dev/os tree into a state that would not build packages.

The following should be enough information to reproduce it.

Note: ncurses does depend on itself.

$ melange version | grep ^[A-Z]
GitVersion:    v0.15.7
GitCommit:     997f9fd699767f784c2879272b12546cbdb709cc
GitTreeState:  clean
BuildDate:     '2024-11-14T01:52:15Z'
GoVersion:     go1.23.3
Compiler:      gc
Platform:      linux/amd64

$ git log HEAD^.. --oneline  --no-decorate 
2e01d074b py3-flask/3.1.0 package update (#34105)

$ make clean
$ make package/ncurses
# ... happily builds ...
...
2024/11/14 10:18:53 INFO wrote packages/x86_64/ncurses-terminfo-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO generating apk index from packages in packages/x86_64
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-doc-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-dev-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-terminfo-base-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-static-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-terminfo-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO updating index at packages/x86_64/APKINDEX.tar.gz with new packages: [ncurses-6.5_p20241006-r4 ncurses-static-6.5_p20241006-r4 ncurses-dev-6.5_p20241006-r4 ncurses-doc-6.5_p20241006-r4 ncurses-terminfo-base-6.5_p20241006-r4 ncurses-terminfo-6.5_p20241006-r4]
2024/11/14 10:18:53 INFO signing apk index at packages/x86_64/APKINDEX.tar.gz
2024/11/14 10:18:53 INFO signing index packages/x86_64/APKINDEX.tar.gz with key local-melange.rsa
2024/11/14 10:18:53 INFO appending signature RSA to index packages/x86_64/APKINDEX.tar.gz
2024/11/14 10:18:53 INFO writing signed index to packages/x86_64/APKINDEX.tar.gz
...

# now break the APKINDEX with 'rm' rather than 'make clean'
# probably because i wanted to build again after a change.
$ rm packages/x86_64/ncurses-*

# now try again to build
$ make package/ncurses
...
2024/11/14 10:20:56 INFO installing build-base (1-r8)
2024/11/14 10:20:56 INFO installing libcrypt1 (2.40-r3)
2024/11/14 10:20:56 INFO installing busybox (1.37.0-r0)
<hang here forever>
<give up, hit ctrl-c>
2024/11/14 10:21:35 INFO deleting guest dir /home/user/tmp/melange-guest-3021240161
2024/11/14 10:21:35 INFO deleting workspace dir /home/user/tmp/melange-workspace-2950730389
2024/11/14 10:21:35 ERRO failed to build package: unable to build guest: 
   unable to generate image: 
     installing apk packages: 
       installing packages: 
         expanding ncurses-terminfo-base (ver:6.5_p20241006-r4 arch:x86_64): 
         fetching package "ncurses-terminfo-base": 
            failed to read repository package apk /home/user/src/wolfi-os/packages/x86_64/ncurses-terminfo-base-6.5_p20241006-r4.apk:
              open /home/user/src/wolfi-os/packages/x86_64/ncurses-terminfo-base-6.5_p20241006-r4.apk: 
                  no such file or directory: context canceled
make[1]: *** [Makefile:125: packages/x86_64/ncurses-6.5_p20241006-r4.apk] Error 1

I tidied up the final ERROR message (printed after the ctrl-c) a bit. It actually does give a reasonable error, but I don't know that I've ever read a program's error messages after hitting ctrl-c, so I feel justified in the time I lost debugging why it was hung.

smoser commented 1 week ago

OK. Here is a simpler reproduction that rules out the red herrings of a self-dependent package and the bootstrap archive.

Then, from the wolfi-dev/os tree (at the commit listed above):

$ rm -Rf ~/.cache/dev.chainguard.go-apk/ 
$ make clean
$ make package/test-me-dep
$ rm packages/x86_64/test-me-dep-1.0-r0.apk
$ make package/test-me
melange build test-me.yaml --repository-append /home/user/src/wolfi-os/packages 
   --keyring-append local-melange.rsa.pub --signing-key local-melange.rsa 
   --arch x86_64 --env-file build-x86_64.env --namespace wolfi 
   --license 'Apache-2.0' 
   --git-repo-url 'https://github.com/wolfi-dev/os' 
   --generate-index false  
   --pipeline-dir ./pipelines/  
   -k https://packages.wolfi.dev/os/wolfi-signing.rsa.pub 
   -r https://packages.wolfi.dev/os
2024/11/14 10:37:19 INFO git commit for build config not provided, attempting to detect automatically
2024/11/14 10:37:19 WARN SOURCE_DATE_EPOCH is specified but empty, setting it to 1969-12-31 19:00:00 -0500 EST
2024/11/14 10:37:19 INFO melange is building:
2024/11/14 10:37:19 INFO   configuration file: test-me.yaml
2024/11/14 10:37:19 INFO   workspace dir: /home/user/tmp/melange-workspace-496249841
2024/11/14 10:37:19 INFO evaluating pipelines for package requirements
2024/11/14 10:37:19 INFO --cache-dir ./melange-cache/ not a dir; skipping
2024/11/14 10:37:19 INFO populating workspace /home/user/tmp/melange-workspace-496249841 from ./test-me/
2024/11/14 10:37:19 INFO building workspace in '/home/user/tmp/melange-guest-3256530904' with apko
2024/11/14 10:37:19 INFO setting apk repositories: [/home/user/src/wolfi-os/packages https://packages.wolfi.dev/os]
2024/11/14 10:37:19 INFO image configuration:
2024/11/14 10:37:19 INFO   contents:
2024/11/14 10:37:19 INFO     build repositories: []
2024/11/14 10:37:19 INFO     runtime repositories: []
2024/11/14 10:37:19 INFO     keyring:      []
2024/11/14 10:37:19 INFO     packages:     [busybox test-me-dep]
2024/11/14 10:37:19 INFO   accounts:
2024/11/14 10:37:19 INFO     runas:  
2024/11/14 10:37:19 INFO     users:
2024/11/14 10:37:19 INFO       - uid=1000(build) gid=1000
2024/11/14 10:37:19 INFO     groups:
2024/11/14 10:37:19 INFO       - gid=1000(build) members=[build]
2024/11/14 10:37:19 INFO auth configured for: []
2024/11/14 10:37:19 INFO installing ca-certificates-bundle (20241010-r2)
2024/11/14 10:37:19 INFO installing wolfi-baselayout (20230201-r15)
2024/11/14 10:37:19 INFO installing glibc (2.40-r3)
2024/11/14 10:37:19 INFO installing ld-linux (2.40-r3)
2024/11/14 10:37:19 INFO installing libgcc (14.2.0-r5)
2024/11/14 10:37:19 INFO installing glibc-locale-posix (2.40-r3)
2024/11/14 10:37:19 INFO installing libxcrypt (4.4.36-r8)
2024/11/14 10:37:19 INFO installing libcrypt1 (2.40-r3)
2024/11/14 10:37:19 INFO installing busybox (1.37.0-r0)

<hang here>
^C
2024/11/14 10:38:42 INFO deleting guest dir /home/user/tmp/melange-guest-3256530904
2024/11/14 10:38:42 INFO deleting workspace dir /home/user/tmp/melange-workspace-496249841
2024/11/14 10:38:42 ERRO failed to build package: 
  unable to build guest: unable to generate image: 
    installing apk packages: installing packages: 
      expanding test-me-dep (ver:1.0-r0 arch:x86_64):
        fetching package "test-me-dep": 
          failed to read repository package apk /home/user/src/wolfi-os/packages/x86_64/test-me-dep-1.0-r0.apk: 
            open /home/user/src/wolfi-os/packages/x86_64/test-me-dep-1.0-r0.apk: 
              no such file or directory: context canceled
make[1]: *** [Makefile:125: packages/x86_64/test-me-1.0-r0.apk] Error 1
make: *** [Makefile:115: package/test-me] Interrupt

smoser commented 1 week ago

I think the problem is at https://github.com/chainguard-dev/apko/blob/b93f0a2bc55f4dc8a07c373fc338e37dec193a24/pkg/apk/apk/implementation.go#L703 . When expandPackage returns an error, no close is ever done, so no signal is ever communicated to whatever is waiting on it.