containers / automation_images

Apache License 2.0

Base-image build ssh-disconnect #243

Closed: cevich closed this issue 1 year ago

cevich commented 1 year ago

While building a new prior-fedora (F36) base-image, SSH appears to disconnect during a package update:

...cut...
==> prior-fedora: Using SSH communicator to connect: 127.0.0.1
==> prior-fedora: Waiting for SSH to become available...
==> prior-fedora: Connected to SSH!
==> prior-fedora: Pausing 10s before connecting...
==> prior-fedora: Waiting for SSH to become available...
==> prior-fedora: Connected to SSH!
==> prior-fedora: Provisioning with shell script: /tmp/packer-shell3026262523
==> prior-fedora: Uploading /tmp/cirrus-ci-build/ => /tmp/automation_images/
==> prior-fedora: Provisioning with shell script: /tmp/packer-shell1652337550
    prior-fedora: Disabling periodic services that could destabilize automation:
    prior-fedora: Banishing cron (ignoring errors)
    prior-fedora: Banishing crond (ignoring errors)
    prior-fedora: Banishing atd (ignoring errors)
    prior-fedora: Banishing apt-daily-upgrade (ignoring errors)
    prior-fedora: Banishing apt-daily (ignoring errors)
    prior-fedora: Banishing fstrim (ignoring errors)
    prior-fedora: Banishing motd-news (ignoring errors)
    prior-fedora: Banishing systemd-tmpfiles-clean (ignoring errors)
    prior-fedora: Banishing update-notifier-download (ignoring errors)
    prior-fedora: Banishing mlocate-updatedb (ignoring errors)
    prior-fedora: Warning: Automation library not found. Assuming it's not yet installed
    prior-fedora: Fedora 36 - x86_64                               18 MB/s |  81 MB     00:04
    prior-fedora: Fedora 36 openh264 (From Cisco) - x86_64        2.0 kB/s | 2.5 kB     00:01
    prior-fedora: Fedora Modular 36 - x86_64                      2.9 MB/s | 2.4 MB     00:00
    prior-fedora: Fedora 36 - x86_64 - Updates                     16 MB/s |  31 MB     00:01
==> prior-fedora: Provisioning step had errors: Running the cleanup provisioner, if present...
==> prior-fedora: Deleting output directory...
Build 'prior-fedora' errored after 2 minutes 25 seconds: Script disconnected unexpectedly. If you expected your script to disconnect, i.e. from a restart, you can try adding `"expect_disconnect": true` or `"valid_exit_codes": [0, 2300218]` to the shell provisioner parameters.
==> Wait completed after 2 minutes 25 seconds
==> Some builds didn't complete successfully and had errors:
--> prior-fedora: Script disconnected unexpectedly. If you expected your script to disconnect, i.e. from a restart, you can try adding `"expect_disconnect": true` or `"valid_exit_codes": [0, 2300218]` to the shell provisioner parameters.
==> Builds finished but no artifacts were created.
make: *** [Makefile:354: base_images/manifest.json] Error 1
+ clear_cred_files
+ set +ex
Exit status: 2

Comparing this output with the same script run to produce a container image shows the following:

Fedora 36 - x86_64                              8.1 MB/s |  81 MB     00:09    
Fedora 36 openh264 (From Cisco) - x86_64        3.1 kB/s | 2.5 kB     00:00    
Fedora Modular 36 - x86_64                      3.2 MB/s | 2.4 MB     00:00    
Fedora 36 - x86_64 - Updates                     13 MB/s |  31 MB     00:02    
Fedora Modular 36 - x86_64 - Updates            944 kB/s | 2.9 MB     00:03    
Dependencies resolved.
================================================================================
 Package                            Arch    Version              Repo      Size
================================================================================
Upgrading:
 curl                               x86_64  7.82.0-12.fc36       updates  308 k
 elfutils-default-yama-scope        noarch  0.188-3.fc36         updates   15 k
 elfutils-libelf                    x86_64  0.188-3.fc36         updates  196 k
 elfutils-libs                      x86_64  0.188-3.fc36         updates  257 k
 fedora-release-common              noarch  36-21                updates   20 k
 fedora-release-container           noarch  36-21                updates   10 k
 fedora-release-identity-container  noarch  36-21                updates   11 k
 libcurl                            x86_64  7.82.0-12.fc36       updates  301 k
 librepo                            x86_64  1.15.1-1.fc36        updates   96 k
 libtasn1                           x86_64  4.19.0-1.fc36        updates   75 k
 libxcrypt                          x86_64  4.4.33-4.fc36        updates  120 k
 lua-libs                           x86_64  5.4.4-7.fc36         updates  131 k
 python3                            x86_64  3.10.9-1.fc36        updates   28 k
 python3-libs                       x86_64  3.10.9-1.fc36        updates  7.4 M
 systemd-libs                       x86_64  250.9-1.fc36         updates  614 k
 tpm2-tss                           x86_64  3.2.1-1.fc36         updates  599 k
 tzdata                             noarch  2022g-1.fc36         updates  428 k
 vim-data                           noarch  2:9.0.1054-1.fc36    updates   24 k
 vim-minimal                        x86_64  2:9.0.1054-1.fc36    updates  782 k
...cut...
  Running scriptlet: elfutils-default-yama-scope-0.188-3.fc36.noarch      11/38 
...cut...
  Running scriptlet: tpm2-tss-3.2.1-1.fc36.x86_64                         16/38 
...cut...
  Running scriptlet: libtasn1-4.18.0-2.fc36.x86_64                        38/38 

Since the VM's output buffer is lost, it's hard to tell whether any of these updates is the root cause. The subsequent package install is much more extensive and could also be to blame.
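Lining up the repository-metadata lines of the two logs shows where they diverge: the VM output ends after `Fedora 36 - x86_64 - Updates`, while the container run goes on to `Fedora Modular 36 - x86_64 - Updates` and then dependency resolution. Below is a minimal Python sketch of that comparison, using lines copied from the logs above. The regex and the helper names (`repos`, `first_missing`) are illustrative assumptions, not part of the automation_images tooling.

```python
import re
from typing import List, Optional

# Repository-metadata lines copied from the two logs in this issue.
VM_LOG = """\
    prior-fedora: Fedora 36 - x86_64                               18 MB/s |  81 MB     00:04
    prior-fedora: Fedora 36 openh264 (From Cisco) - x86_64        2.0 kB/s | 2.5 kB     00:01
    prior-fedora: Fedora Modular 36 - x86_64                      2.9 MB/s | 2.4 MB     00:00
    prior-fedora: Fedora 36 - x86_64 - Updates                     16 MB/s |  31 MB     00:01
"""

CONTAINER_LOG = """\
Fedora 36 - x86_64                              8.1 MB/s |  81 MB     00:09
Fedora 36 openh264 (From Cisco) - x86_64        3.1 kB/s | 2.5 kB     00:00
Fedora Modular 36 - x86_64                      3.2 MB/s | 2.4 MB     00:00
Fedora 36 - x86_64 - Updates                     13 MB/s |  31 MB     00:02
Fedora Modular 36 - x86_64 - Updates            944 kB/s | 2.9 MB     00:03
Dependencies resolved.
"""

# Optional Packer "<builder>: " prefix, then the repo name, then "<rate> kB/s |".
REPO_LINE = re.compile(r"^(?:\s*[\w-]+: )?(?P<repo>.+?)\s+\S+ [kM]?B/s \|")


def repos(log: str) -> List[str]:
    """Return repository names in the order dnf reported their metadata."""
    names = []
    for line in log.splitlines():
        match = REPO_LINE.match(line)
        if match:
            names.append(match.group("repo"))
    return names


def first_missing(vm_log: str, container_log: str) -> Optional[str]:
    """First repo the container run loaded that never appears in the VM log."""
    seen = set(repos(vm_log))
    for name in repos(container_log):
        if name not in seen:
            return name
    return None


print(first_missing(VM_LOG, CONTAINER_LOG))  # Fedora Modular 36 - x86_64 - Updates
```

This only narrows down where the VM output stopped; it cannot say why, which is the limitation the paragraph above describes.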

cevich commented 1 year ago

The problem seems to occur even with the package updates disabled. This suggests the cause is something during the package install, or something else entirely :confounded:

cevich commented 1 year ago

This seems to be related: https://bugzilla.redhat.com/show_bug.cgi?id=1907030

cevich commented 1 year ago

Confirmed: disabling the updates repo avoids the OOM killer. Testing a workaround in #244.
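The workaround amounts to running dnf with the `updates` repository disabled. Here is a minimal sketch, assuming a Python wrapper around the `dnf` CLI. `--disablerepo` is a real dnf option, but the helper names and the package list are hypothetical, and this is not the actual change made in #244.

```python
import shlex
import subprocess
from typing import List, Sequence

# Repo IDs to skip. "updates" is the one named in this issue; other IDs
# (e.g. "updates-modular") could be added here if they prove problematic too.
DISABLED_REPOS = ("updates",)


def dnf_install_cmd(packages: Sequence[str],
                    disabled: Sequence[str] = DISABLED_REPOS) -> List[str]:
    """Build (but do not run) a dnf install command that skips `disabled` repos."""
    cmd = ["dnf", "-y"]
    for repo in disabled:
        cmd.append(f"--disablerepo={repo}")
    cmd += ["install", *packages]
    return cmd


def install(packages: Sequence[str]) -> None:
    """Run the install on the build VM; raises CalledProcessError on failure."""
    subprocess.run(dnf_install_cmd(packages), check=True)


print(shlex.join(dnf_install_cmd(["curl"])))  # dnf -y --disablerepo=updates install curl
```

Building the argument list separately from running it keeps the command easy to log before it executes, which helps when, as here, the VM's own output can be lost.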