FiloSottile / age

A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.
https://age-encryption.org
BSD 3-Clause "New" or "Revised" License
15.75k stars 475 forks source link

age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe #517

Open someplaceguy opened 11 months ago

someplaceguy commented 11 months ago

Environment

What were you trying to do

Trying to run tests as part of the package's build process.

What happened

A test failed.

building
Building subPackage .
Building subPackage ./agessh
Building subPackage ./armor
Building subPackage ./cmd/age
Building subPackage ./cmd/age-keygen
Building subPackage ./internal/bech32
Building subPackage ./internal/format
Building subPackage ./internal/plugin
Building subPackage ./internal/stream
running tests
ok      filippo.io/age  0.736s
ok      filippo.io/age/agessh   0.272s
ok      filippo.io/age/armor    0.016s
--- FAIL: TestScript (0.00s)
    --- FAIL: TestScript/plugin (0.06s)
        testscript.go:481: 
            # encrypt and decrypt a file with a test plugin (0.059s)
            > age -r age1test10qdmzv9q -o test.age input
            [stderr]
            age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe
            age: report unexpected or unhelpful errors at https://filippo.io/age/report
            [exit status 1]
            FAIL: testdata/plugin.txt:2: unexpected command failure

FAIL
FAIL    filippo.io/age/cmd/age  1.008s
FAIL

The build machine's CPU may have been overloaded at the time the tests were running (e.g. load average up to ~150 on a 16-core CPU), but it should not have caused an error AFAIK.

ctheune commented 11 months ago

This happens repeatably (without any load issue) for me on Darwin. NixOS is on unstable.

Darwin chazzzzz 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:22:52 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T8103 arm64

Here's the full build output.

unpacking sources
unpacking source archive /nix/store/fxs7r50g1c64gbvnz76pc4g0kcclapaj-source
source root is source
patching sources
applying patch /nix/store/16jna1lazm41ff0rdx6dsw1h8fpwgzm2-0001-revert-temporarily-disabled-testscript-tests.patch
patching file cmd/age/age_test.go
patching file go.mod
patching file go.sum
applying patch /nix/store/6zkzhrsr4scj1dxr5igd6c513x8i3a49-0002-depend-on-c2sp_cctv_age__TestVectors.patch
patching file go.mod
patching file go.sum
patching file testkit_test.go
configuring
building
Building subPackage .
Building subPackage ./agessh
Building subPackage ./armor
Building subPackage ./cmd/age
Building subPackage ./cmd/age-keygen
Building subPackage ./internal/bech32
Building subPackage ./internal/format
Building subPackage ./internal/plugin
Building subPackage ./internal/stream
running tests
ok      filippo.io/age  1.100s
ok      filippo.io/age/agessh   0.805s
ok      filippo.io/age/armor    0.490s
--- FAIL: TestScript (0.00s)
    --- FAIL: TestScript/plugin (0.08s)
        testscript.go:481:
            # encrypt and decrypt a file with a test plugin (0.065s)
            > age -r age1test10qdmzv9q -o test.age input
            [stderr]
            age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe
            age: report unexpected or unhelpful errors at https://filippo.io/age/report
            [exit status 1]
            FAIL: testdata/plugin.txt:2: unexpected command failure

FAIL
FAIL    filippo.io/age/cmd/age  2.166s
FAIL

I'm currently trying to figure out when this broke.

ctheune commented 11 months ago

@someplaceguy This /could/ also be a NixOS issue. Did you happen to already open a bug there?

ctheune commented 11 months ago

I just double-checked with the NixOS repo and our package definition hasn't changed in 8 months and has been on 1.1.1 since then: https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/security/age/default.nix

ctheune commented 11 months ago

It does look a bit like /something/ changed on our side: https://hydra.nixos.org/log/3p7vyypyrsbys1r4wa8zv2h9ma3cjhhd-age-1.1.1.drv

someplaceguy commented 11 months ago

@ctheune No, I run into these kind of non-deterministic test failures quite frequently (a few per week) due to the aforementioned CPU load issue. Apparently high CPU loads tend to be very good at finding bugs in software or even tests themselves.

All of these issues that I've ran into have been in the upstream software, never in NixOS.

But almost all of them are either test failures due to timeouts or race conditions in build systems (especially Makefiles), which I don't even bother open bugs for due to these being so common (I just disable tests or disable parallel building).

I've chosen to file this bug because it didn't seem to be either of these issues, but from superficially looking at the error, I would be surprised if the problem was in NixOS.

You don't seem to have a high CPU load and you also seem to be using NixOS, so perhaps this issue is different, I don't know.

ctheune commented 11 months ago

Yeah, could be. The issue seems to trigger upstream as well and it has been happening for me since yesterday and it happens repeatedly.

Here's the build that broke upstream. https://hydra.nixos.org/build/229581170

The changes that triggered the build are here: https://hydra.nixos.org/api/scmdiff?branch=&rev2=9462344318b376e157c94fa60c20a25b913b2381&type=git&rev1=05f42aef4e9502a0d0ad79c9debd11d98565cfa7&uri=https%3A%2F%2Fgithub.com%2FNixOS%2Fnixpkgs.git

I checked the changes but none of the commit message seem suspicuous.

I'm going to disable the tests and see whether it actually works.

BTW: the aarch-build didn't break and the pure linux builds and I'm not quite sure whether I'm running my nix environment on MacOS with x86_64 and thus emulation ... o_O

someplaceguy commented 11 months ago

I forgot to mention that once I rebuilt age, the issue disappeared for me.

But you seem to be running into this issue all the time, which added to this issue having 3 NixOS occurrences and no other occurrences in other OSes is quite puzzling indeed.

Perhaps an upgraded dependency is causing the problem? I really know almost nothing about age or its tests, and I don't really have much time to dive into it, so perhaps one of the developers can give more insight.

ctheune commented 11 months ago

So, it seems I ended up with a working version of age when disabling the tests, but the tests keep failing reliably for me.

Maybe pinging @bdd and @zowoq who touched it last could help.

FiloSottile commented 11 months ago

Hi! First of all, thank you for your work packaging age, and for reporting things upstream.

What that test does (under a layer of abstraction in testscript) is make a copy of the test binary into a temporary directory that's added to $PATH, execute the binary with different flags that make it behave as a plugin, and communicate with it over stdin/stdout. Does any of that sound like something that would clash with the NixOS env?

someplaceguy commented 11 months ago

Does any of that sound like something that would clash with the NixOS env?

Theoretically it could, as NixOS runs package builds and tests under a sandbox, but according to the information you provided it seems unlikely to me (although it's hard to tell).

execute the binary with different flags that make it behave as a plugin, and communicate with it over stdin/stdout.

Again, I know almost nothing about age, but looking at this error message:

age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe

... at first sight, if I'm interpreting the error correctly, it kinda appears that the test plugin exited prematurely because it was trying to write to its stdout (hence the write mention) but age must have closed its end of the pipe (hence the broken pipe error), which is why the write didn't succeed but instead aborted the execution of the test plugin.

It'd be interesting to figure out for what reasons age could be closing its end of the pipe, if that's the case.

someplaceguy commented 11 months ago

Depending on the code, another possibility is that the write and broken pipe errors really are coming from age, which would just mean that the test plugin is exiting prematurely (or at least, closing its end of the pipe) for some reason, even though the age parent process could be working fine.

bdd commented 11 months ago

I cannot reproduce this on {x86_64, aarch64}-linux or aarch64-darwin. Neither from the trunk, nor from nixpkgs-23.05(-darwin) branches.

Looking at Hydra, this seems to fail more frequently on x86_64-darwin, rarely for aarch64-darwin, and seemingly never for Linux (x86_64, aarch64).

Sounds like @ctheune can repro on their aarch64-darwin box. I changed the derivation a bit (only running the suspect test) and patched age test suite to set AGEDEBUG=plugin envvar for the testscript runnner, so maybe we can catch how it fails. Error in Hydra builds is bubbling up from age/age.go:238 when it's trying to wrap a file key for recipient, talking to the plugin. This happens over a pipe where plugin's input is stdin, and output is stdout. In the test case, this plugin is simply implemented to return the key it received, without wrapping (not that it matters); and this happens over a pipe. Maybe for some reason age-test-plugin process used in the tests is dying. I'm hoping to see a bit more with the patch.

Can you please try:

nix build --print-build-logs --no-link github:bdd/nixpkgs/age-test-plugin#age

...and paste the output of the failure?

If it doesn't fail in the first case, you can try rebuilding by adding --rebuild to the nix build command above.

@someplaceguy Can you try the same? I'm even more interested in seeing how this fails under Linux (as you said you're running NixOS 23.05).

someplaceguy commented 11 months ago

@bdd I tried both with and without your commit but I can't reproduce this issue.

ctheune commented 11 months ago

@bdd Thanks for investigating! See my output below.

@FiloSottile yeah, the mysterious part is that it used to work and age didn't change. I was wondering whether a dependency change in Go-land might be triggering this.

age> unpacking sources
age> unpacking source archive /nix/store/fxs7r50g1c64gbvnz76pc4g0kcclapaj-source
age> source root is source
age> patching sources
age> applying patch /nix/store/16jna1lazm41ff0rdx6dsw1h8fpwgzm2-0001-revert-temporarily-disabled-testscript-tests.patch
age> patching file cmd/age/age_test.go
age> patching file go.mod
age> patching file go.sum
age> applying patch /nix/store/6zkzhrsr4scj1dxr5igd6c513x8i3a49-0002-depend-on-c2sp_cctv_age__TestVectors.patch
age> patching file go.mod
age> patching file go.sum
age> patching file testkit_test.go
age> applying patch /nix/store/27p8sx5v262x6dr93l94219ibkj3cclw-test-agedebug-plugin.patch
age> patching file cmd/age/testdata/plugin.txt
age> configuring
age> building
age> Building subPackage .
age> Building subPackage ./agessh
age> Building subPackage ./armor
age> Building subPackage ./cmd/age
age> Building subPackage ./cmd/age-keygen
age> Building subPackage ./internal/bech32
age> Building subPackage ./internal/format
age> Building subPackage ./internal/plugin
age> Building subPackage ./internal/stream
age> running tests
age> testing: warning: no tests to run
age> PASS
age> ok         filippo.io/age  0.498s [no tests to run]
age> testing: warning: no tests to run
age> PASS
age> ok         filippo.io/age/agessh   0.464s [no tests to run]
age> testing: warning: no tests to run
age> PASS
age> ok         filippo.io/age/armor    0.442s [no tests to run]
age> === RUN   TestScript
age> === RUN   TestScript/plugin
age> === PAUSE TestScript/plugin
age> === CONT  TestScript/plugin
age>     testscript.go:481:
age>         WORK=$WORK
age>         PATH=/private/tmp/nix-build-age-1.1.1.drv-0/testscript-main4004674410/bin:/nix/store/drw07a5dnh7xyxyh9iq1k4qlhnz80qf8-go-1.20.6/share/go/bin:/nix/store/drw07a5dnh7xyxyh9iq1k4qlhnz80qf8-go-1.20.6/bin:/nix/store/v3b9q8mi1ybz7bl75w1p6p3w6aw4vfyr-clang-wrapper-11.1.0/bin:/nix/store/5j3zwxjziy6hcpaf9wlcj7qqq0w78wfi-clang-11.1.0/bin:/nix/store/vjjplvzy60qz3nznrff6s8zvb80m3bif-coreutils-9.3/bin:/nix/store/xfy2lmwhipxbpr0zxpyhdnlkbnr388bg-cctools-binutils-darwin-wrapper-11.1.0-973.0.1/bin:/nix/store/snxjdcdka4p7af454n3v7mdc4yh1lywn-cctools-binutils-darwin-11.1.0-973.0.1/bin:/nix/store/vjjplvzy60qz3nznrff6s8zvb80m3bif-coreutils-9.3/bin:/nix/store/k5abm4s2pgs0xg64lligwfprzafg8cdf-findutils-4.9.0/bin:/nix/store/hmscngdly85w04lb5cn32jcfqyg61kfq-diffutils-3.10/bin:/nix/store/3c0v61mw26d766jfdqg2bqmb32pcl40n-gnused-4.9/bin:/nix/store/hdq4m97ma7rk9srn8f0myfhhlh2inzm5-gnugrep-3.11/bin:/nix/store/8cmmw5frzhs5nzpkvfhdih6d7b3g7zqw-gawk-5.2.2/bin:/nix/store/q2csqxdlnjinz6fsvg751wryhxbnk9sw-gnutar-1.34/bin:/nix/store/397iphc81akvp54z9lsrfqly4fj56d08-gzip-1.12/bin:/nix/store/p742p1s409j1yvb0ywvk258fjipzsqd9-bzip2-1.0.8-bin/bin:/nix/store/ch57psmgq8rg5f0i0cvp07d0zfndjf8x-gnumake-4.4.1/bin:/nix/store/xx5s0rkx3aav3f850ig4afadg6ms7lq9-bash-5.2-p15/bin:/nix/store/psxc626ll2zkjkalyybkyzcxvkgr6nbx-patch-2.7.6/bin:/nix/store/5ym0a90mnlw4gi81wpraa370wi32r075-xz-5.4.3-bin/bin:/nix/store/rxr4dqbh16y2v2c7fchcnlsfqkdr8bw9-file-5.44/bin
age>         GOTRACEBACK=system
age>         HOME=/no-home
age>         TMPDIR=$WORK/.tmp
age>         devnull=/dev/null
age>         /=/
age>         :=:
age>         $=$
age>         TESTSCRIPT_COVER_DIR=
age>         exe=
age>         > env AGEDEBUG=plugin
age>         # encrypt and decrypt a file with a test plugin (0.686s)
age>         > age -r age1test10qdmzv9q -o test.age input
age>         [stderr]
age>         -> add-recipient age1test10qdmzv9q
age>         -> wrap-file-key
age>         HL9K3fpwqlOHvr2sAMWCTg
age>         -> done
age>         -> recipient-stanza 0 test
age>         HL9K3fpwqlOHvr2sAMWCTg
age>         -> done
age>         age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe
age>         age: report unexpected or unhelpful errors at https://filippo.io/age/report
age>         [exit status 1]
age>         FAIL: testdata/plugin.txt:4: unexpected command failure
age> --- FAIL: TestScript (0.00s)
age>     --- FAIL: TestScript/plugin (0.69s)
age> FAIL
age> FAIL       filippo.io/age/cmd/age  1.176s
age> FAIL
error: builder for '/nix/store/54krlsbx5b05x8ly3b7q37y5k24jhza1-age-1.1.1.drv' failed with exit code 1
squirmy commented 10 months ago

One extra data point here. It works fine for me on aarch64-darwin. It does not work for me inside GitHub actions on x86_64-darwin.

So, it seems I ended up with a working version of age when disabling the tests, but the tests keep failing reliably for me.

@ctheune would you mind sharing your solution for disabling the tests?

I've tried:

ageWithoutTests = pkgs.age.overrideAttrs (_: {
  doCheck = false;
});

But this doesn't seem to work around the issue for me.

ctheune commented 9 months ago

Sure. I'm using

    (age.overrideAttrs (finalAttrs: previousAttrs: {
      doCheck = false; # https://github.com/FiloSottile/age/issues/517
    }))