coreos / pkg

a collection of go utility packages
Apache License 2.0
188 stars 70 forks source link

github.com/coreos/pkg/progressutil's TestCopyOne is flaky #76

Open mwhudson opened 8 years ago

mwhudson commented 8 years ago

I ran go test -c github.com/coreos/pkg/progressutil and then ran ./progressutil.test -test.run TestCopyOne in a loop and it failed 7 times out of 100 on a lightly loaded system. It failed 47 times out of 100 when my system was more loaded.

jonboulle commented 8 years ago

@dgonyeo PTAL?

ramcq commented 6 years ago

Ping... this still seems to be broken. Checking the output of the progress bar seems very fragile.

cgonyeo commented 6 years ago

Looking over the test implementation I'm fairly certain this is an issue in the test, not the library. The test fires up a PrintAndWait in a new goroutine, and then attempts to check it on the same interval it's printing. If there's heavy system load, the goroutine scheduler won't get as much CPU time and then the checking logic isn't running in lockstep with the printing logic any more. In hindsight it's not a particularly well written test.

@ramcq is this test flake causing issues for you? I can try to rewrite the test to be less fragile if needed, but I haven't touched this code in almost two years now.

ramcq commented 6 years ago

@dgonyeo Thanks for taking a look - I've just been importing some of the Debian go packages into Endless OS and I found that this one was failing in our build server which is more likely to be contended on CPU/IO depending what else is on the go. Debian usually runs into all of these types of problems across their buildd network on one architecture or other, but because there is no binary (just .go files) here, it's not ever been rebuilt there so I think you got away with it. ;) I've stubbed the expected output part of the test on our side which probably renders it broadly pointless... but lets the package build and publish in our repo. It's totally up to you if you want to bother rewriting it to fix. :)

cgonyeo commented 6 years ago

I'll let this one sit then. If anyone else is impacted by this beyond "we trivially worked around it" let me know and I can try to fix it.