cloudfoundry / eirini

Pluggable container orchestration for Cloud Foundry, and a Kubernetes backend
Apache License 2.0
115 stars 30 forks source link

Fix race in bifrost List tests #48

Closed williammartin closed 5 years ago

williammartin commented 5 years ago

This is a tricksy race condition that occurs because the tests are currently not run in parallel by ginkgo in the eirini CI/contributor script. It also outlines that one of the tests wasn't testing expected behaviour at all.

If the tests are run in parallel the tests fail with things like:

-----------------------------
•! Panic [0.001 seconds]
Bifrost
/home/vagrant/go/src/code.cloudfoundry.org/eirini/bifrost/bifrost_test.go:21
  List
  /home/vagrant/go/src/code.cloudfoundry.org/eirini/bifrost/bifrost_test.go:108
    When listing running LRPs
    /home/vagrant/go/src/code.cloudfoundry.org/eirini/bifrost/bifrost_test.go:137
      should translate []LRPs to []DesiredLRPSchedulingInfo [Measurement]
      /home/vagrant/go/src/code.cloudfoundry.org/eirini/bifrost/bifrost_test.go:151

      Test Panicked
      runtime error: index out of range
      /snap/go/3417/src/runtime/panic.go:44

      Full Stack Trace
        /snap/go/3417/src/runtime/panic.go:522 +0x1b5
      code.cloudfoundry.org/eirini/bifrost_test.glob..func1.2.4.3()
        /home/vagrant/go/src/code.cloudfoundry.org/eirini/bifrost/bifrost_test.go:152 +0x9e9
      code.cloudfoundry.org/eirini/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc00006f560, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/vagrant/go/src/code.cloudfoundry.org/eirini/bifrost/bifrost_suite_test.go:12 +0xa3
      testing.tRunner(0xc000358b00, 0xb3b538)
        /snap/go/3417/src/testing/testing.go:865 +0xc0
      created by testing.(*T).Run
        /snap/go/3417/src/testing/testing.go:916 +0x35a

When the tests are run in serial, the Expect(err).ToNot(HaveOccurred()) test passes, but in actual fact, the lrp list is empty (unlike the context describes), because the lrps variable only gets set after the opiClient.ListReturns(lrps, nil) line. Then when the next test runs the lrps variable had been set and the test passed.

When run in parallel, there's no guarantee that lrps was actually set with 3 values, so the test panics.

This same problem can manifest in multiple ways.

I've also set a default value for lrps in the higher level BeforeEach, which is idiomatic, such that if you were to create new contexts, they wouldn't inherit polluted values. I moved the related context up to highlight the coupling (that the lrps is empty by default).

cfdreddbot commented 5 years ago

:white_check_mark: Hey williammartin! The commit authors and yourself have already signed the CLA.

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/165132052

The labels on this github issue will be updated when the story is started.

williammartin commented 5 years ago

Note also that before this PR, the tests would also probably fail if you ever turned on -randomizeAllSpecs even in serial.

gdankov commented 5 years ago

Hey @williammartin,

Nice catch! Thanks for the PR, looks good.

Best regards, @gdankov & @JulzDiverse