🐛 [Bug]: Isolation Issue with Parallel Subtests

sixcolors commented 8 months ago

An issue that appears throughout the tests. Running tests in parallel using the t.Parallel() function can show problems when the tests that are not isolated. When tests share a state, such as a common instance of app, running them in parallel can lead to race conditions or other unexpected behavior leading to unpredictable results.

To fix this issue [with false results in tests], we could refactor the tests and make sure that each test has its isolated state.

If we want to eliminate repetitive code for setup and teardown for each parallel test, we could consider using the testify/suite package.

For example I ran into issues with the Test_Client_UserAgent.

Example, Test_Client_UserAgent original:

func Test_Client_UserAgent(t *testing.T) {
    t.Parallel()

    ln := fasthttputil.NewInmemoryListener()

    app := New()

    app.Get("/", func(c Ctx) error {
        return c.Send(c.Request().Header.UserAgent())
    })

    go func() {
        require.NoError(t, app.Listener(ln, ListenConfig{
            DisableStartupMessage: true,
        }))
    }()

    t.Run("default", func(t *testing.T) {
        t.Parallel()
        for i := 0; i < 5; i++ {
            a := Get("http://example.com")

            a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

            code, body, errs := a.String()

            require.Equal(t, StatusOK, code)
            require.Equal(t, defaultUserAgent, body)
            require.Empty(t, errs)
        }
    })

    t.Run("custom", func(t *testing.T) {
        t.Parallel()
        for i := 0; i < 5; i++ {
            c := AcquireClient()
            c.UserAgent = "ua"

            a := c.Get("http://example.com")

            a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

            code, body, errs := a.String()

            require.Equal(t, StatusOK, code)
            require.Equal(t, "ua", body)
            require.Empty(t, errs)
            ReleaseClient(c)
        }
    })
}

Originally posted by @sixcolors in https://github.com/gofiber/fiber/issues/2864#issuecomment-1957836265

The issues found in three test files, namely app_test.go, client_test.go, and ctx_test.go.

efectn commented 8 months ago

I've noticed a common issue that appears throughout the tests. Running tests in parallel using the t.Parallel() function can cause problems if the tests are not isolated correctly. When tests share a state, such as a common instance of app, running them in parallel can lead to race conditions. This means that one test could modify the state while another is reading it, leading to unpredictable results.

To fix this issue, we should refactor the tests and make sure that each test has its isolated state.

If we want to eliminate repetitive code for setup and teardown for each parallel test, we could consider using the testify/suite package.

For example, in the Test_Ctx_Parsers noted above, I also ran into issues with the Test_Client_UserAgent.

Example, Test_Client_UserAgent with a fix:
func Test_Client_UserAgent(t *testing.T) {
  t.Parallel()

  setupApp := func(t *testing.T) *fasthttputil.InmemoryListener {
      t.Helper()
      ln := fasthttputil.NewInmemoryListener()
      app := New()
      app.Get("/", func(c Ctx) error {
          return c.Send(c.Request().Header.UserAgent())
      })
      go func() {
          require.NoError(t, app.Listener(ln, ListenConfig{
              DisableStartupMessage: true,
          }))
      }()
      time.Sleep(100 * time.Millisecond) // wait for server to start
      return ln
  }

  t.Run("default", func(t *testing.T) {
      t.Parallel()
      ln := setupApp(t)
      for i := 0; i < 5; i++ {
          a := Get("http://example.com")

          a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

          code, body, errs := a.String()

          require.Equal(t, StatusOK, code)
          require.Equal(t, defaultUserAgent, body)
          require.Empty(t, errs)
      }
  })

  t.Run("custom", func(t *testing.T) {
      t.Parallel()
      ln := setupApp(t)
      for i := 0; i < 5; i++ {
          c := AcquireClient()
          c.UserAgent = "ua"

          a := c.Get("http://example.com")

          a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

          code, body, errs := a.String()

          require.Equal(t, StatusOK, code)
          require.Equal(t, "ua", body)
          require.Empty(t, errs)
          ReleaseClient(c)
      }
  })
}
Please let me know if you have any questions or concerns.

Originally posted by @sixcolors in #2864 (comment)

There are some issues with a particular pattern found in three test files, namely app_test.go, client_test.go, and ctx_test.go. However, it seems that earlydata_test.go is doing it correctly by using t.Helper(). Additionally, I noticed that the subtests in etag_test.go, session_test.go, and store_test.go are also using the subtest pattern, but they are not using shared instances as far as I can tell from a quick scan.

We should also detect the shared stuff between tests. Something may be wrong or just false positive.

Btw i agree about setupApp. Maybe we can create something like helper to reduce code duolication on tests/benchmarks

sixcolors commented 8 months ago

A similar problem was just noticed in internal/storage/memory/memory_test.go where var testStore = new is set globally in the package, and I got memory corruption issues, that randomly occurred. Since this pkg is using sync and a mutex, we should probably investigate the cause.

ReneWerner87 commented 8 months ago

we should investigate these cases and, in my opinion, not simply fix them through isolation

because normally it is a valid setup if a server is up and running and processing multiple simultaneous requests, if something is shared there, then we should fix the problem rather than the test

ReneWerner87 commented 8 months ago

https://github.com/gofiber/fiber/blob/v2/ctx_test.go#L1376-L1467 there must be a bug somewhere in the merge branch or v3, the same test works perfectly in v2 and also in parallel

ReneWerner87 commented 8 months ago

we found the problem, The fasthttp ctx is filled when the struct is created for the pool and is set to nil when it is packed back into the pool

If you take it out of the pool again it's bad because it was only filled when the pool item was created and not, as with v2, when the item was configured from the pool

Will fix it tomorrow or the day after, so not isolating it has shown us a real world bug which also occurs in a real application when ctx is not completely recreated, but is reused from the pool

ReneWerner87 commented 8 months ago

v2

https://github.com/gofiber/fiber/blob/ddc6b231f816f197a5f1d73825eaf0c7d55b08f4/ctx.go#L181 https://github.com/gofiber/fiber/blob/ddc6b231f816f197a5f1d73825eaf0c7d55b08f4/ctx.go#L193

v3

https://github.com/gofiber/fiber/blob/26346d6908778b872a69b0ee8c4509e1e8277047/ctx_interface.go#L446-L452 https://github.com/gofiber/fiber/blob/26346d6908778b872a69b0ee8c4509e1e8277047/ctx_interface.go#L455-L458 https://github.com/gofiber/fiber/blob/26346d6908778b872a69b0ee8c4509e1e8277047/ctx_interface.go#L489

The problem here is that the fasthttp context is only set when new items are created for the pool, but not when items from the pool are reused, so we would have to move this to AcquireCtx https://github.com/gofiber/fiber/blob/26346d6908778b872a69b0ee8c4509e1e8277047/app.go#L497-L501

We should also examine this process again https://github.com/gofiber/fiber/blob/26346d6908778b872a69b0ee8c4509e1e8277047/ctx_interface.go#L461-L484

ReneWerner87 commented 8 months ago

We also have to use the code of the Reset method in the AcquireCtx

sixcolors commented 8 months ago

@efectn

We should also detect the shared stuff between tests. Something may be wrong or just false positive.

@ReneWerner87

we should investigate these cases and, in my opinion, not simply fix them through isolation

because normally it is a valid setup if a server is up and running and processing multiple simultaneous requests, if something is shared there, then we should fix the problem rather than the test

I appreciate the thoughtful discussion. My intention in raising the issue was to address what appeared to be false positives [failures] in the unit tests. I understand the importance of considering the validity of shared resources and agree that fixing the underlying problem was crucial. I didn't mean to suggest avoiding further investigation into the root cause.

What occurred to me is that the failures might be signaling a need for more granular unit testing with greater isolation, but I acknowledge that this approach could be impractical. Achieving effective coverage using more isolated unit tests could lead to requiring an extensive set of additional tests, covering such interaction scenarios with concurrency and other complexities. The current tests, though indirectly, did help identify an issue.

Thanks for your understanding. Open to further insights or discussion.

nickajacks1 commented 8 months ago

I think there are two important points to take home.

Testing concurrency is important. For example, recent changes by @gaby to testing & benchmarking revealed several concurrency issues in the templates project.
Having independent tests is important to prevent both false positives and false negatives. Here is a false positive example, #2734 . A hypothetical false negative might be that you test that foo() creates a file called bar.txt, but that file was actually created in a separate test.

It seems to me that it might be good to at the very least comb over the existing tests to make sure that we are addressing both points. Given how many tests there are, that would probably be a lot of work.

gaby commented 8 months ago

@nickajacks1 I'm planning on starting to take a look at the benchmarks for the core and the middlewares.

Several middlewares are missing Benchmarks. We also need to add RunParallel() benchmarks. None of the benchmarks are asserting inside the loop which is what exposed the issue with the templates.

Will start tomorrow with 1-2 of them to see if anything raises, we can go from there.

leonklingele commented 8 months ago

I have made almost all tests work with t.Parallel() in https://github.com/gofiber/fiber/pull/2469 which got closed unfortunately. Maybe some new work can partly be based upon that PR.

ReneWerner87 commented 8 months ago

bug fixed with https://github.com/gofiber/fiber/commit/5a0167a460ef2385ecade84683bd0007a28b97c1

gaby commented 7 months ago

This is mostly fix by #2892 I'm going to double check today how many tests are not parallel.

ReneWerner87 commented 7 months ago

ok then no need for keeping this open

gofiber / fiber

🐛 [Bug]: Isolation Issue with Parallel Subtests #2871