Not sure if STM implementation is buggy wrt masking state

barrucadu / dejafu

Systematic concurrency testing meets Haskell.

https://dejafu.docs.barrucadu.co.uk/

191 stars 18 forks source link

Not sure if STM implementation is buggy wrt masking state #369

Closed robinp closed 1 year ago

robinp commented 1 year ago

Hello - not sure if it is me or dejafu/concurrency, but something seems off: I wanted to test if, according to dejafu, an STM action executed in atomically is interruptible or not (the GHC Exception docs only say one that has retry in it is interruptible, but not if one without is or is not).

I added this test case:

import Control.Concurrent.Classy as CC
import Test.DejaFu     

[...]

test_atomically_retry :: Assertion
test_atomically_retry = do
  res <- autocheck $ do
    var <- CC.newTVarConc (0 :: Int)
    -- tid <- CC.mask_ . CC.fork $ do
    tid <- CC.uninterruptibleMask_ . CC.fork $ do
    -- tid <- CC.uninterruptibleMask_ . CC.fork $ CC.uninterruptibleMask_ $ do
      CC.atomically $ CC.writeTVar var 1
      CC.atomically $ CC.writeTVar var 2
    _ <- CC.fork $ CC.killThread tid
    CC.readTVarConc var
  assertBool "autochecked" res

Now, this gives the same output with the current and the two other uncommented lines:

[pass] Successful
[fail] Deterministic
    0 S0-------

    2 S0-----P1---S0--

    1 S0-----P1-P0--

Which is odd, since at least under uninterruptibleMask_, the 1 case should never happen as I understand, as that thread shouldn't receive any async exceptions.

(Also not very sure why I'm using newTVarConc / readTVarConc but writeTVar, but since the types check out, probably ok?)

Do you have any insight what am I doing / expecting wrong here? Thank you. (GHC 8.10.7 / concurrency 1.11.0.2 / dejafu 2.4.0.3)

robinp commented 1 year ago

Also, was eyeballing the tests, https://github.com/barrucadu/dejafu/blob/master/dejafu-tests/lib/Integration/MultiThreaded.hs#L211 seems off.

tid <- fork $ mask $ \_ -> putMVar x () >> putMVar y () should easily be killed, just at the point when the forked thread is started, but the mask is not established yet. That's why the normal pattern for setting up exception handler in thread is something like

mask $ \restore -> fork[IO] (restore someAction `finally` whatever)

So maybe there are bigger problems around interruptible points.

robinp commented 1 year ago

(Unrelated sidenote: the empirical answer based on running

-- flip commented parts to test that nontrivially `retry`-containing STM is interruptible
test_ghc_atomically_retry :: Assertion
test_ghc_atomically_retry = do
    var <- CNC.newTVarIO ({- 500 -} 0 :: Int)
    {-
    _ <- CNC.forkIO $ do
        CNC.threadDelay 25
        CNC.atomically $ CNC.writeTVar var 0
        -}
    tid <- CNC.mask $ \_restore ->
              CNC.uninterruptibleMask $ \_restoreMaskInterruptible ->
                CNC.forkIO $ do  -- some delay under untinterruptibleMask, so a kill exception has chance to queue up
        CNC.threadDelay 20
        _restoreMaskInterruptible (  -- if there's an interruptible point within, we would get an exception, otherwise not
          CNC.atomically $ do
            _x <- CNC.readTVar var
            -- unless (x == 0) CNC.retry
            CNC.writeTVar var (fromIntegral (fib 20))  -- could actually be removed
            CNC.writeTVar var (fromIntegral (fib 21))
          )
    v <- CNC.newEmptyMVar
    _ <- CNC.forkIO $ do
        CNC.threadDelay 10
        -- This will surely block until the exception was delivered to
        -- tid, which means it was at an interruptible point (or exited)
        CNC.uninterruptibleMask_ $ CNC.killThread tid
        CNC.putMVar v ()
    CNC.takeMVar v
    res <- CNC.readTVarIO var
    res @?= 0  -- if this is ever true, STM was interruptible

For quite some time, along with stress -c <various numbers>, this kept failing. So it seems atomically without retry is indeed non-interruptible, though this case is pretty simplistic, and some more contentious STM could behave differently? Well, that's what we have for today. )

barrucadu commented 1 year ago

tid <- fork $ mask $ \_ -> putMVar x () >> putMVar y () should easily be killed, just at the point when the forked thread is started, but the mask is not established yet.

Yep, that can be killed before the mask happens - which is why the main thread does readMVar x before calling killThread - the test here is checking that, once the mask has been entered, the thread cannot be killed if it doesn't block (which is why the putMVar y () always happens):

      x <- newEmptyMVar
      y <- newEmptyMVar
      tid <- fork $ mask $ \_ -> putMVar x () >> putMVar y ()
      readMVar x
      killThread tid
      readMVar y

barrucadu commented 1 year ago

The exceptions aren't actually relevant to your example, if you remove the killThread you get the same behaviour:

autocheck $ do
  var <- CC.newTVarConc (0 :: Int)
  tid <- CC.uninterruptibleMask_ . CC.fork $ do
    CC.atomically $ CC.writeTVar var 1
    CC.atomically $ CC.writeTVar var 2
  CC.readTVarConc var

{-
[pass] Successful
[fail] Deterministic
    0 S0------

    2 S0----P1---S0--

    1 S0----P1-P0--
-}

The uninterruptibleMask_ just affects asynchronous exceptions, it doesn't stop the thread from racing with another thread, and that's exactly what we have here: thread 1 is performing two writes to the TVar in totally separate STM transactions, and thread 0 is reading the TVar.

In the 0 case, thread 1 doesn't start at all: thread 0 (the main thread) runs to completion.
In the 1 case, thread 1 interrupts thread 0 (the P1 in the trace), but then gets interrupted by thread 0 (the P0 in the trace) between writes.
In the 2 case, thread 1 interrupts thread 0, then runs to completion before thread 0 takes over again (the S0 in the trace).

This is one single atomic transaction:

atomically $ do
  writeTVar var 1
  writeTVar var 2

This is two separate transactions:

atomically $ writeTVar 1
atomically $ writeTVar 2

Does that help?

robinp commented 1 year ago

Ah, you are right, thank you. Indeed, this test exercises what I originally wanted:

test_atomically_without_retry :: Assertion
test_atomically_without_retry = do
    res <- autocheck $ do
        var <- CC.newTVarConc (0 :: Int)
        tid <- CC.mask_ . CC.fork $ CC.atomically (CC.writeTVar var 1)
        -- Throw can only continue at interruptible point or past thread exit
        CC.throwTo tid (ErrorCall "foo")
        -- If atomically is interruptible, result can be either 0 or 1.
        -- If non-interruptible, only 1 (deterministic).
        CC.readTVarConc var
    assertBool "autochecked" res

Thanks for your help!

(edit: actually using gives [1] is better, since it excludes a deterministic 0 answer)