barrucadu / dejafu

Systematic concurrency testing meets Haskell.
https://dejafu.docs.barrucadu.co.uk/
195 stars 18 forks source link

Incompleteness issue with a combination of daemon threads and sleep sets. #40

Closed barrucadu closed 8 years ago

barrucadu commented 8 years ago

This only has one execution:

{-# LANGUAGE FlexibleContexts #-}

module Terminal where

import Control.Monad.Monitor
import Control.Monad.Monitor.Property
import Control.Concurrent.Classy
import Control.Monad

data Event = Waiting Int | Trying Int deriving (Eq, Ord, Show)

prop :: Int -> Property Event
prop i = Exists (globally waiting) `SAnd`
         Exists (finally trying)   `SAnd`
         All (globally waiting `POr` finally trying)
  where
    waiting = Event (Waiting i)
    trying  = Event (Trying i)

server :: (MonadConc m, MonadMonitor Event m) => MVar m Char -> Lock m -> Int -> m ()
server charVar lock i = do
  addProperty (show i) (prop i)
  forever $ do
    -- Wait for input
    char <- withEvent (Waiting i) (takeMVar charVar)

    -- Wait for the lock
    withEvent (Trying i) (takeLock lock)

    -- Print the output
    -- liftIO (putChar char)

    -- Release the lock
    releaseLock lock

testCase :: (MonadConc m, MonadMonitor Event m) => m ()
testCase = do
  charVar <- newEmptyMVar
  lock    <- newLock

  -- Start some servers
  mapM_ (fork . server charVar lock) [0]

  -- Send some input
  mapM_ (putMVar charVar) "a"

type Lock m = MVar m ()

newLock :: MonadConc m => m (Lock m)
newLock = newMVar ()

takeLock :: MonadConc m => Lock m -> m ()
takeLock = takeMVar

releaseLock :: MonadConc m => Lock m -> m ()
releaseLock lock = putMVar lock ()

Surely there are more: the main thread stopping should be preemptable by the server thread. That doesn't exceed any bounds, so it should show up. Need to add tests to dpor to see if the breakage is there, or somewhere in dejafu.

barrucadu commented 8 years ago

I've just realised why none of the existing tests catch this: because the issue is a thread running or not before the main one terminates, without any immediate direct interaction (if there was the dependency function would see the Lookahead and trigger scheduling), so the only way to observe this thread running or not is:

Whereas all the current tests do something and then look at the returned values.

The killsEarly thing is a bit of a hack, there should really be a more principled way to solve this problem. Something as simple as scheduling all runnable threads before the initial thread's Stop might work, as then backtracking will identify points where they should be run earlier.

barrucadu commented 8 years ago

Something as simple as scheduling all runnable threads before the initial thread's Stop might work, as then backtracking will identify points where they should be run earlier.

No. The current complexity is due to computations being aborted. The Dining Philosophers test needs killsEarly in its current form to discover the deadlock.

barrucadu commented 8 years ago

Disabling schedIgnore gives an interesting result:

[pass] Never Deadlocks (checked: 9)
[pass] No Exceptions (checked: 9)
[fail] Consistent Result (checked: 8)
        () S0-----------
          0: main

        [abort] S0--------P1------------
          0: main
False

This suggests that the implementation of sleep sets isn't quite right.

barrucadu commented 8 years ago

{(0, WillLift)} is the sleep set at that point. Two things should probably be done:

I think this would solve it.

barrucadu commented 8 years ago

Thus, independent transitions can neither disable nor en- able each other, and enabled independent transitions com- mute. This definition characterizes the properties of possible “valid” dependency relations for the transitions of a given system.

In the presence of daemon threads, the dependency function needs to take termination of thread 0 into account! So that's how to solve that issue (all in dejafu, with no changes to dpor).

Unfortunately, this doesn't quite solve the issue with length bounding. If I hadn't been sent a "bug" by a user specifically needing to length bounding, I would be sorely tempted to remove it...

barrucadu commented 8 years ago

Fixed in 9bee68a, although this now finds 15 executions for the code above. The test case doesn't slow down, but that's clearly just luck. I already know I have no tests which trigger this behaviour.

Will try to think of some performance gains before merging this.

barrucadu commented 8 years ago

See second comment in #52. I think this is now done in the dejafu-0.3.2 branch, but would like to try the pusher-ws case I have first.