GaloisInc / HaLVM

The Haskell Lightweight Virtual Machine (HaLVM): GHC running on Xen
BSD 3-Clause "New" or "Revised" License

BasicIVC Rendezvous fails #27

Closed TomMD closed 10 years ago

TomMD commented 10 years ago

The rendezvous for BasicIVC fails for reasons not well understood.

A workaround is to change the waitForKey function in Rendezvous.hs to delay for a full second instead of 100ms. Instrumenting the code with the original 100ms delay showed that waitForKey was only being called once; where control went after that, I do not know. Changing to the longer delay made the rendezvous work consistently.
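The workaround amounts to changing one constant. As a minimal sketch (not the actual Rendezvous.hs code), the retry delay can be made a parameter so that the 100ms and one-second settings can be compared without editing the function. The names `pollWithDelay` and `flakyRead` are hypothetical, and the `IO (Maybe a)` action only stands in for the XenStore read.

```haskell
import Control.Concurrent (threadDelay)
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- | Hypothetical helper: retry an action until it yields a value,
-- sleeping for the given number of microseconds between attempts.
pollWithDelay :: Int -> IO (Maybe a) -> IO a
pollWithDelay delayUs attempt = do
  res <- attempt
  case res of
    Just v  -> return v
    Nothing -> threadDelay delayUs >> pollWithDelay delayUs attempt

-- The two delays discussed in this report, in microseconds.
originalDelay, workaroundDelay :: Int
originalDelay   = 100000   -- 100ms
workaroundDelay = 1000000  -- one second

-- | Simulated read that fails a fixed number of times before succeeding,
-- standing in for a key that has not been written yet (an assumption
-- made purely for illustration).
flakyRead :: IORef Int -> IO (Maybe String)
flakyRead remaining = atomicModifyIORef' remaining step
  where
    step n
      | n > 0     = (n - 1, Nothing)
      | otherwise = (0, Just "value")
```

For example, `newIORef 3 >>= pollWithDelay workaroundDelay . flakyRead` would retry three times with the longer delay before returning the simulated value.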

TomMD commented 10 years ago

OK, so that was just a poorly formed bug report. Let's be a little more exact. The waitForKey function is used to wait for values in the xen store:

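waitForKey :: XenStore -> String -> IO String
waitForKey xs key = do
  -- Try to read the key; a thrown ErrorCode is converted into a Left.
  eres <- catch (Right <$> xsRead xs key) leftError
  case eres of
    -- Key not available yet: sleep 100ms (threadDelay takes microseconds), then retry.
    Left _    -> threadDelay 100000 >> waitForKey xs key
    Right res -> return res
 where
  leftError :: ErrorCode -> IO (Either ErrorCode String)
  leftError = return . Left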

After instrumenting the Left and Right branches to log debug messages, I see that one of the domains (LEFT, in the case of BasicIVC) enters the Left branch and calls threadDelay, from which control never returns to the Haskell program. The buggy behavior is not consistent, but it does occur in the majority of runs.
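The instrumentation described here could look roughly like the sketch below. It is not the HaLVM code: `waitForKeyLogged` is a hypothetical name, `readKey` is a stand-in for the XenStore read that already returns an `Either`, and the messages are simply written to stderr. The message printed after `threadDelay` is the one that shows whether control ever comes back from the delay.

```haskell
import Control.Concurrent (threadDelay)
import System.IO (hPutStrLn, stderr)

-- | Hypothetical instrumented retry loop. 'readKey' stands in for the
-- XenStore read; it is an assumption, not the HaLVM API.
waitForKeyLogged :: Show e => IO (Either e String) -> IO String
waitForKeyLogged readKey = do
  eres <- readKey
  case eres of
    Left err -> do
      hPutStrLn stderr ("Left branch: " ++ show err ++ ", delaying")
      threadDelay 100000
      -- If this message never appears, control did not return from threadDelay.
      hPutStrLn stderr "returned from threadDelay, retrying"
      waitForKeyLogged readKey
    Right res -> do
      hPutStrLn stderr "Right branch: key found"
      return res
```

Note that extra logging changes timing, so a sketch like this may itself make the failure harder to trigger.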

acw commented 10 years ago

I'm having trouble replicating this one, either because it magically got fixed in the GHC-stable merge, or because it's just not happening often enough to trigger on my machine. Are you still seeing this problem?

TomMD commented 10 years ago

I continue to see this behavior even after the merge with GHC-stable. I will continue to investigate, but it has proven difficult. Adding print statements to the GHC RTS usually causes the behavior to change. Increasing the delay causes the behavior to change. In fact, it is hard to change the code in any way that affects timing without making the bug harder to reproduce.

TomMD commented 10 years ago

We have ironed all this out, yay!