IntersectMBO / ouroboros-consensus

Implementation of a Consensus Layer for the Ouroboros family of protocols
https://ouroboros-consensus.cardano.intersectmbo.org
Apache License 2.0
36 stars 23 forks source link

Strange space leak in PBFT.ChainState #741

Open mrBliss opened 4 years ago

mrBliss commented 4 years ago

It has been reported tthat https://github.com/input-output-hk/ouroboros-network/commit/f2a050ba9ada3bf3ee2421f5e610947619d28337 introduced a space leak. If we remove the following line, the space leak is gone: https://github.com/input-output-hk/ouroboros-network/blob/f2a050ba9ada3bf3ee2421f5e610947619d28337/ouroboros-consensus/src/Ouroboros/Consensus/Protocol/PBFT/ChainState.hs#L356 Note that the space leak has nothing to do with EBBs themselves! Even with an empty ebbs, we still have the space leak.

Looking at the implementation of pruneEBBsLT: https://github.com/input-output-hk/ouroboros-network/blob/f2a050ba9ada3bf3ee2421f5e610947619d28337/ouroboros-consensus/src/Ouroboros/Consensus/Protocol/PBFT/ChainState.hs#L624 No matter how I try to force anchorSlot cs (bangs, seq, pass it in as an argument, ...), it causes a thunk in the StrictSeq fields. Even if I pattern match on it, ignore it (!) and do Map.map id instead of filtering. Strangely, cs { ebbs = ebbs } doesn't have the leak.

If we add {-# OPTIONS_GHC -O0 #-} to the module, the leak goes away. The leak is there with -O1 and -O2 (-fno-full-laziness doesn't help). This makes me think the leak is caused by a bug in GHC :scream:.

You can use the following snippet to quickly detect the leak:

main :: IO ()
main = do
    varCS <- Strict.newTVarWithInvariantM unsafeNoThunks CS.empty
    let go slot@(SlotNo s) = do
          when (s `mod` 1000 == 0) $ print s
          atomically $ modifyTVar varCS $
            CS.append securityParam windowSize (signerForSlot slot)
          go (succ slot)
    go 0
  where
    securityParam = SecurityParam 2
    windowSize = CS.WindowSize 2
    signerForSlot :: SlotNo -> CS.PBftSigner PBftMockCrypto
    signerForSlot slot@(SlotNo s) = CS.PBftSigner
      { pbftSignerSlotNo     = slot
      , pbftSignerGenesisKey = Crypto.VerKeyMockDSIGN (fromIntegral s `mod` 7)
      }

The thunk detection (unsafeNoThunks) will catch the thunk. Profiling the heap confirms that thunk detection is correct.

mrBliss commented 4 years ago

More info from a past investigation: I have narrowed it down to GHC's -fstrictness flag. The leak went away with -O1 -fno-strictness.

Nosing through recent GHC tickets, I found https://gitlab.haskell.org/ghc/ghc/issues/16197, which might perhaps be related. One of the comments on that ticket says:

I can reproduce this with 8.6, but not with HEAD, so I think perhaps it is fixed. Hooray. Indeed we have made some recent changes to the handling of strictness on data constructors: [..]

So let's hope that those changes indeed fixed the bug :crossed_fingers:. We'll only now for sure when upgrading to a newer GHC version.