determine why ThreadNet tests did not catch #2332 sooner

nfrisby commented 4 years ago

See Issue input-output-hk/ouroboros-consensus#640 for context.

nfrisby commented 4 years ago

Summary of probable reasons: we don't run it much. More than 95% (conservatively) of the time, the generators produce a test case in which the forks are "shallow". Even then, the tests wouldn't necessarily fail for "deep" forks, because that's just a Praos phenomenon, and testing Praos is still a WIP.

My initial thoughts as of creating this Issue:

- Has FakeVRF been in use the whole time, or is that relatively recent?
- RealTPraos only runs 20 tests per invocation.
- d is recip <$> choose (1, 10), so only 10% of the tests have less than ~10% round-robin.
- k is elements [5, 10]; even when d's intermittent round-robin didn't curtail them, the forks were still rarely deep enough.
- Praos tests do not fail if the observed leader schedule renders consensus impossible; that's currently accepted as just "bad luck". Recall that I've been working on "testing Praos" in general for a while, as time permitted (e.g. the HFC et al. took priority). I just haven't gotten there yet, sadly. This is an example of something that needs to improve.
Though I'm having trouble even finding a case where this "accept as bad luck" scenario plays out for RealTPraos -- it's rare with these generators.
- While investigating Issue input-output-hk/ouroboros-consensus#640, I have found a couple more failures on master. Both fail because Test.ThreadNet.Util.Expectations assumes a k-deep fork is recoverable, but such a fork might not be recoverable because the (k+1)st header in ChainSync (sometimes?) requires a newer ledger state than the one we have from the intersection! I haven't updated that module since this became true of ChainSync (or was it always true and we just hadn't noticed this edge case until recently?).
I ran ~3500 RealTPraos tests on master before finding those. That corresponds to ~175 executions of the RealTPraos test, which runs 20 tests at a time. How many times has CI run this (with the FakeVRF)?
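The back-of-the-envelope numbers in the list above can be checked with a small sketch. It is not part of the ThreadNet suite; all names are hypothetical. It assumes that `choose (1, 10)` draws an integer uniformly (the issue does not state the type), that tests are independent, and that the per-test failure rate is roughly 2 in 3500 (reading "a couple more failures" as two).

```python
from fractions import Fraction

# ASSUMPTION: d = recip <$> choose (1, 10) with an integer draw, so d is one of
# 1, 1/2, ..., 1/10, each with probability 1/10 (hypothetical model).
D_VALUES = [Fraction(1, n) for n in range(1, 11)]

# From the issue: "RealTPraos only runs 20 tests per invocation".
TESTS_PER_INVOCATION = 20


def fraction_with_low_d(threshold: Fraction) -> Fraction:
    """Share of generated d values at or below `threshold` (little round-robin)."""
    hits = sum(1 for d in D_VALUES if d <= threshold)
    return Fraction(hits, len(D_VALUES))


def invocations_needed(total_tests: int) -> int:
    """Number of 20-test invocations needed to run `total_tests` tests (ceiling)."""
    return -(-total_tests // TESTS_PER_INVOCATION)


def p_invocation_hits(p_fail_per_test: float) -> float:
    """Chance that at least one test in a single invocation fails,
    ASSUMING independent tests with a fixed per-test failure probability."""
    return 1 - (1 - p_fail_per_test) ** TESTS_PER_INVOCATION


print(fraction_with_low_d(Fraction(1, 10)))  # 1/10
print(invocations_needed(3500))              # 175
print(p_invocation_hits(2 / 3500))           # roughly 0.011
```

Under these assumptions, a single 20-test CI invocation has only about a 1% chance of hitting such a failure, which fits the summary that we don't run it much.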

I still don't know if the input-output-hk/cardano-ledger-specs#1579 PR was relevant.

nc6 commented 4 years ago

FakeVRF is not recent, it's been used since before this testing work started.

IntersectMBO / ouroboros-consensus

determine why ThreadNet tests did not catch #2332 sooner #641