filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

[DX Streamline]: Flaky Tests in GitHub Actions #12001

Open rjan90 opened 6 months ago

rjan90 commented 6 months ago

Description

This tracking issue is to monitor the investigation and resolution of flaky/failing tests observed only in GitHub Actions. These tests have shown at most 2 failures in 54 runs. (Ref: #11786)

List of Flaky Tests

### Tasks
- [x] `itest-eth_filter` (https://github.com/filecoin-project/lotus/pull/12203)
- [x] `itest-eth_legacy_transaction_test` (https://github.com/filecoin-project/lotus/pull/12200)
- [x] `itest-path_type_filters` (https://github.com/filecoin-project/lotus/pull/12099)
- [x] `itest-ni-porep` (https://github.com/filecoin-project/lotus/actions/runs/9887807887/job/27310358031?pr=12207)
- [x] `api_test` (https://github.com/filecoin-project/lotus/pull/12244)
- [x] `itest-deals_pricing` (https://github.com/filecoin-project/lotus/pull/12099)
- [x] `wdpost_dispute_test` (https://github.com/filecoin-project/lotus/pull/12243)
- [x] `api_test` (https://github.com/filecoin-project/lotus/pull/12238)
- [x] `eth_filter_test` (https://github.com/filecoin-project/lotus/pull/12203)
- [x] `node_unmanaged` (https://github.com/filecoin-project/lotus/pull/12220)
- [ ] `unit-cli`
- [ ] `unit-rest` (https://github.com/filecoin-project/lotus/actions/runs/9950234508/job/27487773104?pr=12229#logs)
- [ ] `TestGetBlockByNumber` (https://github.com/filecoin-project/lotus/actions/runs/10041921475/job/27751143865)
- [ ] `TestEthBlockNumberAliases` (https://github.com/filecoin-project/lotus/actions/runs/10055228046/job/27791473471)
- [ ] `TestTraceFilter` (https://github.com/filecoin-project/lotus/actions/runs/10172996595/job/28136413489)

jennijuju commented 6 months ago

@aarshkshah1992 with your remove market PR, will itest-deals_pricing be gone as well?

rjan90 commented 6 months ago

Some additional notes on a couple of these tests:

rjan90 commented 5 months ago

With the removal of markets in Lotus/Lotus-Miner, these tests have been removed:

Therefore I'm setting these as completed. Ref: https://github.com/filecoin-project/lotus/pull/12099

aarshkshah1992 commented 4 months ago

Fixed a couple of flaky tests as part of:

- https://github.com/filecoin-project/lotus/pull/12203 [eth_filters_itest]
- https://github.com/filecoin-project/lotus/pull/12200 [eth_legacy_transaction_itest]

So marking them as done.

aarshkshah1992 commented 4 months ago

https://github.com/filecoin-project/lotus/actions/runs/9887807887/job/27310358031?pr=12207

NI-PoRep itest

rvagg commented 4 months ago

This one's new, unit-rest: https://github.com/filecoin-project/lotus/actions/runs/9950234508/job/27487773104?pr=12229#logs

Not sure I want to register this as a high priority flaky because it's the first time I've seen it and I can't even see in the output what the failure is because so many tests are mixed up.

aarshkshah1992 commented 4 months ago

Discovered a bunch of flakies at https://github.com/filecoin-project/lotus/actions/runs/10041921475/job/27751143865

AND

https://github.com/filecoin-project/lotus/actions/runs/10055228046/job/27791473471

ribasushi commented 4 months ago

Another flake: https://github.com/filecoin-project/lotus/actions/runs/10062063823/job/27813660314?pr=12283

rvagg commented 4 months ago

Added TestTraceFilter which is a new test. @snissn can you quickly have a look at https://github.com/filecoin-project/lotus/actions/runs/10172996595/job/28136413489#step:10:4128 and see if you can suggest why it might be failing? It's getting 4 traces instead of 3 at https://github.com/filecoin-project/lotus/blob/f6978f01725fc8f8ef72cdf83d15aa57b8e076db/itests/eth_transactions_test.go#L705, which is weird. I'd guess it's a race if it was 2 instead of 3 but one more? What could that be finding?

rvagg commented 3 months ago

I've seen more instances of the above failure now.

Plus another failure in the same itest: https://github.com/filecoin-project/lotus/actions/runs/10298551550/job/28504189068?pr=12327

            Error Trace:    /home/runner/work/lotus/lotus/itests/eth_transactions_test.go:701
            Error:          Received unexpected error:
                            cannot get trace for block 14: failed to get tipset: requested a future epoch (beyond 'latest')
            Test:           TestTraceFilter

@snissn we're going to need your help on these I think.

rvagg commented 3 months ago

manual-onboarding flaky, TestManualSectorOnboarding/WithRealProofs: https://github.com/filecoin-project/lotus/actions/runs/10300409613/job/28509755121?pr=12327

This looks like a disagreement between the blockminer and the manual miner about when PoSt is supposed to be submitted: the blockminer pauses mining to wait for the message, but the manual miner doesn't seem to think it needs one. Seems like an unaccounted-for edge case?

ribasushi commented 3 months ago

@aarshkshah1992 itests/eth_transactions_test.go still flakes: https://github.com/filecoin-project/lotus/actions/runs/10665372320/job/29558625838

rvagg commented 2 months ago

I seem to have introduced a flaky test in gateway when looking at rate limits: https://github.com/filecoin-project/lotus/actions/runs/10820940825/job/30022021725#step:9:998

    gateway_test.go:398: expected end: 2024-09-11 23:04:13.504329557 +0000 UTC m=+20.756356531, now: 2024-09-11 23:04:09.391624721 +0000 UTC m=+16.643651706, allowPad: 800ms, actual delta: -4.112704738s
    gateway_test.go:399: 
            Error Trace:    /home/runner/work/lotus/lotus/itests/gateway_test.go:399
            Error:          Max difference between 2024-09-11 23:04:13.504329557 +0000 UTC m=+20.756356531 and 2024-09-11 23:04:09.39166259 +0000 UTC m=+16.643689571 allowed is 800ms, but difference was 4.11266696s
            Test:           TestGatewayRateLimits

That's saying that it's completing a series of requests ~4s faster than it should. The max allowed padding is 800ms, so it's way faster than even the outer bounds of the timing. The test sets up an environment where it should slow down requests in a fairly predictable way, but it's still got lots of real-world effects feeding into it that make it variable, so something's in the way.

ribasushi commented 2 months ago

another itests/eth_transactions_test.go flake similar to https://github.com/filecoin-project/lotus/issues/12001#issuecomment-2325209394

https://github.com/filecoin-project/lotus/actions/runs/11123742713/job/30907858333?pr=12535

rvagg commented 2 months ago

TestContractInvocationMultiple is flaky, here's the latest: https://github.com/filecoin-project/lotus/actions/runs/11142239872/job/30964794231

Sadly my fault again (https://github.com/filecoin-project/lotus/pull/12431), but I was testing something in a way that hasn't been properly tested before, so it makes me sus about the underlying behaviour.

masih commented 1 month ago

TestEthBlockHashesCorrect_MultiBlockTipset in itest-eth_block_hash seems to be flaky: