filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

[DX Streamline]: Flaky Tests in GitHub Actions #12001

Open rjan90 opened 3 months ago

rjan90 commented 3 months ago

Description

This tracking issue is to monitor the investigation and resolution of flaky/failing tests observed only in GitHub Actions. These tests have shown at most 2 failures in 54 runs. (Ref: #11786)

List of Flaky Tests

### Tasks
- [x] `itest-eth_filter` (https://github.com/filecoin-project/lotus/pull/12203)
- [x] `itest-eth_legacy_transaction_test` (https://github.com/filecoin-project/lotus/pull/12200)
- [x] `itest-path_type_filters` (https://github.com/filecoin-project/lotus/pull/12099)
- [x] `itest-ni-porep` (https://github.com/filecoin-project/lotus/actions/runs/9887807887/job/27310358031?pr=12207)
- [x] `api_test` (https://github.com/filecoin-project/lotus/pull/12244)
- [x] `itest-deals_pricing` (https://github.com/filecoin-project/lotus/pull/12099)
- [x] `wdpost_dispute_test` (https://github.com/filecoin-project/lotus/pull/12243)
- [x] `api_test` (https://github.com/filecoin-project/lotus/pull/12238)
- [x] `eth_filter_test` (https://github.com/filecoin-project/lotus/pull/12203)
- [x] `node_unmanaged` (https://github.com/filecoin-project/lotus/pull/12220)
- [ ] `unit-cli`
- [ ] `unit-rest` (https://github.com/filecoin-project/lotus/actions/runs/9950234508/job/27487773104?pr=12229#logs)
- [ ] `TestGetBlockByNumber` (https://github.com/filecoin-project/lotus/actions/runs/10041921475/job/27751143865)
- [ ] `TestEthBlockNumberAliases` (https://github.com/filecoin-project/lotus/actions/runs/10055228046/job/27791473471)
- [ ] `TestTraceFilter` (https://github.com/filecoin-project/lotus/actions/runs/10172996595/job/28136413489)

jennijuju commented 3 months ago

@aarshkshah1992 with your remove market PR, will itest-deals_pricing be gone as well?

rjan90 commented 3 months ago

Some additional notes on a couple of these tests:

rjan90 commented 2 months ago

With the removal of markets in Lotus/Lotus-Miner, these tests have been removed:

Therefore I'm setting these as completed. Ref: https://github.com/filecoin-project/lotus/pull/12099

aarshkshah1992 commented 1 month ago

Fixed a couple of flaky tests as part of:

- https://github.com/filecoin-project/lotus/pull/12203 [eth_filters_itest]
- https://github.com/filecoin-project/lotus/pull/12200 [eth_legacy_transaction_itest]

So marking them as done.

aarshkshah1992 commented 1 month ago

https://github.com/filecoin-project/lotus/actions/runs/9887807887/job/27310358031?pr=12207

NI-PoRep itest

rvagg commented 1 month ago

This one's new, unit-rest: https://github.com/filecoin-project/lotus/actions/runs/9950234508/job/27487773104?pr=12229#logs

Not sure I want to register this as a high priority flaky because it's the first time I've seen it and I can't even see in the output what the failure is because so many tests are mixed up.

aarshkshah1992 commented 1 month ago

Discovered a bunch of flakies at https://github.com/filecoin-project/lotus/actions/runs/10041921475/job/27751143865

AND

https://github.com/filecoin-project/lotus/actions/runs/10055228046/job/27791473471

ribasushi commented 1 month ago

Another flake: https://github.com/filecoin-project/lotus/actions/runs/10062063823/job/27813660314?pr=12283

rvagg commented 1 month ago

Added TestTraceFilter which is a new test. @snissn can you quickly have a look at https://github.com/filecoin-project/lotus/actions/runs/10172996595/job/28136413489#step:10:4128 and see if you can suggest why it might be failing? It's getting 4 traces instead of 3 at https://github.com/filecoin-project/lotus/blob/f6978f01725fc8f8ef72cdf83d15aa57b8e076db/itests/eth_transactions_test.go#L705, which is weird. I'd guess it's a race if it was 2 instead of 3 but one more? What could that be finding?

rvagg commented 4 weeks ago

I've seen more instances of the above failure now.

Plus another failure in the same itest: https://github.com/filecoin-project/lotus/actions/runs/10298551550/job/28504189068?pr=12327

            Error Trace:    /home/runner/work/lotus/lotus/itests/eth_transactions_test.go:701
            Error:          Received unexpected error:
                            cannot get trace for block 14: failed to get tipset: requested a future epoch (beyond 'latest')
            Test:           TestTraceFilter

@snissn we're going to need your help on these I think.

rvagg commented 4 weeks ago

manual-onboarding flaky, TestManualSectorOnboarding/WithRealProofs: https://github.com/filecoin-project/lotus/actions/runs/10300409613/job/28509755121?pr=12327. It looks like a disagreement between the blockminer and the manual miner about when PoSt is supposed to be submitted: the blockminer pauses mining to wait for a message, but the manual miner doesn't seem to think it needs one. Seems like an unaccounted-for edge case?

ribasushi commented 4 days ago

@aarshkshah1992 itests/eth_transactions_test.go still flakes: https://github.com/filecoin-project/lotus/actions/runs/10665372320/job/29558625838