cosmos / cosmos-sdk

A Framework for Building High Value Public Blockchains
https://cosmos.network/
Apache License 2.0

[Bug]: flaky test `TestLaunchProcess` #21086

Closed levisyin closed 5 days ago

levisyin commented 1 month ago

What happened?

=== RUN   TestProcessTestSuite
=== RUN   TestProcessTestSuite/TestLaunchProcess
    /home/levisyin/github.com/cosmos/cosmos-sdk/tools/cosmovisor/buffer.go:261: process.go:52: <nil> INF running app args=["foo","bar","1234","/tmp/TestProcessTestSuiteTestLaunchProcess3792210110/001/data/upgrade-info.json"] module=cosmosvisor path=/tmp/TestProcessTestSuiteTestLaunchProcess3792210110/001/cosmovisor/genesis/bin/dummyd
panic: failed to parse upgrade info file: empty upgrade-info.json in "/tmp/TestProcessTestSuiteTestLaunchProcess3792210110/001/data/upgrade-info.json"

goroutine 62 [running]:
cosmossdk.io/tools/cosmovisor.(*fileWatcher).CheckUpdate(0xc0002574a0, {{0x0, 0x0}, {0x0, 0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
    /home/levisyin/github.com/cosmos/cosmos-sdk/tools/cosmovisor/scanner.go:120 +0x39d
cosmossdk.io/tools/cosmovisor.(*fileWatcher).MonitorUpdate.func1()
    /home/levisyin/github.com/cosmos/cosmos-sdk/tools/cosmovisor/scanner.go:86 +0xc8
created by cosmossdk.io/tools/cosmovisor.(*fileWatcher).MonitorUpdate in goroutine 40
    /home/levisyin/github.com/cosmos/cosmos-sdk/tools/cosmovisor/scanner.go:82 +0x131
exit status 2
FAIL    cosmossdk.io/tools/cosmovisor   1.023s

Cosmos SDK Version

main

How to reproduce?

Run go test -timeout 30s -run ^TestProcessTestSuite$ -testify.m ^(TestLaunchProcess)$ cosmossdk.io/tools/cosmovisor -v -count=1

levisyin commented 1 month ago

I think the cause is in the CheckUpdate function.

The failure occurs when the upgrade-info.json file has been created but its content has not been written yet.

On the first check, fw.lastModTime is the zero value time.Time{}. Once the file exists, os.Stat(file).ModTime() returns its creation time, so the expression stat.ModTime().After(fw.lastModTime) is always true.

As a result, CheckUpdate goes on to parse a file that exists but is still empty, the content-length check inside parseUpgradeInfoFile fails, and the returned error triggers the panic.

func (fw *fileWatcher) CheckUpdate(currentUpgrade upgradetypes.Plan) bool {
    if fw.needsUpdate {
        return true
    }

    stat, err := os.Stat(fw.filename)
    if err != nil {
        // file doesn't exist
        return false
    }

    if !stat.ModTime().After(fw.lastModTime) {
        return false
    }

    info, err := parseUpgradeInfoFile(fw.filename, fw.disableRecase)
    if err != nil {
        panic(fmt.Errorf("failed to parse upgrade info file: %w", err))
    }
       ...
}