celestiaorg / celestia-core

A fork of CometBFT
Apache License 2.0

panic: failed to listen on 127.0.0.1:47768: listen tcp 127.0.0.1:47768: bind: address already in use #822

Status: Open. rootulp opened this issue 2 years ago.

rootulp commented 2 years ago

I'm observing a test flake in this CI run: https://github.com/celestiaorg/celestia-core/runs/7750400729?check_suite_focus=true

panic: failed to listen on 127.0.0.1:47768: listen tcp 127.0.0.1:47768: bind: address already in use

goroutine 1 [running]:
github.com/tendermint/tendermint/rpc/jsonrpc.setup()
    /home/runner/work/celestia-core/celestia-core/rpc/jsonrpc/jsonrpc_test.go:130 +0xd9c
github.com/tendermint/tendermint/rpc/jsonrpc.TestMain(0x0)
    /home/runner/work/celestia-core/celestia-core/rpc/jsonrpc/jsonrpc_test.go:90 +0x2a
main.main()
    _testmain.go:103 +0x365
FAIL    github.com/tendermint/tendermint/rpc/jsonrpc    0.018s

evan-forbes commented 2 years ago

This is a super common bug that can even occur locally, but it is mainly due to the resource restrictions of CI when multiple tests are run at the same time.

I'm not really sure we will fix this here; perhaps we should move this upstream?

rootulp commented 2 years ago

Unfortunately the logs for the occurrence in the issue description have already expired.

  1. I don't see a similar issue already in tendermint/tendermint.
  2. I don't see any changes in celestia-core's jsonrpc_test.go that would make this error occur more often than in tendermint's jsonrpc_test.go. I wonder, though, whether it is related to how tests are split into groups and run in parallel. We may be able to force all tests that invoke this line to run serially in the same test group.
evan-forbes commented 2 years ago

> I don't see a similar issue already in tendermint/tendermint

This error doesn't just occur in the RPC tests, though: it, or very similar errors, show up all the time, both here and upstream. That's why I suspect it is related to how we run CI. Here are a few examples:

https://github.com/tendermint/tendermint/actions/runs/3234541218/jobs/5297773051#step:6:124

https://github.com/tendermint/tendermint/actions/runs/3438160242/jobs/5733862660#step:5:121

https://github.com/tendermint/tendermint/actions/runs/3311124135/jobs/5466230834#step:6:149

mindstyle85 commented 2 years ago

This might be unrelated, but it is a very common error for node runners, and it usually means another instance is already running on that port on localhost.