Joystream / substrate-node-joystream

Joystream Full Node
https://www.joystream.org
GNU General Public License v3.0

Low Peer count #68

Closed mnaamani closed 4 years ago

mnaamani commented 5 years ago

Over the last few days we have noticed some nodes logging very frequent messages like:

Banning PeerId("QmS8Gi18TmDwLv7Yr5TYmH9vQc2ydRZ8nx7a79pemfguUF") because "Peer is on different chain (our genesis: 0x78be…87c7 theirs: 0xcd89…97b2)"

and

Received extra substream after having already one open in backwards-compatibility mode with PeerId

This happens because the DHT is shared, so our nodes discover peers that are running other networks. See https://github.com/libp2p/rust-libp2p/issues/1087
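To gauge how often a node hits this, one option is to count the ban lines in its log. This is only a minimal sketch: it assumes the node's output has been saved to a file, and the function name count_chain_bans and the file name node.log are hypothetical.

```shell
# Count log lines where a peer was banned for being on a different chain.
# Assumes the log text contains the same message as the one quoted above.
count_chain_bans() {
  # grep -c prints the number of matching lines. It exits non-zero when the
  # count is 0, so `|| true` keeps the function usable under `set -e`.
  grep -c 'Peer is on different chain' "$1" || true
}
```

Usage would be something like `count_chain_bans node.log`. A count that keeps growing quickly suggests that many discovered peers belong to other chains.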

We are aware of a set of related issues. Some already have fixes, and others still need to be implemented.

We need a good workaround to reduce these incidents, because nodes are getting poor connectivity (low peer count) and sometimes lose all connectivity with other joystream nodes. This is resulting in network splits.

The current workaround is to increase --in-peers and --out-peers so that enough peer slots are available.
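As a rough illustration of that workaround, the node could be started with larger slot limits. The value 100 is an arbitrary assumption for the example, not a tested or recommended setting, and the binary path assumes a local release build.

```shell
# Raise the inbound and outbound peer slot limits.
# The numbers here are illustrative assumptions only.
./target/release/joystream-node --in-peers 100 --out-peers 100
```

Raising the limits does not stop peers from other chains being discovered and banned. It only leaves more room for joystream peers to occupy slots.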

We should be able to import this fix: https://github.com/paritytech/substrate/commit/8bc576f5c10e189bc4bf77335b40f60d46b94949

Optionally, we could try disabling the DHT by default and enabling it with a flag.

bwhm commented 5 years ago

Major improvements after paritytech/substrate#2589

mnaamani commented 5 years ago

I'm experimenting with a version of the full node with imported networking fixes from upstream.

You can try it by building the node from my forks of the node and runtime:

git clone -b networking-fixes https://github.com/mnaamani/substrate-node-joystream.git
cd substrate-node-joystream/
git clone -b networking-fixes https://github.com/mnaamani/substrate-runtime-joystream.git
./build-runtime.sh
cargo build --release
./target/release/joystream-node

I was able to run the joystream-node after purge-chain and it synced successfully.

I still see logs like:

Received extra substream after having already one open in backwards-compatibility mode with PeerId

and

Banning PeerId("QmXd7MQAuXkQK1r3ejSbaXKgjXmT2FvbJ3yNfLZpsQ2t8S") because "Peer is on different chain

So I can't be sure if it helps significantly with this issue.

Maybe someone can try it out and see if they have improved connectivity.

bwhm commented 5 years ago

Cool. I'll create a bounty!

mnaamani commented 4 years ago

Resolved in the latest release, as we are now using a newer version of libp2p and we set a unique protocol id.