Consensys / teku

Open-source Ethereum consensus client written in Java
https://consensys.io/teku
Apache License 2.0
677 stars 283 forks source link

Unable to peer with Lighthouse or Prysm #1666

Closed q9f closed 4 years ago

q9f commented 4 years ago

Description

As a tester, I want to add a Teku node to a local multi-client testnet so that I can verify networking, synchronization and consensus.

Acceptance Criteria

Steps to Reproduce (Bug)

  1. Create a local v0.11 testnet with Prysm, e.g., https://gist.github.com/rauljordan/d3d200197a3ce7370f060e111079ad64
  2. (Optionally) Add a compatible local Lighthouse interop node, e.g., https://lighthouse-book.sigmaprime.io/local-testnets.html
  3. Try to add a Teku node.
~/.opt/PegaSysEng/teku master 5m 49s
❯ teku \
  --data-path=/tmp/teku/beacon \
  --eth1-deposit-contract-address="0x5ca1e00004366ac85f492887aaab12d0e6418876" \
  --eth1-endpoint=http://127.0.0.1:9779 \
  --logging="INFO" \
  --log-color-enabled=true \
  --log-include-events-enabled=true \
  --network=/home/user/.lighthouse/testnet/_teku.yaml \
  --p2p-discovery-bootnodes="enr:-LK4QF_E9JrQ7Au_SIU3_aOgQS10YbfWwFI9Olvr5-PprGrRF4xrZNceZ4KJL27MGQ-4Ee0mfcdwI77Gk1lCdYZSm7wBh2F0dG5ldHOIAAAAAAAAAACEZXRoMpCbXd4fAAAAAP__________gmlkgnY0gmlwhMCoAHmJc2VjcDI1NmsxoQPEabfYE0zNmxESRiItafrTaUumKZnSFuwEQihlbXkpdoN0Y3CCMsiDdWRwgi7g,enr:-LK4QPKMAaP6TkSi_i7y6fb9wexjrssDp3CFa8vmy6-d1ZgTLwdpjxmlj-REopshEZzfVPmY8gewFnurcBRNxH664CMBh2F0dG5ldHOIAAAAAAAAAACEZXRoMpCbXd4fAAAAAP__________gmlkgnY0gmlwhMCoAHmJc2VjcDI1NmsxoQJ-racbp4xy8mJ0RjTRvQs6eQFkms6r-IYym83r8QiouIN0Y3CCN5eDdWRwgjeY,enr:-LK4QKHpgTJ-zgq7Uv_dservZv8gjMNYVTDYG7H1CU5vC2V4eukCcAiFCABHOAq2YPBgX4ldkpLOPiOg7DPCVWcGPVABh2F0dG5ldHOIAAAAAAAAAACEZXRoMpCbXd4fAAAAAP__________gmlkgnY0gmlwhH8AAAGJc2VjcDI1NmsxoQPmpjBfQHem9tSoY2by_QroDhTiXkweTmV3q3nyjmE2goN0Y3CCIyiDdWRwgiMo" \
  --p2p-discovery-enabled=true \
  --p2p-enabled=true \
  --p2p-port=8500 \
  --Xinterop-genesis-time=0 \
  --Xinterop-start-state=/home/user/.lighthouse/testnet/genesis.ssz \
  --Xinterop-enabled=true

Expected behavior: Teku should peer and sync the chain.

Actual behavior: Teku recognizes the genesis state and can correctly compute the current slot/epoch but fails to peer with Lighthouse and Prysm.

Teku:

2020-04-24 10:58:15,213 main INFO Configuring logging for destination: console and file
2020-04-24 10:58:15,214 main INFO Logging file location: /home/user/.local/share/teku/logs/teku.log
2020-04-24 10:58:15,214 main INFO Logging includes events: true
2020-04-24 10:58:15,214 main INFO Logging includes color: true
10:58:15.274 INFO  - Using data path: /tmp/teku/beacon
10:58:15.292 INFO  - Initializing storage
10:58:18.432 INFO  - Storage initialization complete
10:58:18.976 INFO  - Listening for connections on: /ip4/127.0.0.1/tcp/30303/p2p/16Uiu2HAm3h8sbjCLc7Ngb4hRmjDgLyt1cKDHCgtUaf1B9GT7YQtC
10:58:23.297 INFO  - Slot Event *** Current slot: 426, Head block: 0, Justified epoch: 0, Finalized epoch: 0, Finalized root: 0xbce2..e13f
10:58:31.307 INFO  - Slot Event *** Current slot: 427, Head block: 0, Justified epoch: 0, Finalized epoch: 0, Finalized root: 0xbce2..e13f
10:58:39.314 INFO  - Slot Event *** Current slot: 428, Head block: 0, Justified epoch: 0, Finalized epoch: 0, Finalized root: 0xbce2..e13f
10:58:51.283 INFO  - Slot Event *** Current slot: 429, Head block: 0, Justified epoch: 0, Finalized epoch: 0, Finalized root: 0xbce2..e13f
10:59:03.283 INFO  - Slot Event *** Current slot: 430, Head block: 0, Justified epoch: 0, Finalized epoch: 0, Finalized root: 0xbce2..e13f
10:59:15.283 INFO  - Slot Event *** Current slot: 431, Head block: 0, Justified epoch: 0, Finalized epoch: 0, Finalized root: 0xbce2..e13f

Prysm complains about missing ENR key "attnets" Edit, this is actually a LH bug, ref https://github.com/sigp/lighthouse/pull/1050 Edit: It's not. :sob:

[2020-04-24 10:57:47]  INFO blockchain: Finished applying state transition attestations=1 attesterSlashings=0 deposits=0 proposerSlashings=0 slot=424 voluntaryExits=0
[2020-04-24 10:57:59] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:57:59] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:57:59]  INFO blockchain: Executing state transition on block root=0x963d9021... slot=425
[2020-04-24 10:57:59]  INFO blockchain: Finished applying state transition attestations=1 attesterSlashings=0 deposits=0 proposerSlashings=0 slot=425 voluntaryExits=0
[2020-04-24 10:58:11] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:11] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:11]  INFO blockchain: Executing state transition on block root=0x3462ee03... slot=426
[2020-04-24 10:58:11]  INFO blockchain: Finished applying state transition attestations=1 attesterSlashings=0 deposits=0 proposerSlashings=0 slot=426 voluntaryExits=0
[2020-04-24 10:58:23] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:23] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:23] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:23] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:23]  INFO blockchain: Executing state transition on block root=0x88878b35... slot=427
[2020-04-24 10:58:23]  INFO blockchain: Finished applying state transition attestations=1 attesterSlashings=0 deposits=0 proposerSlashings=0 slot=427 voluntaryExits=0
[2020-04-24 10:58:35] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:35] ERROR p2p: could not retrieve subnets: missing ENR key "attnets"
[2020-04-24 10:58:35]  INFO blockchain: Executing state transition on block root=0x9b21be8c... slot=428
[2020-04-24 10:58:35]  INFO blockchain: Finished applying state transition attestations=1 attesterSlashings=0 deposits=0 proposerSlashings=0 slot=428 voluntaryExits=0

Lighthouse complains about missing ENR bitfield:

Apr 24 10:58:41.002 INFO Synced                                  slot: 428, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 3, service: slot_notifier
Apr 24 10:58:49.181 WARN Peer has invalid ENR bitfield           error: "ENR bitfield non-existent", peer_id: 16Uiu2HAm3h8sbjCLc7Ngb4hRmjDgLyt1cKDHCgtUaf1B9GT7YQtC, service: network
Apr 24 10:58:49.202 WARN RPC Error                               Error: Custom("Protocol negotiation failed."), request_id: 0, Peer: PeerId("16Uiu2HAm3h8sbjCLc7Ngb4hRmjDgLyt1cKDHCgtUaf1B9GT7YQtC"), service: router, service: network
Apr 24 10:58:53.001 INFO Synced                                  slot: 429, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 4, service: slot_notifier
Apr 24 10:58:54.235 ERRO Disconnecting and banning peer          timeout: 30s, peer_id: PeerId("16Uiu2HAm3h8sbjCLc7Ngb4hRmjDgLyt1cKDHCgtUaf1B9GT7YQtC"), service: network
Apr 24 10:59:05.001 INFO Synced                                  slot: 430, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 3, service: slot_notifier
Apr 24 10:59:17.001 INFO Synced                                  slot: 431, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 3, service: slot_notifier
Apr 24 10:59:29.001 INFO Synced                                  slot: 432, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 3, service: slot_notifier
Apr 24 10:59:41.001 INFO Synced                                  slot: 433, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 3, service: slot_notifier
Apr 24 10:59:49.263 WARN Peer has invalid ENR bitfield           error: "ENR bitfield non-existent", peer_id: 16Uiu2HAm3h8sbjCLc7Ngb4hRmjDgLyt1cKDHCgtUaf1B9GT7YQtC, service: network
Apr 24 10:59:49.271 WARN RPC Error                               Error: Custom("Protocol negotiation failed."), request_id: 0, Peer: PeerId("16Uiu2HAm3h8sbjCLc7Ngb4hRmjDgLyt1cKDHCgtUaf1B9GT7YQtC"), service: router, service: network
Apr 24 10:59:53.001 INFO Synced                                  slot: 434, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 4, service: slot_notifier
Apr 24 10:59:54.281 ERRO Disconnecting and banning peer          timeout: 30s, peer_id: PeerId("16Uiu2HAm3h8sbjCLc7Ngb4hRmjDgLyt1cKDHCgtUaf1B9GT7YQtC"), service: network
Apr 24 11:00:05.001 INFO Synced                                  slot: 435, epoch: 13, finalized_epoch: 11, finalized_root: 0x9854…5e07, peers: 3, service: slot_notifier

Frequency:

1, reproducable

Versions

~/.opt/PegaSysEng/teku master 5m 42s
❯ teku --version
teku/v0.11.0-dev-21e6be03/linux-x86_64/-na-openjdk64bitservervm-java-13

~/.opt/PegaSysEng/teku master
❯ git log -n1
commit 21e6be032d7cecf4a2192b9d0133ad7b76c9c04a (HEAD -> master, origin/master, origin/HEAD)
Author: Sally MacFarlane <sally.macfarlane@consensys.net>
Date:   Fri Apr 24 10:57:31 2020 +1000

    append error message to end of help (#1655)

    * display instructions on how to get help instead of actual help

    Signed-off-by: Sally MacFarlane <sally.macfarlane@consensys.net>

~/.opt/PegaSysEng/teku master
❯ uname -a
Linux intan 5.6.5-arch3-1 #1 SMP PREEMPT Sun, 19 Apr 2020 13:14:25 +0000 x86_64 GNU/Linux

~/.opt/PegaSysEng/teku master
❯ cat /etc/*release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux

~/.opt/PegaSysEng/teku master
❯ java -version
openjdk version "13.0.2" 2020-01-14
OpenJDK Runtime Environment (build 13.0.2+8)
OpenJDK 64-Bit Server VM (build 13.0.2+8, mixed mode)
AgeManning commented 4 years ago

This doesn't seem like a lighthouse bug. The PR referenced doesn't address this. We set up the attnets key in our enr, and have done so for a while: https://github.com/sigp/lighthouse/blob/master/beacon_node/eth2-libp2p/src/discovery/enr.rs#L22

These logs look to me like Teku nodes are missing the attnets ENR key and also do not support the meta_data RPC protocol

cemozerr commented 4 years ago

@AgeManning is right. We're missing a couple of things from the networking spec. We're currently heads down working on being fully compliant with the networking spec v0.11.1. We should be there soon. I'll note it here once we get there. @q9f

q9f commented 4 years ago

Ok cool.

q9f commented 4 years ago

I still cannot connect to Schlesi after #1702

It just won't peer with neither the bootnodes nor my local nodes.

Also, for some reason, teku ignores my --logging "DEBUG" flag, so I cannot get any insights.

Config: https://github.com/goerli/schlesi/tree/master/teku

q9f commented 4 years ago

I'm able to peer with both clients now :+1: