celo-org / celo-blockchain

Official repository for the golang Celo Blockchain
https://celo.org
GNU Lesser General Public License v3.0
560 stars 198 forks source link

Validator disconnects from proxy (possible timeout?) #2183

Closed kamikazechaser closed 1 year ago

kamikazechaser commented 1 year ago

Expected Behavior

When running a Baklava validator and proxy on different machines connected through a vLAN, the connection should be stable. The validator should sign blocks normally.

Actual Behavior

The validator node connection to the proxy times out. The proxy node reports an error in sending messages to the validator node. The validator node can no longer sign blocks.

Steps to reproduce the behavior

This happened after running the node for 1 day after being elected as a validator. Restarting the docker container or restarting the server does not seem to fix the issue.

Both validator and proxy are run inside docker containers on separate servers. Validator internal IP is 192.168.0.10. Proxy internal IP is 192.168.0.11.

192.168.0.11: inverse host lookup failed: Unknown host
(UNKNOWN) [192.168.0.11] 30503 (?) open

Backtrace

Relevant parts:

Validator:

main  | INFO [09-07|05:33:28.167] Looking for peers                        peercount=0 tried=1 static=1
main  | DEBUG[09-07|05:33:28.171] Adding p2p peer                          peercount=1 id=911d1fa6f99f499f conn=staticdial addr=192.168.0.11:30503 name=celo/v1.8.0-beta.2/l...
main  | DEBUG[09-07|05:33:28.172] Peer completed Istanbul handshake        id=911d1fa6f99f499f conn=staticdial forcePeer=false
main  | DEBUG[09-07|05:33:28.172] Ethereum peer connected                  id=911d1fa6f99f499f conn=staticdial name=celo/v1.8.0-beta.2/l...
main  | INFO [09-07|05:33:28.172] Got new proxy peer                       func=RegisterProxyPeer           proxyPeer="Peer 911d1fa6f99f499f 192.168.0.11:30503"
main  | DEBUG[09-07|05:33:28.173] Connected proxy                          func=threadRun                   proxy="{internalNode: enode://3d8eee9a9cdd058a300748b62e0e89e260d2354a5d434d5360d96554b051258220ad21fc0e266218cd50ac644fea38e882bbabe7eeb2fa62719226fc3b48112b@192.168.0.11:30503, externalNode enode://3d8eee9a9cdd058a300748b62e0e89e260d2354a5d434d5360d96554b051258220ad21fc0e266218cd50ac644fea38e882bbabe7eeb2fa62719226fc3b48112b@89.58.30.90:30303, dcTimestamp: 2023-09-07 05:32:53.167492371 +0000 UTC m=+1.291972115, ID: 911d1fa6f99f499faa0fb3b8074bed70d53cfdb7354ec5583d47afed55b0a1bc}" chan=addProxyPeer
main  | INFO [09-07|05:33:28.173] Block synchronisation started 
main  | DEBUG[09-07|05:33:28.173] Synchronising with the network           peer=911d1fa6f99f499faa0fb3b8074bed70d53cfdb7354ec5583d47afed55b0a1bc eth=67 head=650ee8..c66bb4 td=19,254,485 mode=full
main  | DEBUG[09-07|05:33:28.173] Retrieving remote chain head             peer=911d1fa6
main  | DEBUG[09-07|05:33:28.173] Recalculated msgrate QoS values          proto=eth rtt=20s confidence=1.000 ttl=1m0s next=2023-09-07T05:33:48+0000
main  | DEBUG[09-07|05:33:28.173] Fetching batch of headers                id=911d1fa6f99f499f conn=staticdial count=1 fromhash=650ee8..c66bb4 skip=4562 reverse=true
main  | DEBUG[09-07|05:33:28.173] Current head is 19252573 
main  | INFO [09-07|05:33:28.173] Stopping istanbul.Engine validating 
main  | INFO [09-07|05:33:28.173] Mining aborted due to sync 
main  | INFO [09-07|05:33:28.173] remote validator to proxy assignment has changed func=reassignValidators          new assignment="map[911d1fa6f99f499faa0fb3b8074bed70d53cfdb7354ec5583d47afed55b0a1bc:[0x370Cd17f5532e8c27587B95e287D22374186d796 0xd5e454462b3Fd98b85640977D7a5C783CA162228 0x5dBB850c9A708A7E7aB9E49C7F4983f1817f8986 0xa4f1bad7996f346c3E90b90b60a1Ca8B67B51E4B 0xF38D301480728D094090Acc8Cab7e9d441B17688 0x42e617645bfE92F375B7459BdAeE4fAeb97024E6 0xEF0186B8eDA17BE7D1230eeB8389fA85e157E1fb 0x7C83896a06Fc492d3D78e42F90F54Ae504d0401E 0x3a7f7Fc5b1C88eCb722Dc88637F81ccF25347798 0xb8f90aEd575898c38C11dEda226E1294f4d9Dfcd 0x23836969F0d095AfEB652C75523F740898e0c462 0xdBE03E5B7331a15153Ac137f9AB9fbcbDde0407c 0x821A6b00D9F658e1732Ce741d91984C688e25c5c 0x0Cc59Ed03B3e763c02d54D695FFE353055f1502D 0xf410A55E2f4b49996a1Cb884c3107490aEE09330 0xDcE0096C358E90c4645D68A53e897416e786D05F 0xA1D7824065771455d31A2250274341cBB7E3038e 0x3F5084d3D4692cf19b0C98A9b22De614e49e1470 0xd32e429FE2155971825E14E2fd89785301ab6Dc3 0xb36978Ae7c3d0c4242915F563D702d7D0AeC9b1f 0xa30A539FD585D32c96f286C58041B9BEdac906aA 0x6D32d0EC965dEE4adBC7358D39f993cD9f949582 0x180E011058e8C9530A8E75aeD1c80D7D4ee6Bb96 0x516e0CFFF40B1506d10F060Af803524d5E82E577 0x585d4C78F2d39ce15a978C2E7f286747635C572c 0xe38DC99548b7A524F6fad156a1917009e96e626a 0xD82F41994D33e4103DE2040cDA0Ec1BF4B068147 0x591F5e1118e740888C19dD2bA717151590F5fCe1 0x53D198Eb1a101248F28923D536f093932Bebe19c 0xedDdb60EF5E90Fb09707246DF193a55Df3564c9d 0x3AcCE236F234aA330395D50D6155f86C51697318 0xbA24A13E6E155AD5066dD8C7186829cfa478E9Af 0x452302DA93E77c9594c62022841A5cd03B5d2c9B 0x2893dE77317Cee151c2359f1C0432A4AfC92a8B6 0x53ad8030a50F99cbaB540e398441f7E0B7BD5664 0x088B77Fa214EA460e2151b12FcA3Dc8669264270 0x16238B441a5a250184C269579FFA97Ce22aCC36f]]"
main  | INFO [09-07|05:33:28.173] Remote validator to proxy assignment has changed.  Sending val enode share messages and updating announce version func=threadRun
main  | INFO [09-07|05:33:28.173] Sending val enode share msg to proxy     func=sendValEnodeShareMsgs       proxy peer="Peer 911d1fa6f99f499f 192.168.0.11:30503" valAddresses length=37
main  | INFO [09-07|05:33:28.173] Skipping sending ValEnodesShareMsg b/c not validating func=sendValEnodesShareMsg
main  | DEBUG[09-07|05:33:28.174] Setting enode certificate                module=lockedHolder   func=Set                         version=1,694,064,808
main  | INFO [09-07|05:33:28.174] Sending forward msg                      func=SendForwardMsg              ethMsgCode=23 destAddresses="[0xdBE03E5B7331a15153Ac137f9AB9fbcbDde0407c 0x7C83896a06Fc492d3D78e42F90F54Ae504d0401E 0x370Cd17f5532e8c27587B95e287D22374186d796 0x23836969F0d095AfEB652C75523F740898e0c462 0x516e0CFFF40B1506d10F060Af803524d5E82E577 0xD82F41994D33e4103DE2040cDA0Ec1BF4B068147 0x585d4C78F2d39ce15a978C2E7f286747635C572c 0xa4f1bad7996f346c3E90b90b60a1Ca8B67B51E4B 0x0Cc59Ed03B3e763c02d54D695FFE353055f1502D 0x16238B441a5a250184C269579FFA97Ce22aCC36f 0x3F5084d3D4692cf19b0C98A9b22De614e49e1470 0x821A6b00D9F658e1732Ce741d91984C688e25c5c 0x3a7f7Fc5b1C88eCb722Dc88637F81ccF25347798 0x2893dE77317Cee151c2359f1C0432A4AfC92a8B6 0x53D198Eb1a101248F28923D536f093932Bebe19c 0x3AcCE236F234aA330395D50D6155f86C51697318 0xDcE0096C358E90c4645D68A53e897416e786D05F 0x180E011058e8C9530A8E75aeD1c80D7D4ee6Bb96 0xb8f90aEd575898c38C11dEda226E1294f4d9Dfcd 0xF38D301480728D094090Acc8Cab7e9d441B17688 0x088B77Fa214EA460e2151b12FcA3Dc8669264270 0xbA24A13E6E155AD5066dD8C7186829cfa478E9Af 0xedDdb60EF5E90Fb09707246DF193a55Df3564c9d 0xf410A55E2f4b49996a1Cb884c3107490aEE09330 0x42e617645bfE92F375B7459BdAeE4fAeb97024E6 0xd32e429FE2155971825E14E2fd89785301ab6Dc3 0xa30A539FD585D32c96f286C58041B9BEdac906aA 0xd5e454462b3Fd98b85640977D7a5C783CA162228 0xe38DC99548b7A524F6fad156a1917009e96e626a 0xEF0186B8eDA17BE7D1230eeB8389fA85e157E1fb 0x6D32d0EC965dEE4adBC7358D39f993cD9f949582 0x591F5e1118e740888C19dD2bA717151590F5fCe1 0xA1D7824065771455d31A2250274341cBB7E3038e 0x452302DA93E77c9594c62022841A5cd03B5d2c9B 0x53ad8030a50F99cbaB540e398441f7E0B7BD5664 0xb36978Ae7c3d0c4242915F563D702d7D0AeC9b1f 0x5dBB850c9A708A7E7aB9E49C7F4983f1817f8986]"
main  | DEBUG[09-07|05:33:28.174] Updating announce version                module=announceWorker announceVersion=1,694,064,808
main  | INFO [09-07|05:33:32.413] Stopping announcing                      module=announceWorker
main  | ERROR[09-07|05:33:51.997] Error in block freeze operation          err="block receipts missing, can't freeze block 5002000"
main  | INFO [09-07|05:33:53.150] Sending val enode share msg to proxy     func=sendValEnodeShareMsgs       proxy peer="Peer 911d1fa6f99f499f 192.168.0.11:30503" valAddresses length=37
main  | INFO [09-07|05:33:53.151] Skipping sending ValEnodesShareMsg b/c not validating func=sendValEnodesShareMsg
main  | INFO [09-07|05:33:58.172] Disconnecting from static or trusted peer id=911d1fa6f99f499f conn=staticdial purpose=ProxyPurpose reason="network error" remoteRequested=false err="read tcp 172.18.0.2:52470->192.168.0.11:30503: i/o timeout"
main  | DEBUG[09-07|05:33:58.173] Message handling failed in `eth`         id=911d1fa6f99f499f conn=staticdial err=EOF
main  | DEBUG[09-07|05:33:58.173] Removing Ethereum peer                   peer=911d1fa6 snap=false
main  | DEBUG[09-07|05:33:58.173] Removing p2p peer                        peercount=0 id=911d1fa6f99f499f duration=30.001s     req=false err="read tcp 172.18.0.2:52470->192.168.0.11:30503: i/o timeout"
main  | DEBUG[09-07|05:33:58.173] Disconnected proxy peer                  func=threadRun                   peerID=911d1fa6f99f499f chan=removeProxyPeer

Proxy:

proxy  | DEBUG[09-07|05:33:57.387] Queued delivered header or block         peer=3d479a8ef785bc6c2f1d595893a5884d96723fbb6e55a9b6c1f919a906faf158 number=19,254,490 hash=a7212e..98744c queued=1
proxy  | DEBUG[09-07|05:33:57.395] Queued delivered header or block         peer=dfe6a0dc8fd8ebe2fa145a67a74780c1c8bbbc28c077b826aac1cb690bb87893 number=19,254,490 hash=a7212e..98744c queued=1
proxy  | DEBUG[09-07|05:33:57.396] Queued delivered header or block         peer=99a7ec2f3cc6a073f745367f53fffdd7e8c0b161590739de145fc1e0498d4e08 number=19,254,490 hash=a7212e..98744c queued=1
proxy  | DEBUG[09-07|05:33:57.398] Queued delivered header or block         peer=68aa063df000e494846d9064f67e9f29c290d1d318af64fe85ddcabcefcde233 number=19,254,490 hash=a7212e..98744c queued=1
proxy  | DEBUG[09-07|05:33:57.403] Queued delivered header or block         peer=d8899cf2846025b15c0b884ff838d4d70ec19470f627e673241c84841d9e61ff number=19,254,490 hash=a7212e..98744c queued=1
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | DEBUG[09-07|05:33:58.173] Message handling failed in `eth`         id=744c823a01b9c0fb conn=inbound            err=EOF
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"
proxy  | WARN [09-07|05:33:58.173] Error in sending message                 func=AsyncMulticastCeloMsg        msgCode=17 peer="Peer 744c823a01b9c0fb 192.168.0.10:52470" ethMsgCode=17 err="shutting down"

Full logs:

System Information

Client version: 1.8.0-beta.2 image: us.gcr.io/celo-org/geth:baklava

I also tried the 1.8.0 which was released yesterday.

geth command with flags:

Validator:

version: '3.9'
services:
  celo-main:
    init: true
    user: root
    restart: unless-stopped
    container_name: main
    image: us.gcr.io/celo-org/geth:baklava
    command:
      --baklava
      --verbosity 5
      --syncmode full
      --mine
      --etherbase XXX
      --unlock XXX
      --nodiscover
      --proxy.proxied
      --proxy.proxyenodeurlpairs=XXX
      --password /root/.celo/.password
      --celostats=XXX
      --datadir /root/.celo
    volumes:
      - ./data:/root/.celo 
    ports:
      - '30303:30303'
      - '30303:30303/udp'
    networks:
      - celo
    stop_grace_period: 300s
    stop_signal: SIGINT
networks:
  celo:
    external: true

Proxy:

version: '3.9'
services:
  celo-proxy:
    init: true
    user: root
    restart: unless-stopped
    container_name: proxy
    image: us.gcr.io/celo-org/geth:baklava
    command:
      --baklava
      --verbosity 5
      --syncmode full
      --proxy.proxy
      --proxy.proxiedvalidatoraddress XXX
      --proxy.internalendpoint :30503
      --etherbase XXX
      --unlock XXX
      --password /root/.celo/.password
      --allow-insecure-unlock
      --celostats=XXX
      --datadir /root/.celo
    volumes:
      - ./data:/root/.celo 
    ports:
      - '30303:30303'
      - '30303:30303/udp'
      - '30503:30503'
    networks:
      - celo
    stop_grace_period: 300s
    stop_signal: SIGINT
networks:
  celo:
    external: true

Chain/Network: Baklava

kamikazechaser commented 1 year ago

On a second look, the issue seems to be originating from the vLAN network adapter. Ethereum messages seem to get dropped but every other type seems to work fine.

gastonponti commented 1 year ago

@kamikazechaser if the issue is the vLAN I'm closing the issue. Let me know if it requires a further investigation and we will reopen this. Thanks!