ethereumjs / ultralight

Ethereum Portal Network TypeScript implementation
https://github.com/ethereum/portal-network-specs

Random error linked to KademliaRoutingTable.getValue #672

Open foufrix opened 2 days ago

foufrix commented 2 days ago

Error:

  ddb83:Portal:StateNetwork:NODES Received 7 ENRs from f:c0a75...5e3cd +0ms
  ddb83:Portal:StateNetwork content added for: 0x20240000009a924888773203bdeea3297b7456fd80212aac9a6cca64dc898f343290cb29f300 +575ms
  ddb83:Portal:uTP:readSocket:25374:CLOSE Closing connection to dedfdee9cfe1d391f17ad84652cca1e4cdb2d0cb113189dab98564d44bfc31e1 +0ms
  ddb83:Portal:uTP:readSocket:25374:CLOSE compile=false +0ms
file:///Users/raf/Documents/ethereum/ultralight/node_modules/@chainsafe/discv5/lib/kademlia/kademlia.js:156
        return bucket.getValue(id);
                      ^

TypeError: Cannot read properties of undefined (reading 'getValue')
    at KademliaRoutingTable.getValue (/Users/raf/Documents/ethereum/ultralight/node_modules/@chainsafe/discv5/src/kademlia/kademlia.ts:186:19)
    at Discv5.findEnr (/Users/raf/Documents/ethereum/ultralight/node_modules/@chainsafe/discv5/src/service/service.ts:365:31)
    at SessionService.handleWhoAreYouRequest (/Users/raf/Documents/ethereum/ultralight/node_modules/@chainsafe/discv5/src/service/service.ts:708:22)
    at SessionService.emit (node:events:519:28)
    at SessionService.emit (node:domain:488:12)
    at SessionService.handleMessage (/Users/raf/Documents/ethereum/ultralight/node_modules/@chainsafe/discv5/src/session/service.ts:597:12)
    at UDPTransportService.processInboundPacket (/Users/raf/Documents/ethereum/ultralight/node_modules/@chainsafe/discv5/src/session/service.ts:312:21)
    at UDPTransportService.emit (node:events:519:28)
    at UDPTransportService.emit (node:domain:488:12)
    at Socket.handleIncoming (/Users/raf/Documents/ethereum/ultralight/node_modules/@chainsafe/discv5/src/transport/udp.ts:153:10)

This error appears randomly, usually around 10 minutes after starting a node on devnet.

I think this is the line where the error originates: https://github.com/ethereumjs/ultralight/blob/3af6fbd83099b336cfbc24e6a57fd383ee2ea44d/packages/portalnetwork/src/client/routingTable.ts#L72

KademliaRoutingTable.getValue cannot locate a node, and the uncaught error stops everything. Adding a try-catch would avoid the crash and let execution continue when the appropriate node is not found.
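A minimal sketch of the try-catch idea, assuming a hypothetical RoutingTableLike interface and a hypothetical safeGetValue helper (neither is the real discv5 or ultralight API). Any exception thrown by the lookup is logged and treated as "not found"; brokenTable simulates the failure in the trace by reading from an undefined bucket.

```typescript
// Hypothetical minimal shape of a routing table; the real
// @chainsafe/discv5 KademliaRoutingTable has a richer API.
interface RoutingTableLike<T> {
  getValue(nodeId: string): T | undefined
}

// Look up a node, treating any exception from the table as "not found"
// instead of letting it propagate and crash the process.
function safeGetValue<T>(table: RoutingTableLike<T>, nodeId: string): T | undefined {
  try {
    return table.getValue(nodeId)
  } catch (err) {
    console.warn(`routing table lookup failed for ${nodeId}:`, err)
    return undefined
  }
}

// Simulated failure mode: the bucket being read is undefined at runtime.
const buckets: Map<string, string>[] = [new Map()]
const brokenTable: RoutingTableLike<string> = {
  getValue(nodeId: string) {
    const bucket = buckets[-1] // out-of-range index: undefined at runtime
    return bucket.get(nodeId) // throws "Cannot read properties of undefined"
  },
}
```

A wrapper like this only protects call sites that go through it. Judging from the trace above, every frame that leads to the throw is inside @chainsafe/discv5 (Discv5.findEnr, called while handling a WHOAREYOU request), so a try-catch placed in ultralight's routingTable.ts would not intercept this particular error.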

I suppose this happens because two nodes decide that the same node should be evicted: node1 evicts badNode, and when node2 tries to evict it, it can't find it because it has already been evicted, so it crashes?
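One thing to keep in mind when testing this theory: each node keeps its own local routing table, so node1 evicting badNode does not remove it from node2's table. Even within a single table, a missing entry does not have to be fatal. A minimal sketch of eviction that is safe to repeat, assuming a hypothetical Map-backed SketchBucket (not a discv5 class):

```typescript
// Hypothetical bucket keyed by node ID; illustrative only, not discv5 code.
class SketchBucket {
  private readonly entries = new Map<string, string>()

  add(nodeId: string, enr: string): void {
    this.entries.set(nodeId, enr)
  }

  // Map.delete returns false for a missing key instead of throwing,
  // so evicting an already-evicted node is a harmless no-op.
  evict(nodeId: string): boolean {
    return this.entries.delete(nodeId)
  }

  get size(): number {
    return this.entries.size
  }
}
```

In a structure like this a missing entry is harmless. The TypeError in the trace is different: it is raised because bucket itself is undefined when getValue is called on it.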

EDIT: I'm on a Mac M1, Node.js 20.18.0.

acolytec3 commented 2 days ago

The issue is deeper than that. The object getValue is being called on is actually undefined here, which is likely the result of a race condition somewhere; I've seen similar race conditions in Ultralight code before. Judging from the stack trace, though, the problem is at the discv5 level rather than in the Ultralight code.
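For reference when digging into discv5: the trace shows bucket was undefined when bucket.getValue(id) ran inside KademliaRoutingTable.getValue. A race is one possibility; another generic way a Kademlia table can hit this is an out-of-range bucket index. The sketch below assumes a common layout (256 buckets indexed by log2 XOR distance minus one); whether @chainsafe/discv5 computes its index exactly this way is an assumption, and SketchRoutingTable and log2Distance are hypothetical. With that layout, looking up the local node's own ID gives distance 0, index -1, and an undefined bucket.

```typescript
// Hypothetical Kademlia-style table; not the @chainsafe/discv5 source.
const NUM_BUCKETS = 256

// Bit length of (a XOR b) for equal-length hex node IDs; 0 when the IDs are equal.
function log2Distance(a: string, b: string): number {
  for (let i = 0; i < a.length; i++) {
    const x = parseInt(a[i], 16) ^ parseInt(b[i], 16)
    if (x !== 0) {
      // 4 bits per remaining nibble, plus the bit length of this nibble's XOR.
      return (a.length - i - 1) * 4 + x.toString(2).length
    }
  }
  return 0
}

class SketchRoutingTable {
  private readonly localId: string
  private readonly buckets: Map<string, string>[] = []

  constructor(localId: string) {
    this.localId = localId
    for (let i = 0; i < NUM_BUCKETS; i++) this.buckets.push(new Map())
  }

  // Distance 1..256 maps to bucket 0..255; the local ID maps to -1.
  private bucketIndex(nodeId: string): number {
    return log2Distance(this.localId, nodeId) - 1
  }

  // Silently ignores the local ID, which has no bucket.
  add(nodeId: string, enr: string): void {
    this.buckets[this.bucketIndex(nodeId)]?.set(nodeId, enr)
  }

  // Unguarded: buckets[-1] is undefined, so this throws a TypeError for the local ID.
  getValueUnguarded(nodeId: string): string | undefined {
    return this.buckets[this.bucketIndex(nodeId)].get(nodeId)
  }

  // Guarded: an out-of-range index is treated as "not found".
  getValue(nodeId: string): string | undefined {
    return this.buckets[this.bucketIndex(nodeId)]?.get(nodeId)
  }
}
```

This does not show that a self-lookup is what happens on devnet; it only shows one concrete path by which a bucket lookup can yield undefined.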

@wemeetagain have you ever seen anything like this with discv5 in other contexts (Lodestar, etc.)? I believe this is happening on a Mac. I've never seen this precise issue in our Ultralight nodes running on Ubuntu.