celestiaorg / celestia-node

Celestia Data Availability Nodes
Apache License 2.0
929 stars 925 forks source link

Celestia node keeps crashing after upgrade to v0.12.4 #3130

Closed packetstracer closed 9 months ago

packetstracer commented 9 months ago

Celestia Node version

8e5a71746bc9bd9300ad3e4853797e64fdd1d018

OS

Ubuntu 22.04.3 LTS

Install tools

No response

Others

LimitNOFILE=65536 in service file

Steps to reproduce it

Upgrade to version and restart the node:

systemctl stop celestial-node make build make install systemctl start celestial-node

Expected result

Node should start and sync with the network

Actual result

Node crashes after initialization timeout.

Relevant log output

Started celestia-bridge Cosmos daemon.                                                                                                       
2024-01-22T21:00:40.751+0100    INFO    node    nodebuilder/module.go:25        Accessing keyring...                                    
2024-01-22T21:00:40.754+0100    INFO    module/state    state/keyring.go:40     constructed keyring signer      {"backend": "test", "path": "/root/.celestia-bridge/keys", "key name": "my_celes_key", "chain-id": "celestia"}
2024-01-22T21:00:40.761+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/levels.go:171     All 25 tables opened in 1ms                                                                                                                                                        
2024-01-22T21:00:40.762+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/discard.go:66     Discard stats nextEmptySlot: 242                                                                                                                                                   
2024-01-22T21:00:40.762+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:368 Set nextTxnTs to 33758838    
2024-01-22T21:00:41.230+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/levels.go:171     All 4005 tables opened in 416ms                                                                                                                                                    
2024-01-22T21:00:41.231+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/discard.go:66     Discard stats nextEmptySlot: 0                                                                                                                                                     
2024-01-22T21:00:41.232+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:368 Set nextTxnTs to 532473      
2024-01-22T21:00:41.234+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/value.go:592      Deleting empty file: /root/.celestia-bridge/inverted_index//000233.vlog                                                                                                            
2024-01-22T21:00:42.254+0100    INFO    canonical-log   swarm/swarm_dial.go:559 CANONICAL_PEER_STATUS: peer=12D3KooWKmcDepggTMk9ZyP7yooWMbFWGxpfLEd4GL8V3NSAAM8x addr=/ip6/2001:41d0:700:7555::/udp/2121/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
2024-01-22T21:00:42.355+0100    INFO    share/discovery discovery/discovery.go:307      discovered wanted peers {"amount": 5}              
2024-01-22T21:00:42.372+0100    INFO    share/discovery discovery/discovery.go:307      discovered wanted peers {"amount": 6}                                                                                                                                                              
2024-01-22T21:00:42.924+0100    INFO    share/discovery discovery/discovery.go:307      discovered wanted peers {"amount": 7}                                                                                                                                                              
2024-01-22T21:00:50.776+0100    INFO    basichost       basic/natmgr.go:112     DiscoverNAT error:no NAT found                             
2024-01-22T21:01:02.502+0100    INFO    canonical-log   swarm/swarm_dial.go:559 CANONICAL_PEER_STATUS: peer=12D3KooWJTmLAri7HLtmUFvbPvq6PY2K6k87srLUq7F8Yi7d7AMJ addr=/ip6/2001:41d0:700:58d6::/udp/2121/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
2024-01-22T21:01:06.993+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: Application error 0x0 (remote)                                                                                                                                                
2024-01-22T21:01:11.499+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: Application error 0x0 (remote)  
2024-01-22T21:01:32.054+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: Application error 0x0 (remote)  
2024-01-22T21:01:40.801+0100    WARN    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:460 Block cache might be too small. Metrics: hit: 14 miss: 4375 keys-added: 369 keys-updated: 1 keys-evicted: 356 cost-added: 9039204271 cost-evicted: 8779951567 sets-dropped: 0 sets-rejected
: 0 gets-dropped: 0 gets-kept: 3840 gets-total: 4389 hit-ratio: 0.00                                                                         
2024-01-22T21:01:40.801+0100    WARN    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:461 Cache life expectancy (in seconds):          
 -- Histogram:                                                                                                                               
Min value: 0                                                                                                                                                                                                                                                                               
Max value: 24                                                                                                                                                                                                                                                                              
Count: 356                                                                                                                                   
50p: 2.00                                                                                                                                                                                                                                                                                  
75p: 2.00                                                                                                                                                                                                                                                                                  
90p: 8.00                                                                                                                                    
[0, 2) 281 78.93% 78.93%                                              
[2, 4) 36 10.11% 89.04%
[4, 8) 10 2.81% 91.85%
[8, 16) 19 5.34% 97.19%
[16, 32) 10 2.81% 100.00%
 --
2024-01-22T21:01:44.065+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-01-22T21:01:44.102+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-01-22T21:01:46.904+0100    INFO    canonical-log   swarm/swarm_listen.go:136       CANONICAL_PEER_STATUS: peer=12D3KooWNmS9ejL5nmoj1j5FU9tJnJvaxuGonM1X1xxEgcFtwMUV addr=/ip4/149.50.96.24/udp/2122/quic-v1 sample_rate=100 connection_status="established" dir="inbound"
2024-01-22T21:01:51.652+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: stream reset
2024-01-22T21:02:16.993+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-01-22T21:02:26.976+0100    INFO    bitswap-server  server/server.go:547    Bitswap Client ReceiveError: stream reset
2024-01-22T21:02:40.801+0100    WARN    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:460 Block cache might be too small. Metrics: hit: 21 miss: 4902 keys-added: 895 keys-updated: 2 keys-evicted: 889 cost-added: 23353935392 cost-evicted: 23105331853 sets-dropped: 0 sets-reject
ed: 0 gets-dropped: 0 gets-kept: 3840 gets-total: 4923 hit-ratio: 0.00 
2024-01-22T21:02:40.801+0100    WARN    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:461 Cache life expectancy (in seconds):
 -- Histogram:
Min value: 0
Max value: 24
Count: 889
50p: 2.00
75p: 2.00
90p: 4.00
[0, 2) 727 81.78% 81.78%
[2, 4) 86 9.67% 91.45%
[4, 8) 40 4.50% 95.95%
[8, 16) 26 2.92% 98.88%
[16, 32) 10 1.12% 100.00%
 --
2024-01-22T21:02:41.238+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:546 Lifetime L0 stalled for: 0s
2024-01-22T21:02:41.277+0100    INFO    badger  v4@v4.0.0-20231125230536-2b9e13346f75/db.go:625
Level 0 [ ]: NumTables: 00. Size: 0 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 16 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [B]: NumTables: 04. Size: 12 MiB of 14 MiB. Score: 0.00->0.00 StaleData: 9.4 KiB Target FileSize: 4.0 MiB
Level 6 [ ]: NumTables: 21. Size: 138 MiB of 138 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 8.0 MiB
Level Done
Error: node: failed to start within timeout(2m0s): context deadline exceeded
celestia-bridge.service: Main process exited, code=exited, status=1/FAILURE
celestia-bridge.service: Failed with result 'exit-code'.

Notes

Not happening to everyone

Wondertan commented 9 months ago

Show flags you start your node with

packetstracer commented 9 months ago

problem detected and is on our side, closing the issue

aditya-manit commented 9 months ago

problem detected and is on our side, closing the issue

Would you mind sharing what was it, facing something similar

packetstracer commented 9 months ago

problem detected and is on our side, closing the issue

Would you mind sharing what was it, facing something similar

The problem was due to our node was state synced so did not have all the history of the blockchain, and thus the bridge couldn't sync from scratch. Our solution was to set a node from a full history snapshot and then sync the bridge against it.