Closed arpee closed 3 months ago
FYI Node Crash.
root@prod: grep -B 20 NODE logs/saito.log
2024-03-29T04:24:41: on new block : 3fb63b93a3fa9f79ab15051412f494dc8f7041b64c744
f7b370f1532d91c1ac1
2024-03-29T04:24:41: #################
2024-03-29T04:24:41: ### New Block ### 34717
2024-03-29T04:24:41: #################
2024-03-29T04:24:41: $
2024-03-29T04:24:41: $
2024-03-29T04:24:41: $
2024-03-29T04:24:41: $ PURGING ARCHIVE BLOCK
2024-03-29T04:24:41: $
2024-03-29T04:24:41: $
2024-03-29T04:24:41: $
2024-03-29T04:24:41: #################
2024-03-29T04:24:41: ### New Block ### 34718
2024-03-29T04:24:41: #################
2024-03-29T04:24:41: onAddBlockSuccess : bfef9a6f1d2f794a5b530e1c130a28cbebc8eca0
ca69be86b09cda5a87676264 complete
2024-03-29T04:24:41: onAddBlockSuccess : 3fb63b93a3fa9f79ab15051412f494dc8f7041b6
4c744f7b370f1532d91c1ac1 complete
2024-03-29T04:24:41: Sending heartbeat spam tx: 6370
2024-03-29T04:28:41: Killed
2024-03-29T04:28:42:
2024-03-29T04:28:42: > saito@0.2.0 start
2024-03-29T04:28:42: > NODE_OPTIONS='--max-old-space-size=8192' ts-node -T --file
s start.ts --
Process killed by out of memory system.
More info from discussion with Sanka in TG.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:04 AM]
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:05 AM] So - we had another crash... and I was trying to find the logs... and saw this.
Sanka Darshana, [13/5/2024 10:05 AM] I'll reduce the logs. I think I added too much logs to find that block fetch issue
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:05 AM] So the crash happened at 2024-05-12 just before 6am.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:06 AM] So in a couple of minutes we produced 200MB of logs.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:06 AM] I actually think that's going to tell us a lot.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:07 AM] So maybe don't :)
Sanka Darshana, [13/5/2024 10:07 AM] ok. 😃 I thought you sent the screenshot to say too much logs
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:17 AM] Not this time.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:17 AM] 1 2024-05-12T05:51:13 4 2024-05-12T05:51:16 37 2024-05-12T05:51:23 761 2024-05-12T05:51:24 8345 2024-05-12T05:51:25 2 2024-05-12T05:51:32 2 2024-05-12T05:54:41 47956 2024-05-12T05:54:43 103251 2024-05-12T05:54:44 120582 2024-05-12T05:54:45 122808 2024-05-12T05:54:46 85694 2024-05-12T05:54:47 74208 2024-05-12T05:54:48 74738 2024-05-12T05:54:49 48935 2024-05-12T05:54:50 73978 2024-05-12T05:54:51 64476 2024-05-12T05:54:52 72551 2024-05-12T05:54:53 66930 2024-05-12T05:54:54 75967 2024-05-12T05:54:55 86627 2024-05-12T05:54:56 90548 2024-05-12T05:54:57 111161 2024-05-12T05:54:58 107058 2024-05-12T05:54:59 60923 2024-05-12T05:55:00 63820 2024-05-12T05:55:01 49558 2024-05-12T05:55:02 65815 2024-05-12T05:55:03 77003 2024-05-12T05:55:04 72822 2024-05-12T05:55:05 75520 2024-05-12T05:55:06 88433 2024-05-12T05:55:07 93899 2024-05-12T05:55:08 90639 2024-05-12T05:55:09 53156 2024-05-12T05:55:10 49152 2024-05-12T05:55:11 36864 2024-05-12T05:55:12 40073 2024-05-12T05:55:13 15223 2024-05-12T05:55:14
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:17 AM] So this is the number of rows of log output per second... on the node...
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:18 AM] above that first row it's between 1 and 100 most of the time.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:18 AM] Looking around that time...
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:18 AM] I see a reorg start.
Sanka Darshana, [13/5/2024 10:19 AM] yes. current guess is reorg is causing this
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:20 AM] 2 2024-05-12T05:42:33 2 2024-05-12T05:42:46 2 2024-05-12T05:43:00 2 2024-05-12T05:43:15 2 2024-05-12T05:43:59 2 2024-05-12T05:44:30 2 2024-05-12T05:44:31 2 2024-05-12T05:44:44 2 2024-05-12T05:46:06 342 2024-05-12T05:51:25 4492 2024-05-12T05:54:43 9189 2024-05-12T05:54:44 10347 2024-05-12T05:54:45 11300 2024-05-12T05:54:46 7246 2024-05-12T05:54:47 6708 2024-05-12T05:54:48 6725 2024-05-12T05:54:49 3954 2024-05-12T05:54:50 6635 2024-05-12T05:54:51 5640 2024-05-12T05:54:52 6502 2024-05-12T05:54:53 5955 2024-05-12T05:54:54 6845 2024-05-12T05:54:55 7437 2024-05-12T05:54:56 8345 2024-05-12T05:54:57 9616 2024-05-12T05:54:58 9238 2024-05-12T05:54:59 5739 2024-05-12T05:55:00 5739 2024-05-12T05:55:01 4025 2024-05-12T05:55:02 5755 2024-05-12T05:55:03 7010 2024-05-12T05:55:04 6507 2024-05-12T05:55:05 6790 2024-05-12T05:55:06 7538 2024-05-12T05:55:07 8127 2024-05-12T05:55:08 8435 2024-05-12T05:55:09 4400 2024-05-12T05:55:10 4615 2024-05-12T05:55:11 3035 2024-05-12T05:55:12 3277 2024-05-12T05:55:13 1624 2024-05-12T05:55:14
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 10:20 AM] So, that's the same thing - but filtering for rows that contain 'reorg'....
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 11:28 AM] right.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 11:28 AM] maybe I can try to bottle it locally...
Sanka Darshana, [13/5/2024 11:29 AM] I downloaded the prod blocks to my laptop. will test with it too
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 11:30 AM] yeah - thinking about how to create a browser that has the wrong chain without it resetting itself...
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 11:30 AM] will test that first.
Sanka Darshana, [13/5/2024 11:31 AM] this should be easy I think. just nuke the server ?
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 11:32 AM] yes - plan is to start node locally and connect a browser... get it a few blocks... then restart the node but with the blocks from prod... then reopen the browser...
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 11:45 AM] well - that didn't trigger anything.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 12:04 PM] So... i did a few tests without success.
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 12:05 PM] I tried:
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 12:06 PM] And some variations - no crazy load on my laptop.
Sanka Darshana, [13/5/2024 12:51 PM] these things take a bit of time. fixing them is quick if I know how to recreate them
Richard Parris 🟥 (Notifications off by default)., [13/5/2024 1:16 PM] en...
Sanka Darshana, [13/5/2024 1:24 PM] it probably is a 2 different chains syncing.
Sanka Darshana, [13/5/2024 1:24 PM] or there are some inefficiencies in re org code which loads a lot of blocks in to memory
Sanka Darshana, [13/5/2024 1:28 PM] ok. full block count is 9. so not the blocks in memory
We continue to have sudden increases in memory use during operations. We need to find a way to identify these so we can log and address them.
These spikes sometimes result in crashes due to out of memory problems.