Check the logs for issues like "too many open files" - that's the most common cause of the backend crashing and producing the 502 Bad Gateway error.
Also check whether you may be running out of memory.
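If it helps, here's a quick way to check for those conditions on a typical Linux host (a minimal sketch, nothing node-specific):

```bash
# Did the kernel's OOM killer terminate anything recently?
dmesg -T | grep -i 'out of memory'

# Current memory and swap usage
free -h

# Free disk space on all mounted filesystems
df -h
```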
Thanks for your response. I have checked, and I am not running low on memory: I have used only about 2% of my memory and also have enough space on disk.
I have checked the logs and can't find any "too many open files" error, but I did find this in the log:
Server._handleTransactionBundle: Rejected transaction < TxHash: 4cb5bb4e968c37c98376ceb1c14aac74be1303bc309eddfc343f92ad3a5f42b7, TxnType: BC1YLiSpY6Ec9NWTNfmziLhSrrdB8dbVx4nspWAgkZgKic3Wxteiynx, PubKey: LIKE > from peer [ Remote Address: 34.123.41.111:17000 PeerID=2 ] from mempool: TxErrorDuplicate
Those do happen often as a result of a crash - it can stop TXIndex from keeping up with new blocks. But I've not seen it cause crashes.
What are some possible causes of crashes?
What I mentioned above:
out of memory, out of file descriptors
also
out of disk space, or a server crash
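For the "too many open files" case specifically, you can compare the backend's current descriptor count against its limit. A rough sketch, assuming the process is named `backend` (as it appears in the OOM log below):

```bash
# Soft/hard open-file limits for the running backend process
grep 'open files' /proc/$(pidof backend)/limits

# Number of file descriptors it currently has open
ls /proc/$(pidof backend)/fd | wc -l
```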
But none of this is the case for me
I get this sometimes on the admin section of a node - and I have to log out from my bitclout account on the node and log back in for it to go away.
Are you seeing the same?
I had that on mine when it was running.
It happens mostly when I'm not logged in to the bitclout node but am using the API.
@tijno sorry for spamming the conversation again... but I keep getting notified now because of the tagline behind your name: "(BitClout @Tijn)" 🤣
oh man GitHub :) sorry @tijn, I'll change it
@tijno Thank you!
all done
Fixed the issue by increasing the server's memory to 64GB.
Hey -- wanted to drop a comment here as this has been happening on 8 nodes under my company's management. All of the machines have 30GB of memory, and we solve the OOMs by simply using Docker's restart flag (I know, not a great option, but it works temporarily - see the sketch after this comment). After speaking with @tijno, he runs nodes on a 32GB machine and maxes out at around 60% memory usage. I'll also note that all eight of these nodes have been synced for an extended period of time, and these crashes occur quite randomly. The following OOM occurs:
[2025186.138224] Out of memory: Killed process 215489 (backend) total-vm:266280732kB, anon-rss:30178620kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:108952kB oom_score_adj:0
[2025187.222710] oom_reaper: reaped process 215489 (backend), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
The OOMs have all been caused by rejected Duplicate Tx's:
E0831 14:01:03.874597 1 server.go:1311] Server._handleTransactionBundle: Rejected transaction < TxHash: 25452952cf8b3a8adc6f3412a2bcc4b9aa4e7960ec4d3052b8f4f8e1ff42d93c, TxnType: BC1YLhhrJUg1ms7P3YMQcjGPTVY9Tf8poJ1Xdeqt6AsoJ5g3zNvFz98, PubKey: PRIVATE_MESSAGE > from peer [ Remote Address: 34.123.41.111:17000 PeerID=5 ] from mempool: TxErrorDuplicate
While increasing memory is definitely a solution, and restarting on the crash is also.... something haha, I see no reason why a node can't run on a 30GB machine. My worry is that there's a potential memory leak, even though those are fairly uncommon in Go... Beyond this, I have little idea why an already-synced node would require more than 30GB -- especially since this is occurring uniformly across all 8 nodes under our management, all after a Duplicate TX error is produced.
It is, of course, also possible that I'm just missing something. Would really appreciate any suggestions, as simply restarting the process after a crash isn't likely the best approach, let alone being effective long-term hahaha
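For reference, the restart-flag workaround mentioned above is roughly this (a sketch only - the container and image names here are hypothetical, not the actual deployment):

```bash
# Have Docker restart the container automatically after the backend is OOM-killed.
# "deso-backend" and "my-backend-image" are placeholder names for illustration.
docker run -d \
  --name deso-backend \
  --restart unless-stopped \
  my-backend-image
```

The docker-compose equivalent is `restart: unless-stopped` on the service.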
We profile our nodes 24/7 and aren't aware of any memory leaks. Badger is a memory hog and is on its way out.
Makes sense -- thanks for the reply @maebeam
Glad to see badger go for a number of reasons hahaha
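For other operators who want to check for a leak themselves: since the backend is a Go binary, a heap profile is the most direct evidence either way. This assumes the build exposes the standard net/http/pprof endpoint, which may not be the case; the host and port below are placeholders:

```bash
# Fetch a heap profile from a running Go process that exposes net/http/pprof
# and print the top allocators. localhost:6060 is an assumed address.
go tool pprof -top http://localhost:6060/debug/pprof/heap
```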
Everything works as normal, but sometimes it suddenly shows a "502 Bad Gateway" error and everything stops working until I restart the node - sometimes I also need to resync the node for everything to work normally.