Open zhy827827 opened 4 days ago
@zhy827827 which version are you trying to run?
@stefan-mysten I am on the latest version; a Sui node can only run the latest release.
full.yml:
authority-store-pruning-config:
  num-latest-epoch-dbs-to-retain: 3
  epoch-db-pruning-period-secs: 3600
  num-epochs-to-retain: 0
  max-checkpoints-in-batch: 10
  max-transactions-in-batch: 1000
  #use-range-deletion: true
  pruning-run-delay-seconds: 60
  num-epochs-to-retain-for-checkpoints: 2
  periodic-compaction-threshold-days: 1
  smooth: true
How is the progress now? I encountered the same problem. Is there a solution?
Has TPS performance improved? https://suiscan.xyz/mainnet/analytics/cps
For folks having memory growth issues, can you follow https://gist.github.com/mwtian/0f473325a1ad5a74982fcf91737653b4 and upload the heap profile (and metrics if there are interesting findings)? cc @AndyCYB @zhy827827
I have collected the data, though I don't know whether it is useful: sui-oom.txt sui-monitored.txt
Thanks a lot @zhy827827. Is it possible to take the memory profile as well?
I am still learning how to collect a memory profile, so I am not able to do it yet.
And to confirm, is your fullnode running in Asia?
yes!
Interesting. We saw another instance of memory growth from fullnodes running in Asia as well.
Yes, we have two servers, one with 128GB of RAM and one with 64GB of RAM. The 64GB server hasn't been able to run at all recently because it keeps getting OOM-killed.
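For anyone who wants a rough view of how fast memory grows before taking a full heap profile, here is a minimal Python sketch that samples the resident set size of a process on Linux. It is not part of Sui tooling: the helper names (parse_vm_rss_kib, read_rss_kib, sample_rss) are hypothetical, and it only assumes the standard VmRSS field in /proc/<pid>/status.

```python
import time
from typing import List, Optional, Tuple


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    return None  # kernel threads and exited processes have no VmRSS line


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kib(f.read())


def sample_rss(pid: int, interval_secs: float, count: int) -> List[Tuple[float, Optional[int]]]:
    """Collect `count` (unix_time, rss_kib) samples, `interval_secs` apart."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_rss_kib(pid)))
        if i + 1 < count:
            time.sleep(interval_secs)
    return samples
```

Calling sample_rss with the node's PID (however you look it up on your host) once a minute gives a time series that can be posted alongside the metrics.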
After updating to the new Sui version, the Sui node is very unstable and is often OOM-killed. Previously, servers with 64GB of memory could run smoothly, but now even servers with 128GB of memory are getting OOM-killed.
Oct 12 03:13:51 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: A process of this unit has been killed by the OOM killer.
Oct 12 03:13:56 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: Main process exited, code=killed, status=9/KILL
Oct 12 03:13:56 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: Failed with result 'oom-kill'.
Oct 12 03:13:56 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: Consumed 1h 26min 28.180s CPU time.
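To see how often this happens, the log lines above can be scanned for the "Failed with result 'oom-kill'" message. The following is a minimal Python sketch, assuming the systemd message format shown in this log; the names OomEvent and find_oom_kills are hypothetical, and the sample lines are copied from the log above.

```python
import re
from typing import Iterable, List, NamedTuple


class OomEvent(NamedTuple):
    timestamp: str  # syslog-style timestamp, e.g. "Oct 12 03:13:56"
    host: str
    unit: str       # systemd unit name, e.g. "sui.service"


# Matches only the final "Failed with result 'oom-kill'." line, so each
# OOM kill of a unit is counted once.
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"systemd\[\d+\]:\s+(?P<unit>\S+): Failed with result 'oom-kill'\."
)


def find_oom_kills(lines: Iterable[str]) -> List[OomEvent]:
    """Return one OomEvent per unit failure caused by the OOM killer."""
    events = []
    for line in lines:
        m = OOM_RE.match(line.strip())
        if m:
            events.append(OomEvent(m["ts"], m["host"], m["unit"]))
    return events


SAMPLE = [
    "Oct 12 03:13:51 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: A process of this unit has been killed by the OOM killer.",
    "Oct 12 03:13:56 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: Main process exited, code=killed, status=9/KILL",
    "Oct 12 03:13:56 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: Failed with result 'oom-kill'.",
    "Oct 12 03:13:56 rockx-mainnet-merlin-sg-01 systemd[1]: sui.service: Consumed 1h 26min 28.180s CPU time.",
]

events = find_oom_kills(SAMPLE)
```

Feeding it the output of the system journal for the unit over a longer period would show how frequently the node is being killed.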