Open shaokun11 opened 3 weeks ago
I would prefer that the data is not cleared with each restart. The current in-memory approach forces me to restart the node every couple of days. I wanted to check in on the progress of this issue. If there's anything I can do to assist or provide further information, please don't hesitate to reach out. Thank you!
Thanks @shaokun11 for reporting this and for your patience with the slow reply (I was OOO for a few weeks and we somehow missed this issue until now).
sui start should not clear the data with each restart, unless the --force-regenesis flag is used. Can you please confirm that the data is indeed lost after you stop the network and restart it with sui start? What command are you using?
Can you try to update your Sui CLI and see if the memory still increases so much in two days? Depending on which version you need, mainnet last release is here: https://github.com/MystenLabs/sui/releases/tag/mainnet-v1.32.2 and testnet last release is here: https://github.com/MystenLabs/sui/releases/tag/testnet-v1.33.1
@ronny-mysten Thank you for your guidance. After upgrading to testnet-v1.33.1 of Sui, I am experiencing a new issue when I use sui start.
The machine is an AWS EC2 r6i.24xlarge running Ubuntu 22.04.
Is there anything else I need to do to upgrade to the new version, or do I just replace the binary with the new one?
@shaokun11 just to clarify, are you still experiencing memory problems, or are you referring to the ERROR message in the logs?
If you are referring to the ERROR message in the logs (ERROR mysten_metrics::thread_stall_monitor), then do not worry too much about that one.
If the memory still grows fast after a day of running sui start, then please let me know. It would also be good to share the purpose of starting a local network on an AWS machine, so that we can better understand the workflow and the end goal, and see whether we can advise you to go a different route.
Thanks!
@stefan-mysten We currently want to launch a Sui network locally to do some development, so a stable version is all I need. This node has now been running for nearly 2 months. The only problem is that each time a new epoch starts, the memory increases by more than 30 GB, and we have to restart it every two days.
After starting with sui start, the error occurred a few moments later and the node did not continue to sync. You can find the complete log file at 1.log
ProtocolVersion(52)
Boot counter: 0
thread '2024-09-20T05:33:42.389732Z ERROR node{name=k#8dcff6d1..}: telemetry_subscribers: panicked at /home/ubuntu/sui/crates/sui-core/src/checkpoints/mod.rs:791:21:
transaction TransactionDigest(3YPdHDGJNG2dAT9ggXNaBqsVnZdC1pzvQTWbo37eNKw6) not found panic.file="/home/ubuntu/sui/crates/sui-core/src/checkpoints/mod.rs" panic.line=791 panic.column=21
k#8dcff6d1..' panicked at /home/ubuntu/sui/crates/sui-core/src/checkpoints/mod.rs:791:21:
transaction TransactionDigest(3YPdHDGJNG2dAT9ggXNaBqsVnZdC1pzvQTWbo37eNKw6) not found
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Thanks @shaokun11 for all the details, this is very helpful. Regarding the memory issue, I shared it with my colleagues.
As for sui start not syncing, I will try to start and stop the DB locally and see if I can reproduce the issue.
In the worst case, I would suggest trying another version to see if you can restart the network from whatever you have in the local DB.
Thank you, @stefan-mysten, for your help on this issue!
So far, I've tested testnet-1.29.2 (the original version I started with). It continues to sync, but the memory usage keeps increasing. Next, I will test other versions to check whether they can continue syncing and whether the memory increase is resolved. I will share any updates here as soon as I have new findings.
Thanks for your patience here, @shaokun11. I want to try this locally as well but haven't had a chance yet. Hopefully I can find some time over the weekend. Thanks again!
Steps to Reproduce Issue
sui start
Expected Result
The memory usage should stabilize after the initial startup, with no significant continuous increase over time.
Actual Result
The memory usage gradually increases over time. According to memory monitoring, it increases a little with each epoch. The epoch duration is currently set to 6 hours.
I also followed https://github.com/MystenLabs/sui/issues/18067#issuecomment-2166567908 to update some environment variables, but it did not help.
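To attach numbers to the growth described above, a minimal sketch along these lines could sample the resident memory of the running node at a fixed interval. It assumes a Linux host, since it reads /proc/<pid>/status. The function names, the 60-second interval, and the comma-separated output format are illustrative choices and not part of Sui's tooling.

```python
import re
import sys
import time
from pathlib import Path


def parse_rss_kib(status_text):
    """Extract VmRSS (resident set size, in KiB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current RSS of a running process (Linux only)."""
    return parse_rss_kib(Path(f"/proc/{pid}/status").read_text())


def monitor(pid, interval_s=60):
    """Print 'unix_timestamp,rss_kib' lines until the process exits."""
    while True:
        try:
            rss = read_rss_kib(pid)
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited
        if rss is None:
            break  # no VmRSS field reported for this process
        print(f"{int(time.time())},{rss}", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python rss_log.py <pid of the sui process>
    monitor(int(sys.argv[1]))
```

Redirecting the output to a file and comparing samples taken just before and just after each 6-hour epoch boundary would show whether the increase really lines up with epoch changes.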
System Information