aptos-labs / aptos-core

Aptos is a layer 1 blockchain built to support the widespread use of blockchain through better technology and user experience.
https://aptosfoundation.org

[RFE] Validate and document recommended logging optimizations #5522

Closed clay-aptos closed 1 year ago

clay-aptos commented 1 year ago

🐛 RFE

We have received some wonderful logging recommendations from members of the community that we need to test and document:

As we all know, when you run the Aptos process directly in the console, it can occasionally fall into an error, and it can take 2 to 10 minutes before an alert arrives. Add reaction time on top of that and quite a lot of time passes. So I ran it as a systemd service, which has many benefits, one of which is auto-restart.

But I ran into problems with the very large logs written to the system. They accumulated very quickly, easily exceeding 1G in a couple of days, and they also slowed the system down considerably: disk performance fell to about 123.73 IOPS. Deleting some of the unnecessary logs on a schedule did not really help, because the logs are needed, and simply compressing and keeping them consumed a lot of system resources, both disk (performance, not free space) and CPU time. At times the disk became so slow from handling logs that connections to the system even dropped, although my hardware is not the slowest.

In this tutorial I will show an elegant and simple solution that lets you skip unnecessary logs completely and keep only what you specifically need. To start, open the service file of your Aptos process, /etc/systemd/system/aptos-validator.service, and comment out these lines if they are present:

StandardOutput=null

StandardError=null

Next, reload the daemon:

systemctl daemon-reload

Next, create the file:

vim /etc/rsyslog.d/50-default.conf

with the content below. Any message containing the text in quotes is dropped; in this example, everything tagged INFO and the firewall's IP-block messages:

:msg, contains, "INFO" ~
:msg, contains, "BLOCK" ~

Restart the service:

service rsyslog restart

Next, tell journald to forward everything to syslog by editing its configuration:

vim /etc/systemd/journald.conf

Comment out everything except this parameter:

Storage=none

Restart:

systemctl restart systemd-journald

To check that everything is done correctly, look at the journal; the last entry recorded should be a restart:

journalctl -xe

Now look at the syslog:

vim /var/log/syslog

There should be a restart message and possibly some connection errors with other nodes. You can also optionally clean up the logs by running:

journalctl --vacuum-time=1s

To finish the job and stop thinking about logs, set up a little logrotate by editing the file:

vim /etc/logrotate.d/rsyslog

Replace its contents with something like this:

{
rotate 4
weekly
size 25M
missingok
notifempty
compress
sharedscripts
postrotate
/usr/lib/rsyslog/rsyslog-rotate
endscript
}

Restart:

systemctl reenable --now logrotate.timer

In the same directory you will find configs for other files. I recommend changing them to weekly or, better, monthly, depending on how quickly they fill up for you. It is also important to check how your logrotate is started: check cron for a logrotate job and, if it is there, simply delete it and restart cron.

Now the most delicious part =) how performance changed, in dry numbers (if anyone is interested I can show before-and-after charts):

CPU: ~20.27% down to 12.6%
IOPS completed: ~123.73 down to 54.05 !CARL!

The RAM and descriptor figures have also decreased significantly, but it is too early to publish the data since less than 2 hours have passed. I would be happy to discuss. (edited)
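
Before dropping messages for real, it may help to preview what the two `contains` filters would remove. The following is a minimal sketch, not part of the original post: `preview_filter` is a hypothetical helper that only imitates the substring match with `grep` and never touches rsyslog, and the sample log lines are made up. As far as I know, newer rsyslog releases also flag the `~` discard action as deprecated in favor of `stop`, so that is worth checking against your rsyslog version.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): show which lines would
# survive the two "contains" filters, i.e. lines with neither "INFO" nor "BLOCK".
# It imitates the substring match with grep; it does not invoke rsyslog.
preview_filter() {
    # -F: treat patterns as fixed strings, -v: print only non-matching lines.
    grep -Fv -e 'INFO' -e 'BLOCK'
}

# Example run on made-up log lines; only the WARN line is printed.
printf '%s\n' \
    'aptos-node: INFO peer connected' \
    'kernel: [UFW BLOCK] IN=eth0' \
    'aptos-node: WARN connection error' | preview_filter
```

On a real host you could pipe a sample of `/var/log/syslog` through `preview_filter` to see whether anything you still need would be lost.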

And:

Update 1. The previous setup was tested on Debian. If something goes wrong and you don't see any logs at all, also set ForwardToSyslog=yes in /etc/systemd/journald.conf (vim /etc/systemd/journald.conf) and restart with systemctl restart systemd-journald. There is also a weak spot in this configuration: if a lot of WARN or ERROR messages reach your syslog, it will start to slow down the system, so I recommend adding /var/log/ to Grafana. I added some screenshots so you can easily see where all the settings were applied. P.S.: Thanks to the Aptos team for the opportunity to do what I love and get rewarded for it! :heart:
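
The ForwardToSyslog check from the update can be scripted. This is a rough sketch under stated assumptions: `forwarding_enabled` is a hypothetical helper name, it only recognizes the literal value `yes`, and it looks at a single file, so any drop-in files your distribution may use are not considered.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): succeed if the given
# journald.conf-style file has an active, uncommented ForwardToSyslog=yes line.
forwarding_enabled() {
    # Anchoring at the start of the line skips entries commented out with '#'.
    grep -Eq '^[[:space:]]*ForwardToSyslog[[:space:]]*=[[:space:]]*yes[[:space:]]*$' "$1"
}

conf=/etc/systemd/journald.conf
if [ ! -r "$conf" ]; then
    echo "cannot read $conf"
elif forwarding_enabled "$conf"; then
    echo "ForwardToSyslog=yes is set in $conf"
else
    echo "ForwardToSyslog=yes not found in $conf"
fi
```

If the setting is missing, edit the file as described in the update and restart systemd-journald.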

@wintertoro and @Markuze , let me know when we have tried out these recommendations and are comfortable including them on Aptos.dev. Thanks!

davidiw commented 1 year ago

@clay-aptos , any way to include the original contents and not provide "internal" slack links?

clay-aptos commented 1 year ago

> @clay-aptos , any way to include the original contents and not provide "internal" slack links?

I've now replaced them. Perhaps we can get folks to file bugs summarizing the request rather than CC in Slack?

Looks like there are more entries to remove: https://github.com/aptos-labs/aptos-core/issues?q=is%3Aissue+is%3Aopen+slack

p1xel32 commented 1 year ago

If I can help with anything just tag me!

clay-aptos commented 1 year ago

> If I can help with anything just tag me!

Thank you!

p1xel32 commented 1 year ago

Hi, perhaps this will come in handy. I slightly simplified and rewrote the manual. If someone else has a chance to check it, please do; I would not want to let the community members down. Thank you!

1. Check the service file `/etc/systemd/system/aptos-validator.service` and comment out these lines:
   ```
   #StandardOutput=null
   #StandardError=null
   ```
   Reload the daemon: `systemctl daemon-reload`
2. Create (Debian 11.5) or edit (Ubuntu) the file `/etc/rsyslog.d/50-default.conf`. Add this at the beginning of the file; it drops log messages containing the INFO tag: `:msg, contains, "INFO" ~`
   Restart the service: `service rsyslog restart`
3. In `/etc/systemd/journald.conf`, uncomment the storage setting and change it to `Storage=none`.
   Restart: `systemctl restart systemd-journald`
   If everything is done correctly, the last entry is a service restart: `journalctl -xe`
   Clean the journal: `journalctl --vacuum-time=1s`
   Check the logs in `/var/log/syslog`.
4. Optionally, set up logrotate. By default it runs from /etc/cron.daily; after performing all the previous steps you can move the daily cron job to monthly. You can also set it up with a systemd timer.

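
For anyone verifying step 1 on their own node, here is a small sketch that reports whether the two null redirects are still active in the unit file. `active_null_redirects` is a hypothetical helper name, and the check is a plain text match, so it does not account for overrides applied elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not from the original manual): print any active,
# uncommented StandardOutput=null or StandardError=null lines in a unit file.
# Prints nothing and returns non-zero when no such line is present.
active_null_redirects() {
    grep -E '^[[:space:]]*Standard(Output|Error)[[:space:]]*=[[:space:]]*null[[:space:]]*$' "$1"
}

unit=/etc/systemd/system/aptos-validator.service
if [ ! -r "$unit" ]; then
    echo "cannot read $unit"
elif active_null_redirects "$unit" >/dev/null; then
    echo "step 1 still pending; these lines are active in $unit:"
    active_null_redirects "$unit"
else
    echo "no active null redirects found in $unit"
fi
```

Remember to run `systemctl daemon-reload` after editing the unit file, as in step 1.
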
clay-aptos commented 1 year ago

Thank you, @p1xel32 !

@Markuze and @wintertoro , please remind me what you would like here. Is this something we should recommend across the board? Or only for performance profiles? Should we document performance profiles?

Please advise. Thanks!

clay-aptos commented 1 year ago

I am noting we now say it's optional to run aptos-node as a systemd service at the bottom of: https://aptos.dev/nodes/validator-node/operator/running-validator-node/run-validator-node-using-source

Alex is looking into the other recommendations here. @wintertoro , please let me know what else you need here.

H136511 commented 1 year ago

God

clay-aptos commented 1 year ago

I am reopening to ask @JoshLind if he thinks there is more information to include from the blog post above. Thanks, Josh!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 45 days with no activity. Remove the stale label or comment - otherwise this will be closed in 15 days.