TheTorProject / lepidopter

lepidopter: raspberry pi image for conducting OONI network measurements
https://ooni.torproject.org/
GNU General Public License v3.0

Add a daily cronjob emergency deletion of logs/files #75

Open anadahz opened 8 years ago

anadahz commented 8 years ago

This script checks the root filesystem disk usage and deletes log files and a number of other files when the root filesystem reaches critical disk space usage.
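For readers who want to picture the check, here is a minimal sketch of a two-level disk usage test. It is not the script from this pull request: the 95% and 98% figures appear later in this thread, while the level names `warning` and `critical` and the function names `root_usage_percent` and `usage_level` are assumptions made for illustration. The sketch only reports a level and deletes nothing.

```shell
#!/bin/sh
# Hypothetical thresholds; the percentages are mentioned later in the thread,
# the variable names are illustrative.
WARN_PCT=95
CRIT_PCT=98

# Print the root filesystem usage as an integer percentage.
# df -P gives the POSIX output format, where column 5 is the capacity.
root_usage_percent() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Classify a usage percentage as ok, warning, or critical.
usage_level() {
    pct=$1
    if [ "$pct" -ge "$CRIT_PCT" ]; then
        echo critical
    elif [ "$pct" -ge "$WARN_PCT" ]; then
        echo warning
    else
        echo ok
    fi
}

pct=$(root_usage_percent)
echo "root filesystem at ${pct}%: $(usage_level "$pct")"
```

A daily cronjob could call such a check and run the cleanup only when the level is not `ok`.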

hellais commented 8 years ago

I don't think deleting OS files like this is a good way to handle this type of situation. It can leave the system in an inconsistent state (for example, manpages no longer exist), or it could delete log files that services still hold open handles on, leading to errors in those applications.

I think the better solution for future versions is to have a better file system layout, where the data files are on a separate partition from the OS files.

Edit: also, how much space are we really going to save by deleting the log files and the manpages? Note: handling of critical usage levels for ooniprobe-related files is already implemented in ooniprobe itself, and the quota is checked every hour.

anadahz commented 8 years ago

This script will delete older log files and unneeded OS files (some of these files are already deleted when the image is built, see: https://github.com/TheTorProject/lepidopter/blob/master/lepidopter-fh/cleanup.sh). Given that we are going to be updating lepidopter in the long run, newer packages will add more docs, manpages, and translations, and will increase the retrieved package files in the apt repository. If we ever reach these percentages (98% and 95%), I guess deleting 1-2 days of logs is the least of our concerns.

@hellais try it on your Pi; you will be surprised at how much disk space it frees up.

We are using this script as a safeguard so that we do not end up with cluttered disks that make lepidopter unresponsive and stop it from providing ooniprobe reports. In any case, this script will not be triggered if ooniprobe's quota works as expected.

hellais commented 8 years ago

Just ran this on a lepidopter that has been running for many months now, and I get this:

+ find /usr/share/doc -depth -type f '!' -name copyright
+ xargs du -sch
32K .
32K total
+ find /usr/share/doc -empty
+ xargs du -sch
4.0K    /usr/share/doc/git/contrib/buildsystems
4.0K    /usr/share/doc/git/contrib/credential
4.0K    /usr/share/doc/git/contrib/subtree
4.0K    /usr/share/doc/libssl-doc/demos/engines
4.0K    /usr/share/doc/xz-utils/extra
4.0K    /usr/share/doc/python-pycparser/examples
4.0K    /usr/share/doc/adduser/examples/adduser.local.conf.examples
4.0K    /usr/share/doc/netcat-traditional/examples
32K total
+ du -sch /usr/share/man /usr/share/groff /usr/share/info /usr/share/lintian /usr/share/linda /var/cache/man /usr/share/locale
du: cannot access '/usr/share/man': No such file or directory
du: cannot access '/usr/share/groff': No such file or directory
du: cannot access '/usr/share/info': No such file or directory
du: cannot access '/usr/share/lintian': No such file or directory
du: cannot access '/usr/share/linda': No such file or directory
28K /var/cache/man
4.0K    /usr/share/locale
32K total
+ find /var/log/ -type f -mtime +1
+ xargs du -sch
80K /var/log/daemon.log.1
4.0K    /var/log/tor/log.4.gz
4.0K    /var/log/tor/log.5.gz
4.0K    /var/log/tor/log.3.gz
4.0K    /var/log/debug.1
4.0K    /var/log/syslog.7.gz
4.0K    /var/log/auth.log.3.gz
4.0K    /var/log/auth.log.4.gz
4.0K    /var/log/kern.log.2.gz
4.0K    /var/log/messages.2.gz
928K    /var/log/ooni/ooniprobe.log.2016_8_25
932K    /var/log/ooni/ooniprobe.log.2016_8_28
932K    /var/log/ooni/ooniprobe.log.2016_8_29
0   /var/log/ooni/cronjobs.log
992K    /var/log/ooni/ooniprobe.log.2016_8_30
932K    /var/log/ooni/ooniprobe.log.2016_8_27
932K    /var/log/ooni/ooniprobe.log.2016_8_26
4.0K    /var/log/messages.4.gz
4.0K    /var/log/syslog.4.gz
4.0K    /var/log/syslog.5.gz
4.0K    /var/log/debug.2.gz
8.0K    /var/log/daemon.log.3.gz
8.0K    /var/log/daemon.log.4.gz
8.0K    /var/log/daemon.log.2.gz
4.0K    /var/log/messages.3.gz
8.0K    /var/log/kern.log.3.gz
4.0K    /var/log/auth.log.2.gz
36K /var/log/auth.log.1
4.0K    /var/log/syslog.3.gz
0   /var/log/debug
0   /var/log/btmp.1
4.0K    /var/log/wtmp.1
8.0K    /var/log/kern.log.4.gz
4.0K    /var/log/syslog.6.gz
4.0K    /var/log/kern.log.1
4.0K    /var/log/messages.1
5.8M    total
+ echo 'simulating apt-get clean'
simulating apt-get clean
+ du -sch /var/cache/apt/archives/lock /var/cache/apt/archives/partial '/var/cache/apt/archives/partial/*' /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin
0   /var/cache/apt/archives/lock
4.0K    /var/cache/apt/archives/partial
du: cannot access '/var/cache/apt/archives/partial/*': No such file or directory
36M /var/cache/apt/pkgcache.bin
36M /var/cache/apt/srcpkgcache.bin
71M total

All in all, it seems like it cleans up less than 80 MB of data, of which 71 MB is the apt-get cache. It seems like this just shifts the problem forward and will not really give a Raspberry Pi that is in a critical state much more mileage (80 MB is about one day's worth of ooniprobe measurements), at the cost of risking breaking other system services.
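A note on reading the trace above: the `32K .` line after the first `find` most likely means that `find` matched nothing and `xargs` still ran `du` once with no arguments, so `du` measured the current directory instead. The sketch below shows one way to estimate reclaimable log space without that pitfall. It is an illustration only, not part of the proposed script; the function name `old_files_bytes` is hypothetical, and it sums apparent file sizes in bytes rather than disk blocks as `du` does.

```shell
#!/bin/sh
# Sum the apparent size, in bytes, of regular files under a directory that
# were last modified more than N days ago. Nothing is deleted.
# Note: this reads the file contents, so it reports apparent size rather
# than allocated disk blocks.
old_files_bytes() {
    dir=$1
    days=$2
    # With -exec ... {} +, cat is not run at all when nothing matches,
    # so an empty match yields 0 instead of a stray measurement.
    find "$dir" -type f -mtime +"$days" -exec cat {} + | wc -c | tr -d ' '
}

# Illustrative call: sh estimate.sh /var/log 1
if [ "$#" -gt 0 ]; then
    echo "bytes in old files under $1: $(old_files_bytes "$1" "${2:-1}")"
fi
```

Passing `/var/log` and `1` mirrors the `find /var/log/ -type f -mtime +1` step in the trace.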

anadahz commented 8 years ago

Initially my idea, as discussed with @bassosimone, was to delete or archive ooniprobe reports. Since ooniprobe is taking care of this, I decided not to instruct the emergency deletion process to remove any ooniprobe reports. 80M (71M of apt cache files and 9M of other files) is still a significant amount of disk space that will allow ooniprobe to run and enforce the quota.

@hellais as I already mentioned, the emergency cleanup will not be activated if the quota functionality in ooniprobe-agent is working as expected. This script ensures that lepidopter will not be brought into a state that is impossible to recover from without user intervention. The files in https://github.com/TheTorProject/lepidopter/pull/75#issuecomment-244223945 are already being deleted during the build process to reduce disk space in lepidopter's final image.

If we decide to drop this and take our chances, I would like to know at least how well the quota enforcement is working and whether it has been tested at all.

hellais commented 7 years ago

I suggest we instead go for an approach where we enable log rotation for ooniprobe logs.

In the past there were some issues with using logrotate for ooniprobe, because Twisted's own log rotation was competing with logrotate.

This should be resolved in https://github.com/TheTorProject/ooni-probe/pull/664.

I would suggest we test whether this feature does in fact work as expected and, if so, include log rotation for ooniprobe as well in later versions.
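To make the suggestion concrete, here is a rough sketch of what a logrotate stanza for the ooniprobe logs could look like, written out by a small shell helper. Everything specific in it is an assumption: the helper name `write_ooni_logrotate`, the `/var/log/ooni/*.log` pattern (based on the paths in the trace earlier in this thread), the 7-day retention, and the use of `copytruncate`, which is chosen here only because this thread does not say whether ooniprobe reopens its log file after rotation.

```shell
#!/bin/sh
# Write a hypothetical logrotate stanza for ooniprobe logs to the given file.
# All values below are illustrative assumptions, not lepidopter's real config.
write_ooni_logrotate() {
    target=$1
    cat > "$target" <<'EOF'
/var/log/ooni/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Illustrative call: sh make-logrotate.sh ./ooniprobe.logrotate
if [ "$#" -gt 0 ]; then
    write_ooni_logrotate "$1"
fi
```

Running `logrotate -d` on the generated file performs a dry run, which would be one way to check what would be rotated before enabling it.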