angelo-v / wordpress-backup

Simple Docker container that helps you backup and restore your WordPress blog.

alternative cleanup solution #27

Closed adabru closed 4 years ago

adabru commented 4 years ago

On my server I have only limited space, so I'd like an alternative to the 'older than x days' cleanup solution. The alternative should be tiered and keep backups at different intervals. Before creating a pull request I'd like some feedback on my suggested implementation; maybe there is a better one from a usability point of view. What I had in mind:

A new CLI argument like '-CLEANUP_SCHEDULE=1.1.1.4.4.30.30'
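As a sketch of how such a schedule string could be parsed into (distance, count) tiers (my own illustration, not existing wordpress-backup code):

```python
from itertools import groupby

def parse_schedule(schedule):
    """Collapse e.g. '1.1.1.4.4.30.30' into (distance, count) tiers."""
    distances = [int(x) for x in schedule.split('.')]
    # groupby merges consecutive equal distances into one tier
    return [(d, len(list(group))) for d, group in groupby(distances)]

# parse_schedule('1.1.1.4.4.30.30') → [(1, 3), (4, 2), (30, 2)]
```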

The resulting backup plan would look like:

x = latest backup
- = deleted
Day 01 x
Day 02 x 1+4+30
Day 03 x      1 1+4+30
Day 04 x      1      1 1+4+30
Day 05 x      1      1      1   4+30
Day 06 x    1+4      1      1      -   4+30
Day 07 x      1    1+4      1      -      -   4+30
Day 08 x      1      1    1+4      -      -      -   4+30
Day 09 x      1      1      1      4      -      -      -   4+30
Day 10 x    1+4      1      1      -      4      -      -      -   4+30
Day 11 x      1    1+4      1      -      -      4      -      -      -   4+30
Day 12 x      1      1    1+4      -      -      -      4      -      -      -   4+30
Day 13 x      1      1      1      4      -      -      -      4      -      -      -    30
Day 14 x    1+4      1      1      -      4      -      -      -      -      -      -      -    30

The code would look like the following (pseudocode):

# read periods from cli argument
var periods = [{distance: 1, count: 3}, {distance: 4, count: 2}, {distance: 30, count: 2}]
# read backup dates from file system
var backups = [19990101, 19990102, 19990103, 19990104, ...]

# initially flag every backup for deletion
for each backup:
  flag_for_deletion(backup)

# find out which backups to keep
for each period:
  for i from 1 to period.count:
    # find the best suited backup, i.e. the oldest backup that is not older than i * period.distance
    var best = backups.filter(backup is not older than i * period.distance).find(is oldest)
    unflag_for_deletion(best)

# remove unneeded backups
for each backup:
  if flagged_for_deletion(backup)
    delete backup
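The pseudocode above can be sketched as a runnable Python function (the function name and the (distance, count) tier representation are my own; this is a sketch of the proposal, not shipped code):

```python
import datetime

def backups_to_keep(dates, tiers, today):
    """dates: set of backup dates; tiers: [(distance, count), ...].
    Returns the subset of dates to keep; everything else would be deleted."""
    keep = set()
    if today in dates:
        keep.add(today)  # always keep the latest backup
    for distance, count in tiers:
        for i in range(1, count + 1):
            # best suited backup: the oldest one not older than i * distance days
            cutoff = today - datetime.timedelta(days=i * distance)
            candidates = [d for d in dates if d >= cutoff]
            if candidates:
                keep.add(min(candidates))
    return keep
```

Replaying the Day 13 row of the table above with schedule 1.1.1.4.4.30.30 keeps the backups that are 0, 1, 2, 3, 4, 8 and 12 days old, matching the x/1/4/30 markers in that row.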

The tiered backup schedule is borrowed from https://hub.docker.com/r/prodrigestivill/postgres-backup-local. The latter uses the cli options BACKUP_KEEP_DAYS, BACKUP_KEEP_WEEKS, BACKUP_KEEP_MONTHS. But that control creates some duplicate backups which is not very convenient for my very limited storage situation.

angelo-v commented 4 years ago

Thank you for submitting your idea. Such a cleanup could just as well be implemented as a separate script or Docker container. Would that be an option for you? I am hesitant to add more complexity to wordpress-backup. At the very least we should add unit tests then, and perhaps migrate to a more testable language (Python?). But I am not going to invest time into that right now.

adabru commented 4 years ago

Such a cleanup could just as well be implemented as a separate script or Docker container. Would that be an option for you?

Yes, of course.

I am hesitating to add more complexity to wordpress-backup.

This seems reasonable to me. If this issue gets many likes in the future, it can still be reconsidered. Thanks for your answer and for your Docker image.

adabru commented 4 years ago

For those interested, the script I'm using is the following backup_schedule.py:

#!/usr/bin/python3

import sys, os, re, datetime

if len(sys.argv) < 3:
  print(
    'usage:\n   \033[1mbackup_schedule.py\033[22m /path/to/backup/folder x.y.z\n\n' +
    'The scheme gives the approximate distances between the kept backups.\n' +
    'Files in the backup folder must match the pattern *yyyymmdd*')
  sys.exit()

# group backup files by the date embedded in their name
backups = {}
date_pattern = re.compile(r'\d{8}')
for file in os.listdir(sys.argv[1]):
  m = date_pattern.search(file)
  if m is not None:
    d = datetime.datetime.strptime(m.group(), "%Y%m%d").date()
    if d not in backups:
      backups[d] = {
        'files': [],
        'delete': True
      }
    backups[d]['files'].append(file)

periods = [int(x) for x in sys.argv[2].split('.')]

# keep today's backup
today = datetime.date.today()
if today in backups:
  backups[today]['delete'] = False

# find out which other backups to keep
cursor = today
for period in periods:
  cursor -= datetime.timedelta(days=period)
  # find the best suited backup, i.e. the oldest backup that is not older than the specified period
  best = min((k for k in backups if k >= cursor), default=None)
  if best is not None:
    backups[best]['delete'] = False

# delete obsolete backups
for b in backups:
  if backups[b]['delete']:
    for file in backups[b]['files']:
      os.remove(os.path.join(sys.argv[1], file))

and I added the following script backup_schedule:

#!/bin/bash
# 33 is the uid/gid of www-data on Debian-based systems
chown -R 33:33 "/home/www-backup"
sudo -u "#33" python3 "<PWD>/backup_schedule.py" "/home/www-backup" 1.2.4.8.24

which I installed to cron.daily:

sed -e "s|<PWD>|$PWD|" backup_schedule > /tmp/backup_schedule
sudo cp /tmp/backup_schedule /etc/cron.daily/backup_schedule
sudo chmod +x /etc/cron.daily/backup_schedule

It seems to work fine for me.