aptly-dev / aptly

aptly - Debian repository management tool
https://www.aptly.info/
MIT License

New command to purge old versions #291

Open dankegel opened 9 years ago

dankegel commented 9 years ago

Here's a script that claims to call aptly repo remove once for each package that has older versions to remove, for a particular architecture. It relies on GNU sort's -V option, which sorts first by package name and then by package version. It's really ugly, but it illustrates that "purge old versions" is nontrivial and might be worth adding as a feature in aptly itself.

#!/bin/sh
set -x
set -e
repo=_my_repo_
arch=amd64

dup=false
for p in `aptly repo search $repo "Architecture ($arch)" | sed "s/_$arch//" | sort -V`
do
    pkg=`echo $p | sed 's,_.*,,'`
    if test "$pkg" = "$pkg_old"
    then
        dup=true
    elif $dup
    then
        dup=false
        # $p_old is latest version of some package with more than one version
        # Output a search spec for all versions older than this
        # Version is 2nd field in output of aptly repo search, separated by _
        v_old=`echo $p_old | cut -d_ -f2`
        aptly repo remove $repo "$pkg_old (<< $v_old), Architecture ($arch)"
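    fi
    p_old="$p"
    pkg_old="$pkg"
done
# The loop only purges a package once the next package name shows up,
# so the last package in the list has to be handled here.
if $dup
then
    v_old=`echo $p_old | cut -d_ -f2`
    aptly repo remove $repo "$pkg_old (<< $v_old), Architecture ($arch)"
fi
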
Castaglia commented 8 years ago

For the automatic package publishing system I'm setting up (for a local Debian repository), this feature would be very useful, especially over long periods of time, as the CI/build server will churn out many versions of the packages.

jlu5 commented 8 years ago

:+1: I would love something like this.

iGuy5 commented 8 years ago

This would be great to have implemented into aptly.

rul commented 8 years ago

On top of that, this feature would be more useful if it allowed the user to specify the number of old packages to keep.

dankegel commented 8 years ago

Selecting how much history to keep is a toughie. Three possibilities come to mind: max # of versions, max age, and max total bytes for all versions of a package. That might handle a lot of use cases, especially if they could be combined.
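The three limits mentioned above (max number of versions, max age, max total bytes) can be combined in a few lines. The following is only a minimal sketch under stated assumptions: the PackageVersion record and its age_days and size_bytes fields are hypothetical and not something aptly exposes in this form, the input is assumed to be sorted newest first, and the rule that the newest version is always kept is a design choice made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PackageVersion:
    # Hypothetical record for illustration; not an aptly data structure.
    version: str
    age_days: float   # time since this version was added to the repo
    size_bytes: int


def select_for_removal(
    versions: List[PackageVersion],
    max_versions: Optional[int] = None,
    max_age_days: Optional[float] = None,
    max_total_bytes: Optional[int] = None,
) -> List[PackageVersion]:
    """Return the versions to purge; `versions` must be ordered newest first.

    Any limit left as None is ignored. As soon as one limit is exceeded,
    that version and everything older than it is selected for removal.
    """
    kept_count = 0
    kept_bytes = 0
    for index, pv in enumerate(versions):
        if index > 0:  # the newest version is always kept
            too_many = max_versions is not None and kept_count >= max_versions
            too_old = max_age_days is not None and pv.age_days > max_age_days
            too_big = (max_total_bytes is not None
                       and kept_bytes + pv.size_bytes > max_total_bytes)
            if too_many or too_old or too_big:
                return versions[index:]
        kept_count += 1
        kept_bytes += pv.size_bytes
    return []
```

Cutting the history at the first violation keeps the retained versions contiguous, which is what you usually want for rollbacks.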

Castaglia commented 8 years ago

For my particular use cases, either max # of versions or max age would work.

rul commented 8 years ago

I've come up with something like this:

# Removes old packages in the received repo
#
# $1: Repository
# $2: Architecture
# $3: Amount of packages to keep
repo-remove-old-packages() {
    local repo=$1
    local arch=$2
    local keep=$3

    for pkg in $(aptly repo search $repo "Architecture ($arch)" | grep -v "ERROR: no results" | sort -rV); do
        local pkg_name=$(echo $pkg | cut -d_ -f1)
        if [ "$pkg_name" != "$cur_pkg" ]; then
            local count=0
            local deleted=""
            local cur_pkg="$pkg_name"
        fi
        test -n "$deleted" && continue
        let count+=1
        if [ $count -gt $keep ]; then
            pkg_version=$(echo $pkg | cut -d_ -f2)
            aptly repo remove $repo "Name ($pkg_name), Version (<= $pkg_version)"
            deleted='yes'
        fi
    done
}

Note that the grep -v "ERROR: no results" is needed because of #334.

smira commented 8 years ago

The issue with error messages going to stdout has already been fixed in master.

stumyp commented 8 years ago

It would be nice to have the ability to keep a fixed number of versions. Let's say I want the last 10 versions only, so I can roll back to some of them, but do not need to keep all of them.

Something like this (I know it is ugly, and it is an ad-hoc one-liner):

version=`aptly repo remove -dry-run=true $repo $package | sort --version-sort | grep $package | tail -n $number_to_leave | head -1 | awk -F"_" '{print $2}'`
aptly repo remove $repo "$package (<< $version)"

UPD: just noticed a mistake in the version filter

directhex commented 8 years ago

All other repository managers automatically expire old versions on upload of a new version - e.g. if I upload foo_1.0-2 then foo_1.0-1 is removed. aptly should at least optionally behave like this.

alanfranz commented 7 years ago

Hello, elaborating on @stumyp's bash combo, I created a Python script which performs (IMHO) the exact behaviour we'd like (I found some issues with the bash version):

#!/usr/bin/env python2.7
import sys
from subprocess import check_output
from apt_pkg import version_compare, init_system

init_system()

repo = sys.argv[1]
package_name = sys.argv[2]
retain_how_many = int(sys.argv[3])

output = check_output(["aptly", "repo", "remove", "-dry-run=true", repo, package_name])
output = [line for line in output.split("\n") if line.startswith("[-]")]
output = [line.replace("[-] ","") for line in output]
output = [line.replace(" removed","") for line in output]

def sort_cmp(name1, name2):
    version_and_build_1 = name1.split("_")[1]
    version_and_build_2 = name2.split("_")[1]
    return version_compare(version_and_build_1, version_and_build_2)

output.sort(cmp=sort_cmp)
should_delete = output[:-retain_how_many]

if should_delete:
    print check_output(["aptly", "repo", "remove", repo] + should_delete)
else:
    print "nothing to delete"

Since it's already in Python, if @smira is interested I could try submitting a pull request to integrate this functionality into aptly itself; any idea how you'd like the command line to look? I'd probably create an "aptly repo" subcommand.

figtrap commented 7 years ago

One of the issues that frequently pops up is that when one changes the version scheme or the package name, everything gets borked and all the old packages must be removed (sometimes). Can it deal with inconsistently versioned or named packages somehow?

dankegel commented 7 years ago

Debian package versioning lets package maintainers cope with changing upstream version schemes by prefixing the version number with an epoch; see http://manpages.ubuntu.com/manpages/trusty/man5/deb-version.5.html

Because the python script uses from apt_pkg import version_compare to do its version comparisons, it's likely to handle that correctly.
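To make the epoch point concrete, here is a simplified pure-Python sketch of the ordering described in deb-version(5): epoch first, then upstream version, then Debian revision, with "~" sorting before everything else. It is written for illustration only and has not been checked against dpkg on every corner case; compare_versions and its helpers are names made up for this example, and real scripts are better off with apt_pkg.version_compare as used above.

```python
def _is_digit(ch):
    return "0" <= ch <= "9"


def _order(ch):
    # "~" sorts before everything, even the end of the string;
    # letters sort before any other non-digit character.
    if ch == "~":
        return -1
    if ch == "" or _is_digit(ch):
        return 0
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_part(a, b):
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit prefixes character by character.
        while (i < len(a) and not _is_digit(a[i])) or \
              (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Compare the following runs of digits numerically.
        k = i
        while k < len(a) and _is_digit(a[k]):
            k += 1
        m = j
        while m < len(b) and _is_digit(b[m]):
            m += 1
        na, nb = int(a[i:k] or "0"), int(b[j:m] or "0")
        if na != nb:
            return -1 if na < nb else 1
        i, j = k, m
    return 0


def _split(version):
    epoch, sep, rest = version.partition(":")
    if not sep:  # no epoch given, which counts as epoch 0
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:  # no Debian revision given
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return a negative, zero or positive number, like a cmp function."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


# A date-based upstream version is superseded by "1.0" once an epoch is added.
print(compare_versions("20160101", "1:1.0"))  # -1
```

Without the epoch, 20160101 would compare as newer than 1.0, which is exactly the situation a changed version scheme creates.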

figtrap commented 7 years ago

Thank you, I totally forgot about the epoch.

Tim Kelley

samuelba commented 7 years ago

I added a few things to @alanfranz's script; it is now possible to use package queries to remove old versions.

Example call:

./purge_old_versions.py --dry-run --repo release-repo --package-query 'Name (% ros-indigo-*)' -n 1

#!/usr/bin/env python
from __future__ import print_function

import argparse
import re
import sys

from apt_pkg import version_compare, init_system
from subprocess import check_output, CalledProcessError

class PurgeOldVersions:
    def __init__(self):
        self.args = self.parse_arguments()

        if self.args.dry_run:
            print("Run in dry mode, without actually deleting the packages.")

        if not self.args.repo:
            sys.exit("You must declare a repository with: --repo")

        if not self.args.package_query:
            sys.exit("You must declare a package query with: --package-query")

        print("Remove " + self.args.package_query + " from " + self.args.repo +
              " and keep the last " + str(self.args.retain_how_many) +
              " packages")

    @staticmethod
    def parse_arguments():
        parser = argparse.ArgumentParser(
            formatter_class=argparse.RawTextHelpFormatter)
        parser.add_argument("--dry-run", dest="dry_run",
                            help="List packages to remove without removing "
                                 "them.", action="store_true")
        parser.add_argument("--repo", dest="repo",
                            help="Which repository should be searched?",
                            type=str)
        parser.add_argument("--package-query", dest="package_query",
                            help="Which packages should be removed?\n"
                                 "e.g.\n"
                                 "  - Single package: ros-indigo-rbdl.\n"
                                 "  - Query: 'Name (%% ros-indigo-*)' "
                                 "to match all ros-indigo packages. See \n"
                                 "https://www.aptly.info/doc/feature/query/",
                            type=str)
        parser.add_argument("-n", "--retain-how-many", dest="retain_how_many",
                            help="How many package versions should be kept?",
                            type=int, default=1)
        return parser.parse_args()

    def get_packages(self):
        init_system()

        packages = []

        try:
            output = check_output(["aptly", "repo", "remove", "-dry-run=true",
                                   self.args.repo, self.args.package_query])
            output = [line for line in output.split("\n") if
                      line.startswith("[-]")]
            output = [line.replace("[-] ", "") for line in output]

            for p in output:
                packages.append(
                    re.sub("[_](\d{1,}[:])?\d{1,}[.]\d{1,}[.]\d{1,}[-](.*)", '', p))
            packages = list(set(packages))
            packages.sort()

        except CalledProcessError as e:
            print(e)

        finally:
            return packages

    def purge(self):
        init_system()

        packages = self.get_packages()
        if not packages:
            sys.exit("No packages to remove.")

        # Initial call to print 0% progress
        i = 0
        l = len(packages)
        printProgressBar(i, l, prefix='Progress:', suffix='Complete', length=50)

        packages_to_remove = []
        for package in packages:
            try:
                output = check_output(["aptly", "repo", "remove",
                                       "-dry-run=true", self.args.repo,
                                       package])
                output = [line for line in output.split("\n") if
                          line.startswith("[-]")]
                output = [line.replace("[-] ", "") for line in output]
                output = [line.replace(" removed", "") for line in output]

                def sort_cmp(name1, name2):
                    version_and_build_1 = name1.split("_")[1]
                    version_and_build_2 = name2.split("_")[1]
                    return version_compare(version_and_build_1,
                                           version_and_build_2)

                output.sort(cmp=sort_cmp)
                should_delete = output[:-self.args.retain_how_many]
                packages_to_remove += should_delete

                i += 1
                printProgressBar(i, l, prefix='Progress:', suffix='Complete',
                                 length=100)

            except CalledProcessError as e:
                print(e)

        print(" ")
        if self.args.dry_run:
            print("\nThese packages would be deleted:")
            for p in packages_to_remove:
                print(p)
        else:
            if packages_to_remove:
                print(check_output(["aptly", "repo", "remove",
                                    self.args.repo] + packages_to_remove))
                print("\nRun 'aptly publish update ...' "
                      "to update the repository.")
            else:
                print("nothing to remove")

# Print iterations progress
def printProgressBar(iteration, total, prefix='', suffix='', decimals=1,
                     length=100, fill='#'):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent
                                  complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(
        100 * (iteration / float(total)))
    filled_length = int(length * iteration // total)
    bar = fill * filled_length + '-' * (length - filled_length)
    print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end='\r')
    # Print New Line on Complete
    if iteration == total:
        print()

if __name__ == '__main__':
    purge_old_versions = PurgeOldVersions()
    purge_old_versions.purge()

smira commented 7 years ago

I had a feature in the works which I never completed, as it requires some large-scale changes. The idea was to enhance package queries with Python-like slice syntax, so that you could write package[3:], which would mean "all versions of package except the first 3".
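For reference, this is how the same slice behaves in plain Python, which is only an analogy for the proposed query syntax (nothing in this thread shows it being released in aptly). The sketch assumes a list of version strings that is already sorted newest first; purge_candidates is a name invented for this example.

```python
def purge_candidates(versions_newest_first, keep):
    """Return everything after the first `keep` entries, i.e. versions[keep:]."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return versions_newest_first[keep:]


versions = ["2.1", "2.0", "1.9", "1.8", "1.7"]  # newest first
print(purge_candidates(versions, 3))  # ['1.8', '1.7']

# The scripts in this thread sort oldest first and use lst[:-n] instead.
# That form has a pitfall: with n == 0 it yields an empty list, so
# "keep nothing" silently turns into "delete nothing".
oldest_first = list(reversed(versions))
print(oldest_first[:-0])  # []
```

Slicing from the front of a newest-first list avoids the n == 0 special case entirely.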

gacopl commented 7 years ago

@samuelba Thanks for the script, but it does not work properly with a query like Name (% test), Version(% dev): it lists all packages for deletion, as if it were ignoring the Version filter. The normal aptly command works without a problem with such a query, so I had to revert to plain old bash hacking.
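A likely cause, judging from the script as posted: get_packages() reduces the first dry-run result to bare package names, and purge() then runs a second dry-run per name, so the rest of the original query (here the Version filter) never reaches that second call. One possible fix is to re-apply the original query on the second call. The sketch below assumes aptly's query syntax accepts a parenthesised sub-query joined to another condition with a comma (AND), as the query documentation linked in the script describes; narrow_query is a hypothetical helper, not part of any posted script.

```python
def narrow_query(original_query, package_name):
    """Combine the user's query with an exact-name condition.

    Assumption: "(A), B" is valid aptly query syntax meaning "A and B".
    """
    return "(%s), Name (%s)" % (original_query, package_name)


print(narrow_query("Name (% test), Version (% dev)", "test"))
# (Name (% test), Version (% dev)), Name (test)
```

The result would be passed in place of the bare package name in the per-package dry-run call.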

wwentland commented 6 years ago

The feature mentioned by @smira would be tremendously useful for maintaining repositories that can accrue a large number of different versions. I was wondering if there has been any progress on this in the last couple of months?

smira commented 6 years ago

No progress so far on that. I have a branch which implements part of the syntax, but nothing more.

fyhertz commented 5 years ago

This thread helped me a lot. Here is my take on the issue based on what I've read here. Hope it will be useful.

#!/usr/bin/env python3
import sys
import json
import codecs
import mimetypes
import uuid
import io
import re
from pathlib import Path
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
from functools import cmp_to_key
from apt_pkg import version_compare, init_system

init_system()

class MultipartFormdataEncoder(object):
    def __init__(self):
        self.boundary = uuid.uuid4().hex
        self.content_type = 'multipart/form-data; boundary={}'.format(self.boundary)

    def iter(self, files):
        encoder = codecs.getencoder('utf-8')
        for file in files:
            print('uploading file %s...' % str(file))
            yield encoder('--{}\r\n'.format(self.boundary))
            yield encoder('Content-Disposition: form-data; name="{}"; filename="{}"\r\n'.format(file.name, file.name))
            yield encoder('Content-Type: {}\r\n'.format(mimetypes.guess_type(file.name)[0] or 'application/octet-stream'))
            yield encoder('\r\n')
            with open(str(file), 'rb') as fd:
                buff = fd.read()
                yield (buff, len(buff))
            yield encoder('\r\n')
        yield encoder('--{}--\r\n'.format(self.boundary))

    def encode(self, files):
        body = io.BytesIO()
        for chunk, chunk_len in self.iter(files):
            body.write(chunk)
        return self.content_type, body.getvalue()

def sort_cmp(p1, p2):
    v1 = p1.split(' ')[2]
    v2 = p2.split(' ')[2]
    return version_compare(v1, v2)

def request(url, method='GET', data=None, files=None):
    headers = {'Content-Type': 'application/json'}

    if data is not None:        
        data = json.dumps(data).encode('utf-8')

    if files is not None:
        content_type, data = MultipartFormdataEncoder().encode(files)
        headers = {'Content-Type': content_type}

    req = Request(url, data, headers)
    req.get_method = lambda: method
    try:
        response = urlopen(req)
    except HTTPError as e:
        print('the server couldn\'t fulfill the request.')
        print('error code: ', e.code)
    except URLError as e:
        print('failed to reach a server.')
        print('reason: ', e.reason)
    else:
        rep = json.loads(response.read().decode('utf-8'))
        return rep

def purge(url, repo, name, retain_how_many):    
    data = request(url+'/api/repos/'+repo+'/packages')
    data = list(filter(lambda x: x.split(' ')[1]==name, data))
    data = sorted(data, key=cmp_to_key(sort_cmp))
    should_delete = data[:-retain_how_many]

    if should_delete:
        print('the following packages are going to be removed from %s: %s' % (repo, should_delete))
        data = {'PackageRefs': should_delete}
        rep = request(url+'/api/repos/'+repo+'/packages', method='DELETE', data=data)
    else:
        print('no version of %s deleted in %s' % (name, repo))

def main():
    url = sys.argv[1]
    repo_pattern = re.compile(sys.argv[2])
    package_glob = sys.argv[3]
    retain_how_many = int(sys.argv[4])
    directory = str(uuid.uuid4())

    # Upload packages
    packages = list(Path('.').glob(package_glob))
    print('uploading %s packages in directory %s' % (len(packages), directory))
    request(url+'/api/files/'+directory, method='POST', files=packages)

    # List repos matching repo_pattern
    repos = [r['Name'] for r in request(url+'/api/repos')]
    repos = [r for r in repos if repo_pattern.match(r)]
    print("pattern matches the following repositories: %s" % repos)

    names = {file.name.split('_')[0] for file in packages}
    for repo in repos:
        # Add package to repo
        rep = request(url+'/api/repos/'+repo+'/file/'+directory+'?noRemove=1', method='POST')
        # Delete old package
        for name in names:
            purge(url, repo, name, retain_how_many)

    # Delete upload directory
    request(url+'/api/files/'+directory, method='DELETE')

if __name__ == '__main__':
    main()

Usage: ./aptly-push <http://APTLYAPI> <REPOPATTERN> <PATH> <RETAINHOWMANY>

It will upload all packages matching the PATH glob and add them to all the repos matching the REPOPATTERN. For each repo and for each package, it then limits the number of versions to RETAINHOWMANY.

Example: ./aptly-push http://127.0.0.1:9876 "myrepo-(?:prod|staging)" "./build/*.deb" 3
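One thing to watch for when a repo holds several architectures: the purge() function above filters package keys by name only, so all architectures of a package share a single retention count. A minimal sketch of grouping by architecture as well follows. It assumes the key layout the script itself relies on (space-separated, with an architecture prefix such as Pamd64 first, then name, then version), and demo_compare is a toy stand-in for apt_pkg.version_compare that only understands dotted integers.

```python
from collections import defaultdict
from functools import cmp_to_key


def refs_to_delete(refs, retain_how_many, compare):
    """Pick package keys to delete, keeping the newest N per (arch, name)."""
    groups = defaultdict(list)
    for ref in refs:
        arch_prefix, name = ref.split(" ")[:2]
        groups[(arch_prefix, name)].append(ref)

    doomed = []
    for group in groups.values():
        # Sort oldest first using the supplied version comparison function.
        group.sort(key=cmp_to_key(
            lambda a, b: compare(a.split(" ")[2], b.split(" ")[2])))
        cut = max(len(group) - retain_how_many, 0)
        doomed.extend(group[:cut])
    return doomed


def demo_compare(v1, v2):
    # Toy comparison for the demo only; use apt_pkg.version_compare for real.
    t1 = tuple(int(x) for x in v1.split("."))
    t2 = tuple(int(x) for x in v2.split("."))
    return (t1 > t2) - (t1 < t2)


refs = [
    "Pamd64 foo 1.0 aaa",
    "Pamd64 foo 1.10 bbb",
    "Pamd64 foo 1.2 ccc",
    "Parm64 foo 1.0 ddd",
]
print(refs_to_delete(refs, 1, demo_compare))
# ['Pamd64 foo 1.0 aaa', 'Pamd64 foo 1.2 ccc']
```

With name-only grouping and a retention count of 1, the arm64 build of foo 1.0 would have been deleted as well, leaving that architecture with no package at all.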

mzanetti commented 2 years ago

And yet another version, based on the version of @samuelba. We have 8 repos, each with 2 or 4 components, 3 to 4 architectures and some 100 packages. While samuelba's version worked nicely (after porting from Python 2 to Python 3), it took about 10 minutes to purge all of them. So instead of painting a progress bar, this one should be fast enough to not need one :)

#!/usr/bin/env python3
from __future__ import print_function

import argparse
import re
import sys

from apt_pkg import version_compare, init_system
from subprocess import check_output, CalledProcessError
from functools import cmp_to_key

class PurgeOldVersions:
    def __init__(self):
        self.args = self.parse_arguments()

        if self.args.dry_run:
            print("Running in dry mode, without actually deleting the packages.")

        if not self.args.repo:
            sys.exit("You must declare a repository with: --repo")

        if not self.args.package_query:
            sys.exit("You must declare a package query with: --package-query")

        print("Removing " + self.args.package_query + " from " + self.args.repo +
              " and keeping the last " + str(self.args.retain_how_many) +
              " packages")

    @staticmethod
    def parse_arguments():
        parser = argparse.ArgumentParser(
            formatter_class=argparse.RawTextHelpFormatter)
        parser.add_argument("--dry-run", dest="dry_run",
                            help="List packages to remove without removing "
                                 "them.", action="store_true")
        parser.add_argument("--repo", dest="repo",
                            help="Which repository should be searched?",
                            type=str)
        parser.add_argument("--package-query", dest="package_query",
                            help="Which packages should be removed?\n"
                                 "e.g.\n"
                                 "  - Single package: ros-indigo-rbdl.\n"
                                 "  - Query: 'Name (%% ros-indigo-*)' "
                                 "to match all ros-indigo packages. See \n"
                                 "https://www.aptly.info/doc/feature/query/",
                            type=str)
        parser.add_argument("-n", "--retain-how-many", dest="retain_how_many",
                            help="How many package versions should be kept?",
                            type=int, default=1)
        return parser.parse_args()

    def get_packages(self):
        init_system()

        packages = {}

        try:
            print("getting packages %s" % self.args.package_query)
            output = check_output(["aptly", "repo", "remove", "-dry-run",
                                   self.args.repo, self.args.package_query]).decode('utf-8')
            output = [line for line in output.splitlines() if
                      line.startswith("[-]")]
            output = [line.replace("[-] ", "") for line in output]
            output = [line.replace(" removed", "") for line in output]

            for p in output:
                packageName = p.split("_")[0]
                version = p.split("_")[1]
                arch = p.split("_")[2]
                if packageName not in packages:
                    packages[packageName] = {}
                if arch not in packages[packageName]:
                    packages[packageName][arch] = []
                packages[packageName][arch].append(version)

        except CalledProcessError as e:
            print(e)

        finally:
            return packages

    def purge(self):
        init_system()

        packages = self.get_packages()

        packagesToRemove = []

        for package in packages:
            for arch in packages[package]:
                versions = packages[package][arch]

                versions = sorted(versions, key=cmp_to_key(version_compare))
                versionsToRemove = versions[:-self.args.retain_how_many]
                for versionToRemove in versionsToRemove:
                    packagesToRemove.append("%s_%s_%s" % (package, versionToRemove, arch))

        if len(packagesToRemove) == 0:
            sys.exit("No packages to remove.")

        if self.args.dry_run:
            print(check_output(["aptly", "repo", "remove", "-dry-run", self.args.repo] + packagesToRemove).decode("utf-8"))
        else:
            print(check_output(["aptly", "repo", "remove", self.args.repo] + packagesToRemove).decode("utf-8"))

if __name__ == '__main__':
    purge_old_versions = PurgeOldVersions()
    purge_old_versions.purge()

james-lawrence commented 6 months ago

Could we get some guidance on what would be required for a third party to implement this?