gogoair / lavatory

Tooling to define repository specific retention policies in Artifactory.
Apache License 2.0

Utility cleanup for keeping artifacts based on @version #38

Open rpocase opened 5 years ago

rpocase commented 5 years ago

Our most common internal use case is a generic repository that holds multiple types of artifacts within a given folder, e.g.:

repo
├── module1
│   └── version
│       ├── module1-version.pdf
│       └── module1-version.tgz
└── module2
    └── version
        ├── module2-version.pdf
        └── module2-version.tgz

Count-based retention works well here, except when some artifacts should be kept regardless of age (e.g., artifacts containing metadata that can't be expressed well in properties).

Using the versions API, we could be more intelligent about keeping the latest set of versioned artifacts from particular repositories (or particular folders under a given repository).

E.g.

My workaround for the time being is shown below. It combines time_based_retention and count_based_retention, and accepts extra_aql to handle filtering. It still requires structural changes to how I post the files that should be kept (e.g., they go in a separate tree that gets filtered out by AQL).

from lavatory.utils.artifactory import Artifactory
import datetime

def time_count_based_retention(artifactory: Artifactory, retention_count=1, keep_days=15,
                               project_depth=1,
                               artifact_depth=2,
                               extra_aql=None):
    """
    Discard any folders older than keep_days while maintaining at least retention_count folders

    :param artifactory: Artifactory instance provided to policy purgelist
    :param retention_count: Number of versions to keep
    :param keep_days: keep_days as defined by artifactory.time_based_retention
    :param project_depth: project_depth as defined by artifactory.count_based_retention
    :param artifact_depth: artifact_depth as defined by artifactory.count_based_retention
    :param extra_aql: extra_aql as defined by lavatory utils
    :return: folders that both retention queries mark as purgeable
    """
    if not extra_aql:
        extra_aql = []
    versions = _get_retention_count(artifactory, extra_aql=extra_aql,
                                    retention_count=retention_count,
                                    project_depth=project_depth, artifact_depth=artifact_depth)
    # The cutoff is formatted with a "Z" (UTC) suffix, so compute it in UTC.
    now = datetime.datetime.now(datetime.timezone.utc)
    before = now - datetime.timedelta(days=keep_days)
    created_before = before.strftime("%Y-%m-%dT%H:%M:%SZ")
    keep_days_versions = _get_retention_count(artifactory,
                                              extra_aql=extra_aql + [{'created': {"$lt": created_before}}],
                                              retention_count=retention_count,
                                              project_depth=project_depth,
                                              artifact_depth=artifact_depth)
    # Purge only the folders that appear in both result sets.
    return [artifact for artifact in keep_days_versions if artifact in versions]

def _get_retention_count(artifactory, extra_aql=None, retention_count=1,
                         project_depth=1, artifact_depth=2):
    return artifactory.count_based_retention(retention_count=retention_count,
                                             project_depth=project_depth,
                                             artifact_depth=artifact_depth, item_type='folder',
                                             extra_aql=extra_aql)
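
To make the intent of the docstring concrete without an Artifactory instance, here is a minimal in-memory sketch of the rule "discard folders older than keep_days while keeping at least retention_count per project". The `Folder` tuple and the `purgeable_folders` helper are hypothetical and are not part of lavatory; the sketch illustrates the stated rule and does not reproduce the exact results of the two count_based_retention queries above.

```python
import datetime
from typing import Dict, List, Tuple

# Hypothetical stand-in for a version folder: (project, version, created).
Folder = Tuple[str, str, datetime.datetime]


def purgeable_folders(folders: List[Folder], now: datetime.datetime,
                      retention_count: int = 1, keep_days: int = 15) -> List[Folder]:
    """Return folders older than keep_days that are not among the
    newest retention_count folders of their project."""
    cutoff = now - datetime.timedelta(days=keep_days)

    # Group the folders by project so retention is applied per project.
    by_project: Dict[str, List[Folder]] = {}
    for folder in folders:
        by_project.setdefault(folder[0], []).append(folder)

    purge: List[Folder] = []
    for project_folders in by_project.values():
        # Newest first, so the first retention_count entries are always kept.
        ordered = sorted(project_folders, key=lambda f: f[2], reverse=True)
        for folder in ordered[retention_count:]:
            if folder[2] < cutoff:
                purge.append(folder)
    return purge
```

A project with fewer than retention_count folders is never purged, however old its folders are.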

rpocase commented 5 years ago

Worth noting: the approach above only works for structures where artifacts are stored in version folders. The simple-default layout versions artifacts directly, without the version folder layer. In that layout, count_based_retention only works if you produce a single type of artifact. You can certainly change your posting structure, but that typically has far-reaching ramifications.
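
For the flat layout, one way to picture version-aware retention is to group files by the version embedded in their names and keep the newest N versions per module, whatever the artifact type. The sketch below is only an illustration: it parses file names instead of using the versions API or an @version property, it assumes names of the form `<module>-<dotted numeric version>.<extension>`, and `purgeable_files` is a hypothetical helper, not a lavatory function.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed naming convention: <module>-<dotted numeric version>.<extension>
NAME_RE = re.compile(r"^(?P<module>.+)-(?P<version>\d+(?:\.\d+)*)\.(?P<ext>.+)$")


def version_key(version: str) -> Tuple[int, ...]:
    """Compare versions numerically, so that 1.10 sorts after 1.9."""
    return tuple(int(part) for part in version.split("."))


def purgeable_files(names: List[str], retention_count: int = 1) -> List[str]:
    """Return files belonging to all but the newest retention_count
    versions of each module."""
    versions: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # names that do not fit the convention are never purged
        versions[match["module"]][match["version"]].append(name)

    purge: List[str] = []
    for by_version in versions.values():
        ordered = sorted(by_version, key=version_key, reverse=True)
        for version in ordered[retention_count:]:
            # Every artifact type of an expired version is purged together.
            purge.extend(by_version[version])
    return sorted(purge)
```

Because retention is counted per version rather than per file, a .pdf and a .tgz of the same version are kept or purged as a set.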

sijis commented 4 years ago

I realize this is an older issue. Replying in case this helps someone else.

It is possible to use native AQL to do what you want:

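    def purgelist(artifactory):
        terms = [
            {"stat.downloaded": {"$before": "1mo"}},       # last downloaded more than a month ago
            {"@build.correlation_ids": {"$nmatch": "*"}},  # property value does not match "*"
            {"name": {"$match": "manifest.json"}},         # only files named manifest.json
            {"path": {"$nmatch": "*/latest"}},             # skip anything under a "latest" path
        ]
        purgeable = artifactory.filter(terms=terms, depth=None, item_type="file")
        return purgeable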
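
To see roughly what such terms look like as a raw AQL query, the sketch below renders them into an `items.find(...)` call. This is an assumption made for illustration: `render_aql` is a hypothetical helper, the repository name is made up, and the query that lavatory actually assembles inside `filter` may differ.

```python
import json
from typing import Any, Dict, List


def render_aql(repo: str, terms: List[Dict[str, Any]], item_type: str = "file") -> str:
    """Render a list of AQL criteria as an items.find() query string."""
    criteria = {
        "repo": {"$eq": repo},
        "type": {"$eq": item_type},
        "$and": terms,  # every term must hold for an item to be returned
    }
    return "items.find({})".format(json.dumps(criteria))


if __name__ == "__main__":
    example_terms = [
        {"stat.downloaded": {"$before": "1mo"}},
        {"name": {"$match": "manifest.json"}},
    ]
    # "generic-local" is a hypothetical repository name.
    print(render_aql("generic-local", example_terms))
```

Printing the query this way can help when debugging a policy, since the text can be pasted into a request against the Artifactory AQL search endpoint.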