devopshq / artifactory-cleanup

Extended cleanup tool for JFrog Artifactory
MIT License

_collect_docker_size queries for all items in the registry #107

Open tiagomeireles opened 1 year ago

tiagomeireles commented 1 year ago

Running an AQL query that fetches all items is very slow on large repositories. I also use object storage for the binary store, which likely contributes to the slow queries.

Example rule combination that I'm trying to use:

    - name: Example
      rules:
        - rule: Repo
          name: "docker"
        - rule: IncludePath
          masks: "app/*"
        - rule: DeleteDockerImagesOlderThan
          days: 14

https://github.com/devopshq/artifactory-cleanup/blob/018dcdb9c637db5210fa1d8f15a7df5288a2fee1/artifactory_cleanup/rules/docker.py#L63

I tested replacing this line with the following, and it is significantly faster while retaining the size info:

    args = ["items.find", {"$or": [{"path": {"$match": "app/*"}}]}]
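To make the difference concrete, here is a minimal sketch that turns items.find arguments like the ones above into AQL query text. The to_aql helper is hypothetical and written only for illustration; it is not the serializer used by artifactory-cleanup or dohq-artifactory, and it only handles strings, dicts, and lists. Note that the narrowed query from the issue has no repo clause, so on its own it would match app/* in every repository.

```python
import json


def to_aql(args):
    """Hypothetical helper: join AQL call pieces into query text.

    Strings (e.g. "items.find", ".include") are emitted as-is; a dict becomes
    a parenthesized JSON argument, and a list becomes several JSON arguments.
    """
    parts = []
    for arg in args:
        if isinstance(arg, str):
            parts.append(arg)
        elif isinstance(arg, dict):
            parts.append("(" + json.dumps(arg) + ")")
        elif isinstance(arg, list):
            # e.g. .include("repo", "path") takes several string arguments
            parts.append("(" + ", ".join(json.dumps(a) for a in arg) + ")")
        else:
            raise TypeError(f"unsupported AQL piece: {arg!r}")
    return "".join(parts)


# Narrowed query from the issue: only items whose path matches app/* are scanned.
narrowed = ["items.find", {"$or": [{"path": {"$match": "app/*"}}]}]
print(to_aql(narrowed))
```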

I'm happy to try contributing a fix. I see two potential options: disabling the size lookup, or accepting a mask on DeleteDockerImagesOlderThan.
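The second option could look roughly like the sketch below. The function name docker_size_query_args and its masks parameter are hypothetical and not part of DeleteDockerImagesOlderThan today. With no masks it builds a repository-wide query (the slow case described above); with masks it restricts the size lookup to items whose path matches any of them.

```python
from typing import Optional, Sequence


def docker_size_query_args(repos: Sequence[str], masks: Optional[Sequence[str]] = None) -> list:
    """Build items.find arguments for the Docker size lookup (hypothetical helper)."""
    # Match any of the Docker repositories the policy targets.
    repo_clause = {"$or": [{"repo": {"$eq": repo}} for repo in repos]}
    if not masks:
        # Repository-wide scan: every item in the repositories is returned.
        return ["items.find", repo_clause]
    # Proposed behaviour: only items whose path matches one of the masks.
    path_clause = {"$or": [{"path": {"$match": mask}} for mask in masks]}
    return ["items.find", {"$and": [repo_clause, path_clause]}]


print(docker_size_query_args(["docker"], masks=["app/*"]))
```

Passing the mask explicitly keeps the rule self-contained, which sidesteps the problem of one rule reading another rule's attributes.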

allburov commented 1 year ago

it is significantly faster while retaining the size info.

What timings are you seeing? Could you give an example from your case?

For example, if a cleanup script runs for an hour each night, that should be fine, IMO.

allburov commented 1 year ago

I think right now it's not possible to pass other rules' attributes to DeleteDockerImagesOlderThan; that is why the query is built this way.

tiagomeireles commented 1 year ago

I stopped it after 3 hours.

I have a large backlog of things to clean up, and repo-wide searches are very slow. Right now I'm using the following patch to filter on the common path of the returned artifacts; this avoids any additional parameters:

            # Needs "import posixpath" at module level (Artifactory paths always use "/")
            common_path = posixpath.commonpath([artifact["path"] for artifact in artifacts])
            repo_filter = {"$or": [{"repo": {"$eq": repo}} for repo in docker_repos]}
            args = ["items.find", {"$and": [repo_filter, {"path": {"$match": f"{common_path}/*"}}]}]

Deletes are also slow in my case: each delete takes a couple of minutes. Right now they are performed serially; have parallel deletes been considered?

allburov commented 1 year ago

I stopped it after 3 hours.

That sounds awful, agreed. With the common-path approach, the common path may end up being / (the repository root), in which case the request is the same as before. But we can add it as a quick fix if it helps in some cases. Could you create a PR for that?
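One way to guard against that degenerate case is to fall back to the repository-wide query whenever the artifacts share no folder. This is a sketch under assumptions: artifact paths are relative, "/"-separated strings as AQL returns them (with "." for items at the repository root), and the helper name path_mask_for is hypothetical. For unrelated relative paths, posixpath.commonpath returns an empty string rather than "/", so the check is for emptiness.

```python
import posixpath
from typing import Optional, Sequence


def path_mask_for(paths: Sequence[str]) -> Optional[str]:
    """Return an AQL $match mask for the shared folder, or None to scan the whole repo.

    Hypothetical helper; assumes relative "/"-separated paths as returned by AQL.
    """
    if not paths:
        return None
    common = posixpath.commonpath(paths)
    # commonpath drops "." components, so unrelated paths and repo-root
    # items both yield "": a mask would not narrow anything in that case.
    if not common:
        return None
    return f"{common}/*"


print(path_mask_for(["app/web/1.0", "app/worker/2.3"]))  # app/*
print(path_mask_for(["app/web/1.0", "tools/cli/0.1"]))   # None
```

When the helper returns None, the caller would keep the existing repository-wide query instead of sending a "/*" mask that matches nothing useful.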

have parallel deletes been considered?

There was no need so far, but it's possible. We could use a thread pool for that as an easy fix. If you want to add it too, please create a separate PR for it; don't mix it with the common-path change.
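A thread-pool version of the delete loop might look roughly like this. It is a sketch, not the project's code: delete_artifact is a hypothetical stand-in for whatever call the tool uses to remove a single artifact, and max_workers=4 is an arbitrary example value that would need tuning against what the Artifactory instance can handle.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def delete_in_parallel(
    artifacts: Iterable[str],
    delete_artifact: Callable[[str], None],
    max_workers: int = 4,
) -> Tuple[List[str], List[Tuple[str, Exception]]]:
    """Delete artifacts concurrently; return (deleted, failed) lists.

    delete_artifact is a hypothetical stand-in for the real single-delete call.
    """
    deleted: List[str] = []
    failed: List[Tuple[str, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every delete up front; the pool runs at most max_workers at once.
        futures = {pool.submit(delete_artifact, a): a for a in artifacts}
        for future in as_completed(futures):
            artifact = futures[future]
            try:
                future.result()  # re-raises any exception from the worker thread
                deleted.append(artifact)
            except Exception as exc:  # record it and keep going with the others
                failed.append((artifact, exc))
    return deleted, failed


if __name__ == "__main__":
    import time

    def fake_delete(path: str) -> None:
        time.sleep(0.1)  # simulate a slow delete request
        if path.endswith("bad"):
            raise RuntimeError("delete failed")

    ok, errors = delete_in_parallel(["app/a", "app/b", "app/bad"], fake_delete)
    print(sorted(ok), [path for path, _ in errors])
```

Since each delete mostly waits on the network, threads are a reasonable fit here; collecting failures instead of raising lets one bad delete not abort the rest of the run.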