devopshq / artifactory-cleanup

Extended cleanup tool for JFrog Artifactory
MIT License
113 stars 61 forks source link

formatting question - KeepLatestVersionNFilesInFolder #83

Open t-readyroc opened 1 year ago

t-readyroc commented 1 year ago

I'm wondering if I may not be able to use this tool due to our versioning scheme. Each repo has a "Packages" subfolder that I'm including, but then also has 30+ different file names in each. I've written the equivalent number of rules for each repo, e.g.:

        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-gui-([\d]+\.[\d]+\-[\d]+).*\.rpm"
        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-influxdb-([\d]+\.[\d]+\-[\d]+).*\.rpm"
        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-ingest-([\d]+\.[\d]+\-[\d]+).*\.rpm"
        - rule: KeepLatestVersionNFilesInFolder
          count: 2

etc & so forth, to cover each of the different file names. However, when I attempt to run the cleanup, the script gives me the error:

ValueError: invalid literal for int() with base 10: '6-4'

I have tested each individual regex, to make sure that they do match the version numbers for each file name. I'm pretty sure I'm just not formatting it correctly for what the script expects. I'm guessing that maybe the rule separates the digits by dots? Which won't work for us, since our last digit is separated by a dash?

Any guidance would be appreciated.

allburov commented 1 year ago

Hi!

Have you tried to escape \ with \\? The example from the readme.md shows it as:

- rule: KeepLatestVersionNFilesInFolder
  count: 1
  custom_regexp: "[^\\d][\\._]((\\d+\\.)+\\d+)"

Could you give few examples for the filenames? I need to reproduce it on my side, don't get the filename format from the regexp

allburov commented 1 year ago

Do you use RPM repository or it's just usual file one with rpm files in it? Asking because Artifactory set some additional properties to the files in RPM repository and it can help us in the future research if above solution doesn't work.

t-readyroc commented 1 year ago

Sorry, yes, I ran into the double escape issue, & have since updated the regexp:

        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-gui-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm"
        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-influxdb-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm"
        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-ingest-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm"

The repositories are "package type: RPM," & examples of the files above are:

bootstrap-gui-3.10-33.src.rpm
bootstrap-ingest-3.10-92.src.rpm
bootstrap-influxdb-3.7-3.noarch.rpm

As you can see, the revision is delineated by a -, which is what, I think is causing the issue. I found yesterday where it looks like the rule is splitting on .

allburov commented 1 year ago

@t-readyroc thank you for pointing out the exact line!

I've pushed the fix for it here https://github.com/devopshq/artifactory-cleanup/pull/84 Could you install from the branch and test that it works for your case? If it does - I'll release changes

pip uninstall artifactory-cleanup
pip install git+https://github.com/devopshq/artifactory-cleanup.git@keep-latest-n-files-no-dot
artifactory-cleanup --version
# Must show 1.0.4

Important note - above rules don't work in that way

If you use "bootstrap-ingest-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm" regexp - the rule will keep all artifacts that doesn't match the rule (just to avoid accidental removes for new artifacts)

In order to achieve what you want - you have to split it to different policies

  policies:
    - name: reponame - bootstrap gui
      rules:
        - rule: Repo
          name: "reponame"
        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-gui-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm"
    - name: reponame - bootstrap ingest
      rules:
        - rule: Repo
          name: "reponame"
        - rule: KeepLatestVersionNFilesInFolder
          count: 2
          custom_regexp: "bootstrap-ingest-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm"
t-readyroc commented 1 year ago

Hey @allburov - running into an odd thing. Getting the following error:

Traceback (most recent call last):
  File "/home/myuser/.pyenv/versions/ans2.10/bin/artifactory-cleanup", line 33, in <module>
    sys.exit(load_entry_point('artifactory-cleanup==1.0.4', 'console_scripts', 'artifactory-cleanup')())
  File "/home/myuser/.pyenv/versions/3.8.13/envs/ans2.10/lib/python3.8/site-packages/plumbum/cli/application.py", line 177, in __new__
    return cls.run()
  File "/home/myuser/.pyenv/versions/3.8.13/envs/ans2.10/lib/python3.8/site-packages/plumbum/cli/application.py", line 634, in run
    retcode = inst.main(*tailargs)
  File "/home/myuser/.pyenv/versions/3.8.13/envs/ans2.10/lib/python3.8/site-packages/artifactory_cleanup/cli.py", line 121, in main
    for summary in cleanup.cleanup(
  File "/home/myuser/.pyenv/versions/3.8.13/envs/ans2.10/lib/python3.8/site-packages/artifactory_cleanup/artifactorycleanup.py", line 53, in cleanup
    artifacts_to_remove = policy.filter(artifacts)
  File "/home/myuser/.pyenv/versions/3.8.13/envs/ans2.10/lib/python3.8/site-packages/artifactory_cleanup/rules/base.py", line 284, in filter
    artifacts = rule.filter(artifacts)
  File "/home/myuser/.pyenv/versions/3.8.13/envs/ans2.10/lib/python3.8/site-packages/artifactory_cleanup/rules/keep.py", line 176, in filter
    artifactory_with_version.sort()
TypeError: '<' not supported between instances of 'dict' and 'dict'

Running 1.0.4:

artifactory-cleanup --version
artifactory-cleanup 1.0.4

Also: thanks for the heads-up on the rules formatting. I'm now formatting them as separate policies, like so:

- name: bootstrap-repo-* - Keep 2 most recent versions - bootstrap-gui
  rules:
    - rule: RepoByMask
      mask: "bootstrap-repo-*"
    - rule: IncludePath
      masks: "Packages"
    - rule: KeepLatestVersionNFilesInFolder
      count: 2
      custom_regexp: "bootstrap-gui-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm"
- name: bootstrap-repo-* - Keep 2 most recent versions - bootstrap-ingest
  rules:
    - rule: RepoByMask
      mask: "bootstrap-repo-*"
    - rule: IncludePath
      masks: "Packages"
    - rule: KeepLatestVersionNFilesInFolder
      count: 2
      custom_regexp: "bootstrap-ingest-([\\d]+\\.[\\d]+\\-[\\d]+).*\\.rpm"
allburov commented 1 year ago

@t-readyroc I've pushed new version to the same branch If it doesn't work - could you "print" (or just set a breakpoint on that line) the artifactory_with_version content and send it back please?

Make sure to uninstall it before install the updated version

pip uninstall artifactory-cleanup
pip install git+https://github.com/devopshq/artifactory-cleanup.git@keep-latest-n-files-no-dot
artifactory-cleanup --version
# Must show 1.0.4
t-readyroc commented 1 year ago

@allburov - looks like your last update fixed the error! I think what's currently in your branch is going to work for us 😄

t-readyroc commented 1 year ago

I may have spoken too soon - but I think we may be back to a question of rule formatting/use. At the moment, it looks like The following rules are deleting all artifacts from the specified repos. Have I messed up the rule formatting again?

A pair of example rules (one for .noarch.rpm, one for .src.rpm):

- name: REPO-* - Keep 2 most recent versions - bootstrap-ingest.noarch
  rules:                                                                    
    - rule: RepoByMask                                                      
      mask: "REPO-*"                                                 
    - rule: IncludePath                                                     
      masks: "Packages"                                                     
    - rule: KeepLatestVersionNFilesInFolder                                 
      count: 2                                                              
      custom_regexp: "bootstrap-ingest-([\\d]+\\.[\\d]+\\-[\\d]+)\\.noarch\\.rpm"
- name: REPO-* - Keep 2 most recent versions - bootstrap-ingest.src  
  rules:                                                                    
    - rule: RepoByMask                                                      
      mask: "REPO-*"                                                 
    - rule: IncludePath                                                     
      masks: "Packages"                                                     
    - rule: KeepLatestVersionNFilesInFolder                                 
      count: 2                                                              
      custom_regexp: "bootstrap-ingest-([\\d]+\\.[\\d]+\\-[\\d]+)\\.src\\.rpm"

I've just run a test output (no -destroy) to check the files to be deleted, and I've gone through 2 of the 8 release repos. It's looking like the above rules will delete every artifact in the repo - leaving none.

t-readyroc commented 1 year ago

Example output from the application on one of the above rules. 3.10-90 & 3.10-92 are the most recent 2 versions:

DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.10-83.noarch.rpm' - 309M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.10-84.noarch.rpm' - 310M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.10-85.noarch.rpm' - 310M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.10-86.noarch.rpm' - 311M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.10-87.noarch.rpm' - 311M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-10.noarch.rpm' - 181M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-11.noarch.rpm' - 182M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-1.noarch.rpm' - 182M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-20.noarch.rpm' - 191M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-21.noarch.rpm' - 199M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-22.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-24.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-25.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-26.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-28.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-29.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-2.noarch.rpm' - 182M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-30.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-31.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-32.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-33.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-34.noarch.rpm' - 262M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-35.noarch.rpm' - 262M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-36.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-39.noarch.rpm' - 272M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-40.noarch.rpm' - 272M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-41.noarch.rpm' - 272M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-42.noarch.rpm' - 273M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-48.noarch.rpm' - 280M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-49.noarch.rpm' - 275M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-50.noarch.rpm' - 275M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-7.noarch.rpm' - 181M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.8-9.noarch.rpm' - 180M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.9-168.noarch.rpm' - 304M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.9-169.noarch.rpm' - 309M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.9-170.noarch.rpm' - 310M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.9-171.noarch.rpm' - 308M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.9-178.noarch.rpm' - 314M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.9-179.noarch.rpm' - 313M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.9-1.noarch.rpm' - 200M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.10-90.noarch.rpm' - 310M 
DEBUG - we would delete 'REPO-RELEASE-3.10/Packages/bootstrap-ingest-3.10-92.noarch.rpm' - 311M 
t-readyroc commented 1 year ago

Any ideas on this?

Keonik1 commented 1 year ago

@allburov have you considered comparing versions via packaging? In my other project, he worked out correctly with a dash ("-"), but I just don't have enough experience to change something correctly here)

https://stackoverflow.com/questions/11887762/how-do-i-compare-version-numbers-in-python

https://packaging.pypa.io/en/stable/version.html

allburov commented 1 year ago

have you considered comparing versions via packaging?

Actually, the version pattern depends on repository type a lot - debian's repository can have a bit different version pattern than python one, for instance. It's the reason why we added regexp.

Keonik1 commented 1 year ago

@allburov Can you explain why such a regular expression does not work in the project? /test123\/[\d]+.[\d]+\/test123-([\d]+\.[\d]+\.[\d]+-[\d]+)\..+/ (/<regular expression>/ is the same as \\<regular expression>, so the error should not be in this.) reference: https://regex101.com/r/VuKsoN/1

Gives out such an error? I tried both in the release and in the branch from this issue the same way

Warning: Could not identify version for test123/2.0/test123-2.0.10-854.txt
Filter package test123/2.0/test123-2.0.10-854.txt
Warning: Could not identify version for test123/2.0/test123-2.0.10-854.bin
Filter package test123/2.0/test123-2.0.10-854.bin
Warning: Could not identify version for test123/2.0/test123-2.0.10-863.txt
Filter package test123/2.0/test123-2.0.10-863.txt
Warning: Could not identify version for test123/2.0/test123-2.0.10-863.bin
Filter package test123/2.0/test123-2.0.10-863.bin
Warning: Could not identify version for test123/2.1/test123-2.1.1-944.txt
Filter package test123/2.1/test123-2.1.1-944.txt
Warning: Could not identify version for test123/2.1/test123-2.1.1-944.bin
Filter package test123/2.1/test123-2.1.1-944.bin
Warning: Could not identify version for test123/2.1/test123-2.1.1-945.txt
Filter package test123/2.1/test123-2.1.1-945.txt
Warning: Could not identify version for test123/2.1/test123-2.1.1-945.bin
Filter package test123/2.1/test123-2.1.1-945.bin

UPD: I have tried different expressions: /test123\/[\d]+.[\d]+\/test123-([\d]+\.[\d]+\.[\d]+-[\d]+)\..+/ 'test123\/[\d]+.[\d]+\/test123-([\d]+\.[\d]+\.[\d]+-[\d]+)\..+' "test123\\/[\\d]+.[\\d]+\\/test123-([\\d]+\\.[\\d]+\\.[\\d]+-[\\d]+)\\..+"

arossert commented 5 months ago

I have a similar issue, my version pattern looks like 24.1.3-7698, so the - part of the version cannot be handled as it is splitted by . and the 3-7698 cannot be parsed as int.

Is there a planned fix for this or should I create a custom rule?

BTW, replacing the

artifactory_with_version.sort(
    key=lambda x: [int(x) for x in x[0].split(".")]
)

With

artifactory_with_version.sort(
    key=lambda x: packaging.version.Version(x[0])
)

Should do the trick, I don't mind creating a PR if you are intrested