akuity / kargo

Application lifecycle orchestration
https://kargo.akuity.io/
Apache License 2.0
1.62k stars 139 forks source link

Git RepoURL rejected by regex #2835

Open BasJ93 opened 3 hours ago

BasJ93 commented 3 hours ago

Checklist

Description

A warehouse git repo url that worked in v0.7.2 does not work anymore in v1.0.3. The regex does not accept urls that include the username and the @ symbol before the host.

e.g. https://<username>@bitbucket.org/<workspace>/<repo>.git

When attempting to apply a warehouse resource with that definition, kubernetes will reject it based on the regex in the CRD.

When I throw the regex in to regex101.com, it confirms that the regex ((?:^(https?)://(?:([\w-]+):(.+)@)?([\w-]+(?:\.[\w-]+)*)(?::(\d{1,5}))?(/.*)$)|(?:^([\w-]+)@([\w+]+(?:\.[\w-]+)*):(/?.*))) does not match my input. However, if I remove the : between the second and third capture group, my input matches. If I replace it with a | as an or operation, it also matches. It looks like the expectation is that the URL would include the password for the user as well, but (at least for Bitbucket) this is not a requirement.

Modifying the regex to include an or option for just the username, it works in regex101:

(?:^(https?)://(?:([\w-]+):(.+)|([\w-]+)@)?([\w-]+(?:\.[\w-]+)*)(?::(\d{1,5}))?(/.*)$)|(?:^([\w-]+)@([\w+]+(?:\.[\w-]+)*):(/?.*))

Steps to Reproduce

apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: project-b-warehouse
  namespace: project-b
spec:
  freightCreationPolicy: Automatic
  interval: 5m0s
  subscriptions:
    - image:
        allowTags: \d{1,3}\.\d{1,3}\.\d{5}\.\d+
        discoveryLimit: 20
        imageSelectionStrategy: NewestBuild
        repoURL: tweesnoeken.azurecr.io/smarttwin/smarttwin
    - image:
        allowTags: \d{1,3}\.\d{1,3}\.\d{5}\.\d+
        discoveryLimit: 20
        imageSelectionStrategy: NewestBuild
        repoURL: tweesnoeken.azurecr.io/smarttwin/backgroundworker
    - git:
        branch: master
        commitSelectionStrategy: NewestFromBranch
        discoveryLimit: 20
        repoURL: https://<username>@bitbucket.org/<workspace>/<repo>.git

Version

v1.0.3

Logs

one or more objects failed to apply, reason: error when patching "/dev/shm/4287422194": Warehouse.kargo.akuity.io "project-c-warehouse" is invalid: [spec.subscriptions[3].git.repoURL: Invalid value: "<redacted>": spec.subscriptions[3].git.repoURL in body should match '(?:^(https?)://(?:([\w-]+):(.+)@)?([\w-]+(?:\.[\w-]+)*)(?::(\d{1,5}))?(/.*)$)|(?:^([\w-]+)@([\w+]+(?:\.[\w-]+)*):(/?.*))', status.discoveredArtifacts.git[0].repoURL: Invalid value: "<redacted>": status.discoveredArtifacts.git[0].repoURL in body should match '(?:^(https?)://(?:([\w-]+):(.+)@)?([\w-]+(?:\.[\w-]+)*)(?::(\d{1,5}))?(/.*)$)|(?:^([\w-]+)@([\w+]+(?:\.[\w-]+)*):(/?.*))']
krancour commented 3 hours ago

That regex has been modified extensively between then and now to support ssh URLs. It seems that in the refactoring of that regex, we kept the possibility of (for instance) https://user:pass@host, but lost the possibility of https://user@host

This is easy enough to fix, but I am curious about why you'd be using a URL formatted that way. The username included in that URL plays no role in how Kargo would actually authenticate to that repository.

On some level, I feel more inclined to tighten this regex than to relax it as username alone, as I mentioned doesn't add much value here and username:password in a plaintext URL is playing with fire.

It will be helpful to understand why your URL is formatted like that.