aoldershaw / git-branches-resource

Tracks the set of branches that exist in a git repository
MIT License

Branches resource triggers pipeline every hour #6

Open: grzleadams opened this issue 1 year ago

grzleadams commented 1 year ago

I'm seeing strange behavior with this resource (using the latest image from Docker Hub), but I can't tell whether the problem is the resource type itself or our Concourse (7.7.1) infrastructure, so I figured I'd ask whether anyone has seen something similar. I'm using this resource type in a standard parent/child pipeline setup, where a parent pipeline uses the across step to set one child pipeline per branch in the repo.

The problem is that every hour (even though the check runs about every minute), the branches resource reports a new version because its timestamp changes, and that triggers jobs, even though the branch list itself never changes. None of the checks between these hourly "changes" show an updated timestamp.

This obviously causes a lot of unnecessary pipeline runs, but I can't see why the timestamp updates at all, much less why it happens exactly once an hour. I'd really appreciate any thoughts on this.

saj commented 1 year ago

I ran into somewhat similar problems, though I'm not sure we share the same root causes. In my case there were two:

First, on any network I/O error from git(1), the resource type would silently discard the error and emit an empty branch list, because this call never checks git's exit status:

result = subprocess.run(['git', 'ls-remote', '--heads', uri], stdout=subprocess.PIPE)

This was easily fixed, so I patched it and moved on.
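The actual patch isn't shown in this thread. A minimal sketch of one way to fix it, assuming Python 3.9+, is to pass check=True so that a failed ls-remote raises subprocess.CalledProcessError and the check errors out instead of reporting zero branches. The helper names list_branches and parse_heads are hypothetical, not this resource's API.

```python
import subprocess


def parse_heads(ls_remote_output: str) -> dict[str, str]:
    """Parse `git ls-remote --heads` output into {branch_name: commit_sha}."""
    heads = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        # ls-remote separates the SHA and the ref name with a tab.
        sha, ref = line.split("\t", 1)
        heads[ref.removeprefix("refs/heads/")] = sha
    return heads


def list_branches(uri: str) -> dict[str, str]:
    # check=True raises CalledProcessError on a non-zero exit (e.g. a network
    # failure), so the check fails loudly instead of emitting an empty list.
    result = subprocess.run(
        ["git", "ls-remote", "--heads", uri],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return parse_heads(result.stdout)
```

A failed check is visible in the Concourse UI and keeps the last good version, whereas an empty branch list looks like a legitimate (and destructive) change to every downstream across step.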

Later, I noticed that almost every check was emitting a new timestamp, even with a stable branch set (the heads were stable too, with no new commits). Curiously, though, I could not always reproduce this when I clicked the "check resource" button several times in quick succession.

https://github.com/concourse/concourse/issues/3910

We pull our patched fork of this resource type from an internal registry that requires authentication, rather than from Docker Hub. The registry credentials are supplied dynamically and rotate automatically over time. I suspect that, because of the Concourse quirk linked above, each new credential discards all previous resource versions, so the check is invoked with no version in its input. When that happens, the check has no choice but to generate a new timestamp. (Clicking "check resource" in quick succession would reuse a cached, still-valid credential, which would explain why I couldn't always reproduce the problem.)
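The following is a hypothetical sketch, not this resource's actual code, of why a missing input version forces a new timestamp. It assumes a version shaped like {"branches": ..., "timestamp": ...} (Concourse versions are flat maps of string keys to string values); the check function name and its fields are illustrative only.

```python
from datetime import datetime, timezone
from typing import Optional


def check(previous: Optional[dict], branches: list[str]) -> list[dict]:
    """Hypothetical timestamp-based check returning Concourse versions."""
    branch_list = ",".join(sorted(branches))
    # Same branch set as the last known version: re-emit it unchanged,
    # so Concourse sees no new version and triggers nothing.
    if previous is not None and previous.get("branches") == branch_list:
        return [previous]
    # No previous version (e.g. history was discarded) or the set changed:
    # the only option left is a fresh timestamp, which is always "new".
    return [{
        "branches": branch_list,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }]
```

When previous is None, as in the discarded-history scenario, the function cannot tell that nothing changed, so every such call yields a version Concourse has never seen.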

Instead of a timestamp, it might be better to hash the ref IDs and emit the resulting digest.

$ git ls-remote http://www.kernel.org/pub/scm/git/git.git
5fe978a5381f1fbad26a80e682ddd2a401966740        refs/heads/master
c781a84b5204fb294c9ccc79f8b3baceeb32c061        refs/heads/seen
(hash this output, i.e. the ref IDs, in some deterministic order)
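A minimal sketch of the digest idea, assuming Python 3.9+ and a hypothetical heads_digest helper that receives the {ref: sha} pairs from the ls-remote output above:

```python
import hashlib


def heads_digest(heads: dict[str, str]) -> str:
    """Deterministic SHA-256 over {ref_name: commit_sha} pairs (hypothetical helper)."""
    h = hashlib.sha256()
    # Sort by ref name so the digest doesn't depend on ls-remote's output order.
    for ref in sorted(heads):
        h.update(f"{heads[ref]} {ref}\n".encode())
    return h.hexdigest()


heads = {
    "refs/heads/master": "5fe978a5381f1fbad26a80e682ddd2a401966740",
    "refs/heads/seen": "c781a84b5204fb294c9ccc79f8b3baceeb32c061",
}
print(heads_digest(heads))
```

Unlike a timestamp, the same refs and heads always produce the same digest, whether or not the check received a previous version. Including the commit SHAs means a push to any branch also changes the digest; hashing only the ref names would track just the set of branches.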

I shall experiment with this.

It sounds like you have a working resource version history, though, so perhaps this won't help in your case. :(