coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Properly handling metadata cache at release time #232

Open jlebon opened 5 years ago

jlebon commented 5 years ago

Metadata files like streams/testing.json and releases.json are at stable URLs and updated in place during releases. https://builds.coreos.fedoraproject.org/ is handled by CloudFront, so we have to think about caching.

We've been working around this for stream metadata by just using --cache-control max-age=0 when uploading, but that's clearly not ideal. We do want caching, just smarter...

I think what we want is the ability to invalidate the CloudFront cache as part of our release process? Something like:

aws cloudfront create-invalidation --distribution-id <distribution-id> --paths "/streams/testing.json" "/prod/streams/testing/releases.json"

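For a release script, the same invalidation could be issued through boto3 rather than the CLI. This is only a minimal sketch: the function names are hypothetical, the distribution ID has to be supplied by whoever holds the credentials, and the timestamp-based caller reference is an assumption about how uniqueness would be ensured.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch payload for CloudFront's CreateInvalidation."""
    # CloudFront expects every invalidation path to start with "/".
    normalized = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(normalized), "Items": normalized},
        # CallerReference must be unique per request; a timestamp is one
        # simple (assumed) way to get that.
        "CallerReference": caller_reference or f"release-{int(time.time())}",
    }


def invalidate(distribution_id, paths):
    """Submit the invalidation. Requires boto3 and AWS credentials."""
    # Imported lazily so the payload helper above works without boto3.
    import boto3

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

A release step would then call something like invalidate(distribution_id, ["/streams/testing.json", "/prod/streams/testing/releases.json"]) right after uploading the new metadata.
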
jlebon commented 5 years ago

I think what we want is the ability to invalidate the CloudFront cache as part of our release process?

@puiterwijk, does that sound reasonable? If so, I can submit a releng request to get the creds to do this.

cgwalters commented 5 years ago

I think it's the more normal pattern to have any "mutable" objects correctly set their caching headers, and have the mutable bits (usually metadata) refer to immutable URLs that can be cached forever.

That's what I did for the RHCOS pipeline anyways.

cgwalters commented 5 years ago

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html#expiration-individual-objects

cgwalters commented 5 years ago

Link to RHCOS pipeline code: https://url.corp.redhat.com/1459bbd

jlebon commented 5 years ago

Right, I'm specifically talking about the mutable bits here. We can definitely just settle on some small interval greater than 0, but it would be nice if we could do even better. IIRC Flathub does something similar for its summary file? (@ramcq, does that sound correct?)

jlebon commented 5 years ago

Hmm, this actually also intersects with Cincinnati. We were discussing having rollouts controlled through files in the bucket, so starting or pausing a rollout would require editing a file. If we want Cincinnati to pick up those changes quickly, then we'll have to e.g. use max-age=0, point it at the bucket directly, or explicitly invalidate the cache.

I think it's the more normal pattern to have any "mutable" objects correctly set their caching headers, and have the mutable bits (usually metadata) refer to immutable URLs that can be cached forever.

I don't want to lose this bit though. I think we could make buildupload set better default cache headers when uploading as appropriate for each file. Will file something for this.
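
Roughly, the per-file defaults could look like the sketch below. The function name, the max-age values, and the rule for deciding what counts as mutable are all assumptions for illustration; they are not what coreos-assembler or mantle actually implement.

```python
# Illustrative values only (assumptions, not the real defaults).
MUTABLE_MAX_AGE = 60           # short TTL for metadata updated in place
IMMUTABLE_MAX_AGE = 31536000   # one year for objects that never change

# File names treated as mutable metadata in this sketch.
MUTABLE_NAMES = ("releases.json",)


def cache_control_for(key):
    """Pick a Cache-Control value based on whether the object is mutable."""
    key = key.lstrip("/")
    name = key.rsplit("/", 1)[-1]
    # Stream metadata and release indexes live at stable URLs and are
    # rewritten at release time, so they get a short TTL.
    if key.startswith("streams/") or name in MUTABLE_NAMES:
        return f"max-age={MUTABLE_MAX_AGE}"
    # Everything else is assumed to be immutable, versioned content.
    return f"max-age={IMMUTABLE_MAX_AGE}, immutable"
```

With this, streams/testing.json and prod/streams/testing/releases.json get the short TTL, while anything the metadata points at can be cached for a long time.
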

jlebon commented 5 years ago

OK, opened https://github.com/coreos/coreos-assembler/pull/680 and https://github.com/coreos/mantle/pull/1038, so at least we'll have more sensible caching for now.

ramcq commented 4 years ago

Sorry for the delay; just to provide the context here. We set TTLs as follows: https://github.com/flathub/ansible-playbook/blob/master/roles/repo-manager/templates/nginx-default.d-repo.conf.j2. For the mutable parts of the repo (the summary and its signature, and the refs), we set a shorter timeout with a longer "stale if error" retention period in case of temporary origin wobbles.

We used to have a very short TTL on the summary file until we realised that about 20-30% of the origin traffic was refreshing the summary on every edge node every minute. So we moved to a 1hr TTL and an explicit cache PURGE when the summary was updated. https://github.com/flathub/ansible-playbook/blob/master/roles/repo-manager/templates/post-publish.sh.j2#L21-L23 (very high-tech this bit)
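
To illustrate the trade-off, here is a rough sketch of the refresh arithmetic and of what an explicit PURGE could look like. The edge-node count and the URL are made up for the example, and this is not the actual post-publish script linked above; PURGE is also a non-standard HTTP method, so the cache in front has to be configured to accept it.

```python
import urllib.request


def origin_refreshes_per_hour(edge_nodes, ttl_seconds):
    """Upper bound on origin fetches per hour for one object,
    assuming every edge node refetches as soon as its copy expires."""
    return edge_nodes * (3600 // ttl_seconds)


def build_purge_request(url):
    """Build (but do not send) an HTTP PURGE request for a cached URL."""
    return urllib.request.Request(url, method="PURGE")


# With a hypothetical 50 edge nodes:
per_minute_ttl = origin_refreshes_per_hour(50, 60)    # 1-minute TTL
per_hour_ttl = origin_refreshes_per_hour(50, 3600)    # 1-hour TTL

# Sending would be urllib.request.urlopen(req); the URL here is a placeholder.
req = build_purge_request("https://example.invalid/repo/summary")
```

The longer TTL cuts the steady-state refresh traffic, and the purge on publish keeps updates visible quickly despite it.
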

dustymabe commented 4 years ago

@jlebon - is there outstanding work to do here?