New images not deployed

qodfathr commented 5 years ago

V1.10.1

Loving this project - kudos to the team! I'm using Docker Hub autobuilds, and semver deployments have been working well for months. The last automated image deployment by Flux was about 5 days ago. 'list-images' is showing image versions/tags from that time period, but no new images. Not sure what to do next or how to diagnose further. I can see in the logs that git syncs are still happening, but Flux's view of the registry is out of date. Suggestions? (The registry is also Docker Hub, if that's not clear.)

hiddeco commented 5 years ago

Thanks for including your version number and your clear explanation, this helps a lot in triaging the issue. A couple of things that come to mind:

What is the current memory usage of your memcache pod? If it (almost) equals the set -m argument on the memcache deployment, Flux may be overflowing the memory, resulting in an unreliable set of images for Flux to update from. Increasing the memory (by setting the -m argument to a higher value) should resolve the issue.
Is Flux still reporting entries like caller=warming.go:364 component=warmer updated=quay.io/weaveworks/flux successful=1 attempted=0 in the logs? If not, the polling process has somehow become stale, and you could try deleting the pod to respawn all internal polling processes. We would still be interested in why it stopped looking for image updates, though.
In case the Flux logs report things like SERVER_ERROR object too large for cache, the -I argument on memcache must be increased. Flux stores all the metadata for one image repository in one item, and each image tag takes up about 1000 bytes. This means -I 1m would be able to store about 1000 tags for one image repository; if the number of tags you have for one image is >= 1000, you should increase the -I size.
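
To make the two memcached checks above concrete, here is a rough Python sketch. It assumes the roughly 1000 bytes per tag figure quoted above, and it reads the `bytes` and `limit_maxbytes` counters from the output of memcached's `stats` command. The function names and the 1.5x headroom factor are illustrative assumptions, not anything taken from Flux itself.

```python
import math

# Rough per-tag metadata size quoted in this thread (an estimate, not a Flux constant).
BYTES_PER_TAG = 1000


def estimated_item_bytes(tag_count: int, bytes_per_tag: int = BYTES_PER_TAG) -> int:
    """Approximate size of the single cache item holding one repository's tags."""
    return tag_count * bytes_per_tag


def suggested_item_size_mb(tag_count: int, headroom: float = 1.5) -> int:
    """Smallest whole-megabyte -I value that fits the estimate plus headroom.

    The headroom factor is an assumption added for safety margin.
    """
    needed = estimated_item_bytes(tag_count) * headroom
    return max(1, math.ceil(needed / (1024 * 1024)))


def parse_stats(raw: str) -> dict:
    """Parse the 'STAT <name> <value>' lines returned by memcached's stats command."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def memory_usage_ratio(stats: dict) -> float:
    """Fraction of the -m limit currently used for item storage."""
    return int(stats["bytes"]) / int(stats["limit_maxbytes"])


if __name__ == "__main__":
    # A repository with 3000 tags would need a larger item size than the 1m default.
    print("suggested -I value in MB:", suggested_item_size_mb(3000))

    # Sample stats output; in practice this would come from the memcache pod.
    sample = "STAT bytes 60000000\r\nSTAT limit_maxbytes 67108864\r\nEND\r\n"
    print("memory usage ratio:", round(memory_usage_ratio(parse_stats(sample)), 2))
```

A usage ratio close to 1.0 would point at the -m limit, while a tag count whose estimate exceeds the current -I value would point at the item size.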

On an additional note: if you feel that more interactive debugging with some help from us would be useful, please join our Slack channel #flux. You can request an invite via https://slack.weave.works, and once there you can reach me: @hidde.

qodfathr commented 5 years ago

It may be memory related...let me start there...will report back. I appreciate the detailed help! Will hit Slack if I get stuck.

qodfathr commented 5 years ago

It was memcached. Thanks for the help! All good now.

fluxcd / flux

New images not deployed #1880