Closed stillinbeta closed 5 years ago
How did you install flux? Can you post the manifests here pls (redacted if necessary)?
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flux
spec:
  replicas: 1
  selector:
    matchLabels:
      name: flux
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        prometheus.io.port: "3031" # tell prometheus to scrape /metrics endpoint's port.
      labels:
        name: flux
    spec:
      serviceAccountName: flux
      volumes:
      - name: git-key
        secret:
          secretName: flux-git-deploy
          defaultMode: 0400 # when mounted read-only, we won't be able to chmod
      # This is a tmpfs used for generating SSH keys. In K8s >= 1.10,
      # mounted secrets are read-only, so we need a separate volume we
      # can write to.
      - name: git-keygen
        emptyDir:
          medium: Memory
      # The following volume is for using a customised known_hosts
      # file, which you will need to do if you host your own git
      # repo rather than using github or the like. You'll also need to
      # mount it into the container, below. See
      # https://github.com/weaveworks/flux/blob/master/site/standalone-setup.md#using-a-private-git-host
      # - name: ssh-config
      #   configMap:
      #     name: flux-ssh-config
      # The following volume is for using a customised .kube/config,
      # which you will need to do if you wish to have a different
      # default namespace. You will also need to provide the configmap
      # with an entry for `config`, and uncomment the volumeMount and
      # env entries below.
      # - name: kubeconfig
      #   configMap:
      #     name: flux-kubeconfig
      containers:
      - name: flux
        # There are no ":latest" images for flux. Find the most recent
        # release or image version at https://quay.io/weaveworks/flux
        # and replace the tag here.
        image: quay.io/weaveworks/flux:1.11.0
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
        ports:
        - containerPort: 3030 # informational
        volumeMounts:
        - name: git-key
          mountPath: /etc/fluxd/ssh # to match location given in image's /etc/ssh/config
          readOnly: true # this will be the case perforce in K8s >= 1.10
        - name: git-keygen
          mountPath: /var/fluxd/keygen # to match location given in image's /etc/ssh/config
        # Include this if you need to mount a customised known_hosts
        # file; you'll also need the volume declared above.
        # - name: ssh-config
        #   mountPath: /root/.ssh
        # Include this and the volume "kubeconfig" above, and the
        # environment entry "KUBECONFIG" below, to override the config
        # used by kubectl.
        # - name: kubeconfig
        #   mountPath: /etc/fluxd/kube
        # Include this to point kubectl at a different config; you
        # will need to do this if you have mounted an alternate config
        # from a configmap, as in the commented blocks above.
        # env:
        # - name: KUBECONFIG
        #   value: /etc/fluxd/kube/config
        args:
        # If you deployed memcached in a different namespace to flux,
        # or with a different service name, you can supply the
        # following two arguments to tell fluxd how to connect to it.
        # - --memcached-hostname=memcached.default.svc.cluster.local
        # Use the memcached ClusterIP service name by setting
        # memcached-service to the empty string.
        - --memcached-service=
        # This must be supplied, and be in the tmpfs (emptyDir)
        # mounted above, for K8s >= 1.10.
        - --ssh-keygen-dir=/var/fluxd/keygen
        # Replace or remove the following URL.
        - --git-url=git@github.com:stillinbeta/leckie.git
        - --git-branch=master
        # Include these next two to connect to an "upstream" service
        # (e.g., Weave Cloud). The token is particular to the service.
        # - --connect=wss://cloud.weave.works/api/flux
        # - --token=abc123abc123abc123abc123
        # Serve the /metrics endpoint at a different port; make sure
        # Prometheus' annotation is set to scrape this port.
        - --listen-metrics=:3031
Pretty much a verbatim copy of the example from weaveworks/flux; I just added my repository URL.
Thanks for that -- looks OK to me, that config for fluxd. Just to check, did you create the service for memcached, in the same namespace, and does it show up in kubectl get svc?
$ kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
kubernetes   ClusterIP   10.245.0.1       <none>        443/TCP     7d19h
memcached    ClusterIP   10.245.117.136   <none>        11211/TCP   7d19h
They're both in the default namespace.
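For reference, the memcached Service fluxd looks up is just a plain ClusterIP service in front of the memcached Deployment. A minimal sketch (names and namespace assumed to match the defaults discussed above; the selector must match whatever labels your memcached pods carry):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: memcached
  namespace: default
spec:
  ports:
  - name: memcached
    port: 11211
  selector:
    name: memcached
```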
I see the fluxd pod has been running for 16h, and the service was created about a week ago -- was this working for a while and then stopped, or did it never work?
If it worked at one point then stopped, my suspicion would rest on Kubernetes' name resolution breaking down. You can test that by exec'ing into the fluxd container and seeing if it can nslookup memcached.
$ kubectl exec -ti flux-fd7f478d7-4bks6 -- /bin/sh
# nslookup memcached
Sometimes restarting the fluxd pod fixes this. In general it's safe to restart the fluxd pod, since the only state that matters is what's in your git repo and/or the cluster.
If it never worked, I'd lean towards a configuration problem. Though I'd struggle to say where -- from what you've posted, it looks fine to me :-/
$ kubectl exec -ti flux-fd7f478d7-4bks6 -- /bin/sh
/home/flux # nslookup memcached
nslookup: can't resolve '(null)': Name does not resolve
Name: memcached
Address 1: 10.245.117.136 memcached.default.svc.cluster.local
/home/flux # nc 10.245.117.136 11211
stats
STAT pid 1
STAT uptime 3072
STAT time 1554738423
STAT version 1.4.25
STAT libevent 2.0.21-stable
<snip>
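If you want to script that check rather than eyeball the `nc` output, the `stats` reply shown above is easy to parse. A small hypothetical helper (not part of flux; `parse_memcached_stats` is a name invented here):

```python
def parse_memcached_stats(raw):
    """Parse memcached's text-protocol `stats` reply, e.g. lines like
    'STAT pid 1', into a dict {'pid': '1', ...}. Stops at 'END'."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
        elif line.strip() == "END":
            break
    return stats

# Sample taken from the transcript above (truncated, plus the END terminator).
sample = "STAT pid 1\nSTAT uptime 3072\nSTAT version 1.4.25\nEND"
print(parse_memcached_stats(sample)["version"])  # -> 1.4.25
```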
so the connection seems to be fine. Restarting flux does appear to have fixed it, but that's not a particularly satisfying solution. I've had this problem occur once before already. Any ideas what could be causing it?
There have been hints (and this is a good one) that fluxd doesn't recover well from losing its memcached connection, or from being initially unable to resolve the service's hostname.
Mind if I treat this as the definitive bug report for this particular problem?
Not at all! Let me know if I can be of any help!
This looks very similar to the experience I had confirmed in #1766
Yes; it looks like the cause and the symptom may well be the same.
I can provoke the particular log message reported above by making sure CoreDNS is not available when fluxd starts up (and then, shortly afterwards, making it available again). For example, in a local minikube instance that is otherwise operating normally:
kubectl scale -n kube-system deploy/coredns --replicas=0
kubectl delete pod -l name=flux
kubectl scale -n kube-system deploy/coredns --replicas=2
kubectl logs deploy/flux -f
This will break things fairly reliably, though I'm sure there's a particular window in which the CoreDNS outage will cause the problem in question and not others.
One probable reason for the problem is that the memcache client resolves hostnames only when it starts up. My suspicion is that if it fails to do so, it will nonetheless continue with an empty pool of addresses, which will never be repopulated. Thus: memcache: no servers configured or available.
A fix would be to periodically reset the hostnames provided, so they will be resolved again. This works against the memcache client's connection pooling, but for our purposes I am not too concerned about the overhead of re-establishing connections.
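The failure mode and the proposed fix can be sketched as follows. This is a Python illustration of the idea, not flux's actual Go code; the class and resolver here are invented for the sketch:

```python
class MemcacheClientSketch:
    """Models a client that resolves its server hostname at start-up.
    If that first lookup fails, the address pool stays empty forever
    unless refresh() is called again -- the suspected bug above."""

    def __init__(self, hostname, resolver):
        self.hostname = hostname
        self.resolver = resolver  # function: hostname -> list of addresses
        self.addresses = []
        self.refresh()            # resolved once at start-up

    def refresh(self):
        """Re-resolve the hostname. The proposed fix is to call this
        periodically instead of only once."""
        try:
            self.addresses = self.resolver(self.hostname)
        except OSError:
            pass  # lookup failed; keep the (possibly empty) pool

    def get(self, key):
        if not self.addresses:
            raise RuntimeError("memcache: no servers configured or available")
        return f"value-for-{key}"  # stand-in for a real network round trip


# Simulate DNS being down when the client starts, then recovering.
dns_up = False

def fake_resolver(name):
    if not dns_up:
        raise OSError("Name does not resolve")
    return ["10.245.117.136"]

client = MemcacheClientSketch("memcached.default.svc.cluster.local", fake_resolver)
dns_up = True
# Without the fix, client.get("k") still raises RuntimeError here,
# even though DNS has recovered, because the empty pool is never retried.
client.refresh()  # the fix: periodically re-resolve
print(client.get("k"))  # -> value-for-k
```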
The fix (#1913) should appear in a patch release fairly soon. If you are willing to try it out, the image built from the master branch, quay.io/weaveworks/flux:master-bcf0f543, includes it.
Memcached appears to be running, but Flux seemingly can't connect.
I've tried destroying the memcached pod and upgrading flux, but no dice so far.
Running on DigitalOcean, version 13.5.