Closed: juan-lee closed this issue 5 years ago.
Still getting the issue:
2018-11-12T07:49:47.964622Z error Unable to decode an event from the watch stream: stream error: stream ID 1529; INTERNAL_ERROR
2018-11-12T07:49:47.964882Z error Unable to decode an event from the watch stream: stream error: stream ID 1519; INTERNAL_ERROR
Do I need to reinstall Istio for this fix to work?
Try to only restart/delete the Envoy ingress gateway, then reboot/delete your pods and check the ingress controller logs again.
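For example, a minimal sketch of that restart, assuming a default Istio install where the ingress gateway pods run in the istio-system namespace with the label app=istio-ingressgateway (adjust namespace and labels to your setup):

# Recreate the Envoy ingress gateway pods; the Deployment will restart them
kubectl -n istio-system delete pod -l app=istio-ingressgateway

# Re-check the gateway logs for the INTERNAL_ERROR message after the pods come back
kubectl -n istio-system logs -l app=istio-ingressgateway --tail=100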
Hi,
Do we need to do anything for our cluster, or will it be applied automatically?
Br, Tim
As I understood it:
@strtdusty It's a feature flag they set on AKS, for now only applied to new clusters. Not sure if the feature can be enabled manually on existing clusters.
So you'll have to recreate the cluster.
@adinunzio84 You mentioned that with this fix an additional ServiceEntry may be needed for Istio. We are having the issue described and also have Istio installed, so I am curious whether you have the fix in place and what ServiceEntry you had to add.
The fix is no longer behind a feature flag. All new clusters will get it automatically. Existing clusters will need to do a scale or upgrade to get the fix.
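For example, a scale or upgrade can be triggered with the Azure CLI; a sketch with placeholder names, node count, and version to replace with your own:

# Trigger a scale operation on an existing cluster
az aks scale --resource-group myResourceGroup --name myAKSCluster --node-count 3

# Or trigger an upgrade to a newer Kubernetes version
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version <new-version>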
@juan-lee Thanks for your reply. We rebuilt our cluster yesterday; however, I can still see this problem this morning. We are in westeurope, and the cluster version is 1.11.3.
Br, Tim
Hi, I redeployed with Terraform in the westeurope and centralus regions, with the nginx ingress controller. I can confirm from my side that the messages are gone from the log file. I used version 1.11.2.
Thanks for the fix.
Can you provide some more details? Does the pod in question have the appropriate KUBERNETES_ environment variables set?
@mmosttler Not fully tested, but this kinda works:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: azmk8s-ext
  namespace: default
spec:
  hosts:
  - ${FQDN}
  location: MESH_EXTERNAL
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS
I applied it with:
FQDN=$(az aks show -n ${CLUSTER_NAME} -g ${CLUSTER_RG} --query "fqdn" --output tsv)
cat serviceentry.yml | envsubst | kubectl apply -f -
This is the closest I got to it working. The JavaScript Kubernetes client is happy with this, but the Go client is not.
Please let me know if you come up with something better, though.
I can confirm that the problem is gone from our clusters. (west EU)
Thanks for reporting back. I'm closing the issue for now.
I am still seeing this issue even this week in east us.
Can you elaborate on your scenario? Also, keep in mind that pods will need to be restarted in order to get the fix. You can check to see if a pod has the fix by seeing if KUBERNETES_PORT, etc env variables are set for each container.
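A quick way to check (replace the namespace and pod name with your own):

# A pod with the fix should show KUBERNETES_PORT, KUBERNETES_SERVICE_HOST, etc.
kubectl -n <namespace> exec <pod-name> -- printenv | grep '^KUBERNETES_'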
Symptoms
Pods using the in-cluster config to perform a watch on a resource will see intermittent timeouts and the following error in the pod log:
streamwatcher.go:109] Unable to decode an event from the watch stream: stream error: stream ID 1; INTERNAL_ERROR
If the client performing the watch isn't handling errors gracefully, applications can get into an inconsistent state. Impacted applications include, but are not limited to, nginx-ingress and tiller (helm).
A specific manifestation of this bug is the following error when attempting a helm deployment.
Error: watch closed before Until timeout
Root Cause
Workaround
For the pods/containers that see the INTERNAL_ERROR in their logs, add the following environment variables to the container spec. Be sure to replace <your-fqdn-prefix> and <region> so the AKS kube-apiserver FQDN is correct.
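The exact environment-variable block isn't included above; as a sketch, assuming the standard AKS API server FQDN format <your-fqdn-prefix>.hcp.<region>.azmk8s.io, the container spec addition would look roughly like this:

# Hypothetical example: point the in-cluster client at the public API server FQDN
env:
- name: KUBERNETES_PORT_443_TCP_ADDR
  value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
- name: KUBERNETES_PORT
  value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
- name: KUBERNETES_PORT_443_TCP
  value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
- name: KUBERNETES_SERVICE_HOST
  value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io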