Closed Nusserdt closed 4 years ago
It's normal for cartservice to crash a few times before it's ready (since it has a loose dependency on redis).
However, this error is concerning:
./loadgen.sh: 21: ./loadgen.sh: [[: not found
It seems like the underlying image somehow changed under us. https://github.com/GoogleCloudPlatform/microservices-demo/blob/master/src/loadgenerator/Dockerfile#L1 Does python:3-slim no longer have the [[ executable (but it somehow has bash to execute loadgen.sh)?
@DanSanche would be great if you have time to take a look.
Hmm I wasn't able to reproduce this. I saw some crashes on those two services early on, but they stabilized after ~2mins.
@ahmetb Looking through the logs, it looks like ./loadgen.sh: 21: ./loadgen.sh: [[: not found is a red herring. That error always shows up, even when the service is working properly. It looks like loadgen.sh is actually run with the #!/bin/sh -eu on line 1 of the file, not the #!/bin/bash that comes on line 17. I can try to fix that soon.
@Nusserdt I'm not sure why you're having trouble. cartservice should definitely start working within the 16 mins you waited. Is there anything related to your network that could cause communication issues between those services? What have you tried to debug the issue?
#!/bin/sh -eu on line 1 of the file, not the #!/bin/bash that comes on line 17. I can try to fix that soon
yes this sounds like the culprit.
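For reference, here is a minimal sketch of how the FRONTEND_ADDR check could be written so that it also runs under plain sh. It is only one possible way to avoid the [[ error, not necessarily what the maintainers changed, and the function name require_frontend_addr is made up for illustration. On Debian-based images /bin/sh is typically dash, which provides [ but not [[.

```shell
# POSIX-compatible variant of the FRONTEND_ADDR check (illustrative sketch).
# `[` works in any POSIX sh; `[[` is an extension of bash, ksh and zsh.
# ${FRONTEND_ADDR:-} expands to an empty string when the variable is unset,
# so the check does not abort with "parameter not set" under `set -u`.
require_frontend_addr() {
  if [ -z "${FRONTEND_ADDR:-}" ]; then
    echo >&2 "FRONTEND_ADDR not specified"
    return 1
  fi
}
```

In a script started with #!/bin/sh -eu, calling require_frontend_addr near the top would stop the script with the same message as before when the variable is missing.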
@DanSanche Our environment is behind a proxy, and we tried to add the relevant proxy information. But I can confirm that cartservice definitely fails to start:
NAME READY STATUS RESTARTS AGE
adservice-55f9757757-6js8m 1/1 Running 0 151m
cartservice-684bb46b44-g4vwt 0/1 CrashLoopBackOff 47 151m
checkoutservice-6fcc84467f-24jjh 1/1 Running 0 151m
currencyservice-6c7c479d45-zn5bk 1/1 Running 0 151m
emailservice-8dd9b76cc-jwcmx 1/1 Running 0 151m
frontend-7d8cfc75b5-sw7th 1/1 Running 0 151m
loadgenerator-5db67d555-l29l6 0/1 CrashLoopBackOff 32 151m
paymentservice-84ffc75c55-8jlqj 1/1 Running 0 151m
productcatalogservice-d564bdf4c-zz8rh 1/1 Running 0 151m
recommendationservice-76598d5889-gb4kt 1/1 Running 0 151m
redis-cart-5f59546cdd-b5m6p 1/1 Running 0 151m
shippingservice-b6db65f7f-54968 1/1 Running 0 151m
I also have the problem that the frontend service shows pending for the external IP. I think this is related to the failing loadgenerator service?
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
frontend-external LoadBalancer 10.104.218.246 <pending> 80:31248/TCP 152m
I think this is related to the failing loadgenerator service?
Most likely this is about the cloud provider you're using (what are you using?). Run kubectl describe to see if there are any failure events there. On minikube etc., this is not supposed to work.
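As a rough illustration of that check, the sketch below filters `kubectl get services` output for Services whose EXTERNAL-IP is still pending. The helper name pending_services is hypothetical, and it assumes the default column order shown above (NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE).

```shell
# Print the names of Services whose EXTERNAL-IP column is still <pending>.
# Reads `kubectl get services` output on stdin and skips the header row.
pending_services() {
  awk 'NR > 1 && $4 == "<pending>" { print $1 }'
}

# Possible usage against a live cluster (not run here):
#   kubectl get services | pending_services
#   kubectl describe service frontend-external   # the Events section lists any failures
```

The resource to describe here would presumably be the frontend-external Service, since that is the object showing the pending address.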
@ahmetb We host the cluster locally on 3 debian machines. 1 master, 2 nodes. Could you specify which resource I should use kubectl describe on?
NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings true Binding
componentstatuses cs false ComponentStatus
configmaps cm true ConfigMap
endpoints ep true Endpoints
events ev true Event
limitranges limits true LimitRange
namespaces ns false Namespace
nodes no false Node
persistentvolumeclaims pvc true PersistentVolumeClaim
persistentvolumes pv false PersistentVolume
pods po true Pod
podtemplates true PodTemplate
replicationcontrollers rc true ReplicationController
resourcequotas quota true ResourceQuota
secrets true Secret
serviceaccounts sa true ServiceAccount
services svc true Service
mutatingwebhookconfigurations admissionregistration.k8s.io false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io false ValidatingWebhookConfiguration
customresourcedefinitions crd,crds apiextensions.k8s.io false CustomResourceDefinition
apiservices apiregistration.k8s.io false APIService
controllerrevisions apps true ControllerRevision
daemonsets ds apps true DaemonSet
deployments deploy apps true Deployment
replicasets rs apps true ReplicaSet
statefulsets sts apps true StatefulSet
tokenreviews authentication.k8s.io false TokenReview
localsubjectaccessreviews authorization.k8s.io true LocalSubjectAccessReview
selfsubjectaccessreviews authorization.k8s.io false SelfSubjectAccessReview
selfsubjectrulesreviews authorization.k8s.io false SelfSubjectRulesReview
subjectaccessreviews authorization.k8s.io false SubjectAccessReview
horizontalpodautoscalers hpa autoscaling true HorizontalPodAutoscaler
cronjobs cj batch true CronJob
jobs batch true Job
certificatesigningrequests csr certificates.k8s.io false CertificateSigningRequest
leases coordination.k8s.io true Lease
endpointslices discovery.k8s.io true EndpointSlice
events ev events.k8s.io true Event
ingresses ing extensions true Ingress
ingresses ing networking.k8s.io true Ingress
networkpolicies netpol networking.k8s.io true NetworkPolicy
runtimeclasses node.k8s.io false RuntimeClass
poddisruptionbudgets pdb policy true PodDisruptionBudget
podsecuritypolicies psp policy false PodSecurityPolicy
clusterrolebindings rbac.authorization.k8s.io false ClusterRoleBinding
clusterroles rbac.authorization.k8s.io false ClusterRole
rolebindings rbac.authorization.k8s.io true RoleBinding
roles rbac.authorization.k8s.io true Role
priorityclasses pc scheduling.k8s.io false PriorityClass
csidrivers storage.k8s.io false CSIDriver
csinodes storage.k8s.io false CSINode
storageclasses sc storage.k8s.io false StorageClass
volumeattachments storage.k8s.io false VolumeAttachment
We tried to fix loadgen.sh ourselves. What we don't understand is how to apply changes to the pods. Do we have to execute skaffold run to "rebuild" the deployment? Unfortunately, skaffold run also throws errors like:
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Igooglecloudprofiler/src -I/usr/local/include/python3.7m -c googlecloudprofiler/src/log.cc -o build/temp.linux-x86_64-3.7/googlecloudprofiler/src/log.o -std=c++11
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
in [emailservice] Step: 5/14
well looks like I get the same error on cartservice crash...
"cartservice-684bb46b44-2l9bd 0/1 Completed 0 11s
cartservice-684bb46b44-2l9bd 0/1 Running 1 12s
cartservice-684bb46b44-2l9bd 0/1 Completed 1 18s
cartservice-684bb46b44-2l9bd 0/1 CrashLoopBackOff 1 24s
cartservice-684bb46b44-2l9bd 0/1 Running 2 36s
cartservice-684bb46b44-2l9bd 0/1 Completed 2 45s
cartservice-684bb46b44-2l9bd 0/1 CrashLoopBackOff 2 54s
cartservice-684bb46b44-2l9bd 0/1 Running 3 69s
cartservice-684bb46b44-2l9bd 0/1 Completed 3 76s
cartservice-684bb46b44-2l9bd 0/1 CrashLoopBackOff 3 84s
cartservice-684bb46b44-2l9bd 0/1 Running 4 2m4s
cartservice-684bb46b44-2l9bd 0/1 Completed 4 2m11s
cartservice-684bb46b44-2l9bd 0/1 CrashLoopBackOff 4 2m14s
^C[root@node1 ~]# kubectl logs cartservice-684bb46b44-2l9bd
Started as process with id 1
Reading host address from LISTEN_ADDR environment variable
Reading cart service port from PORT environment variable
Reading redis cache address from environment variable REDIS_ADDR
Connecting to Redis: redis-cart:6379,ssl=false,allowAdmin=true,connectRetry=5
StackExchange.Redis.RedisConnectionException: It was not possible to connect to the redis server(s). UnableToConnect on redis-cart:6379/Interactive, Initializing/NotStarted, last: NONE, origin: BeginConnectAsync, outstanding: 0, last-read: 1s ago, last-write: 1s ago, keep-alive: 180s, state: Connecting, mgr: 10 of 10 available, last-heartbeat: never, global: 3s ago, v: 2.0.601.3402
at StackExchange.Redis.ConnectionMultiplexer.ConnectImpl(Object configuration, TextWriter log) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 955
at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /app/cartstore/RedisCartStore.cs:line 80
at cartservice.cartstore.RedisCartStore.InitializeAsync() in /app/cartstore/RedisCartStore.cs:line 60
at cartservice.Program.<>c__DisplayClass4_0.<
Looks like cartservice endpoint is empty.
kubectl get endpoints
NAME ENDPOINTS AGE
adservice 10.233.92.12:9555 24m
apache2 10.233.90.35:80,10.233.96.41:80 19h
blue 10.233.90.41:5000,10.233.96.35:5000 5h41m
cartservice 24m
checkoutservice 10.233.96.36:5050
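An empty ENDPOINTS column generally means no Ready pod currently backs the Service, which fits a cartservice pod stuck in CrashLoopBackOff. A small sketch for spotting such rows in `kubectl get endpoints` output follows. The helper name empty_endpoints is made up, and it assumes that any real endpoint contains an ip:port pair, so a second column without a colon (an age such as 24m, or a placeholder such as <none>) is treated as empty.

```shell
# Print the names of Services that have no endpoints.
# Reads `kubectl get endpoints` output on stdin and skips the header row.
# A row counts as empty when its second field has no ":" in it, i.e. it is
# not an ip:port list (assumption described in the text above).
empty_endpoints() {
  awk 'NR > 1 && $2 !~ /:/ { print $1 }'
}

# Possible usage against a live cluster (not run here):
#   kubectl get endpoints | empty_endpoints
```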
@Nusserdt Yes, skaffold run should rebuild and run the containers. Are you doing the building on debian as well? What version? Do you use Docker often? Does it usually give you issues like this? It seems strange you're having issues building the container. Docker is supposed to fix exactly these "it works on my machine" issues.
@itlinux the logs look like you were waiting for only 2 minutes. Did you try letting it run a little longer? It's currently expected that the cartservice will crash a couple times until the redis service is completely ready. I may look into fixing this at some point soon
@Nusserdt Also, FWIW I don't think there are any issues with the load generator. These errors are consistent with redis not being ready. Can you post the logs from the redis pod?
I had it running overnight and still no go. So I removed it.
On 22 Jan 2020, at 10:16, Daniel Sanche notifications@github.com wrote:
@Nusserdt what platform are you using? Windows? Do you use Docker often? Does it usually give you issues like this? It seems strange you're having issues building the container. Docker is supposed to fix exactly these "it works on my machine" issues
@itlinux the logs look like you were waiting for only 2 minutes. Did you try letting it run a little longer? It's currently expected that the cartservice will crash a couple times until the redis service is completely ready. I may look into fixing this at some point soon
I've just deployed to GKE with kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml and I can't repro this crashloop of cartservice.
And since we changed nothing, I'm suspecting something is wrong on your side @Nusserdt. Could there be issues in your cluster networking? Since you said:
We host the cluster locally on 3 debian machines.
I'm suspecting this is a setup issue. If the project is working on GKE and locally on Docker-for-Desktop's Kubernetes (or Minikube) there's likely nothing we can do here.
That said we should fix the loadgen startup error ./loadgen.sh: 21: ./loadgen.sh: [[: not found
This issue should be fixed in https://github.com/GoogleCloudPlatform/microservices-demo/pull/284
@DanSanche We use skaffold v1.1.0, and it likely doesn't work because I am not able to pass our proxy information. With docker I can configure ~/.docker/config.json, and after that docker build works like a charm. We also run the build on the Debian 10 (master) machine.
Now I have commented out the lines which were responsible for the error:
#if [[ -z "${FRONTEND_ADDR}" ]]; then
# echo >&2 "FRONTEND_ADDR not specified"
# exit 1
#fi
Then I updated the Docker image, pushed it to our registry, and replaced the entry inside kubernetes-manifests.yaml.
But the loadgenerator pod is still failing:
NAME READY STATUS RESTARTS AGE
adservice-55f9757757-j9mhs 1/1 Running 1 19h
cartservice-684bb46b44-f8s6b 0/1 CrashLoopBackOff 329 19h
checkoutservice-6fcc84467f-x8cp6 1/1 Running 1 19h
currencyservice-6c7c479d45-pklv5 1/1 Running 1 19h
emailservice-8dd9b76cc-8lx7j 1/1 Running 1 19h
frontend-7d8cfc75b5-tzp9h 1/1 Running 1 19h
loadgenerator-76875cfd5f-kn5m5 0/1 CrashLoopBackOff 6 31m
paymentservice-84ffc75c55-vlb6j 1/1 Running 1 19h
productcatalogservice-d564bdf4c-kqb27 1/1 Running 1 19h
recommendationservice-76598d5889-cdsxs 1/1 Running 1 19h
redis-cart-5f59546cdd-fzxdv 1/1 Running 2 19h
shippingservice-b6db65f7f-l9blv 1/1 Running 1 19h
Now, the log only returns:
++ curl --silent --output /dev/stderr --write-out '%{http_code}' http://frontend:80
+ STATUSCODE=000
How can I investigate what goes wrong here?
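One note on reading that log: curl's %{http_code} prints 000 when no HTTP response was received at all, so the request to http://frontend:80 never got an answer (for example because of a connection, DNS, or proxy problem), as opposed to the frontend returning an error status. A small sketch that makes the distinction explicit follows; the helper name explain_status and its messages are made up for illustration.

```shell
# Turn the value of curl's %{http_code} into a short diagnosis.
# 000 means curl never received an HTTP status line from the server.
explain_status() {
  case "$1" in
    000) echo "no HTTP response received (connection, DNS, or timeout problem)" ;;
    *)   echo "frontend answered with HTTP $1" ;;
  esac
}

# Possible usage from inside the loadgenerator pod (not run here):
#   explain_status "$(curl --silent --output /dev/null --write-out '%{http_code}' http://frontend:80)"
```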
@ahmetb our cluster configuration looks like:
apiVersion: v1
clusters:
- cluster:
insecure-skip-tls-verify: true
server: https://192.168.76.101:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: REDACTED
client-key-data: REDACTED
Could the flannel framework be a problem here (https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml)? What else could go wrong? :/
The reason loadgenerator and cartservice are crashing is because they're having trouble communicating with the redis-cart service. My guess is that it's due to your proxy - there may be some network setting in your environment that is preventing those services from communicating.
My advice would be to try to do some debugging with kubectl port-forward and kubectl exec to try to get to the bottom of it, but I'm not able to reproduce your issue, so I likely won't be much help here.
Could the flannel framework be a problem here (coreos/flannel:Documentation/kube-flannel.yml@master (raw))?
Yes, that's why we don't have bandwidth to support a custom setup. :) If it works on Minikube and GKE, it's likely a setup issue on your side, and I recommend you seek help in other channels.
Closing as we can't do much here.
I followed the installation steps from option 3. I already had a running kubernetes cluster, so I only executed:
kubectl apply -f ./release/kubernetes-manifests.yaml
When I evaluate the result with
kubectl get pods
I get: cartservice and loadgenerator are not able to start.
Logs
kubectl logs cartservice-684bb46b44-b6dvk
kubectl logs loadgenerator-5db67d555-fq42k
Machine
Debian 10 behind a Proxy Local Hosting