Closed attilajanko closed 4 years ago
Maybe the whole log for the pod can be helpful
`2020-05-07 09:36:22 kubewatch [9 TMainThread] 1.4.2 DEBUG: looking up ID for namespace default
2020-05-07 09:36:22 kubewatch [9 TMainThread] 1.4.2 DEBUG: cluster ID URL is d6e_id://860a3871-2493-11e9-9b08-3612afa7c1ba/default
2020-05-07 09:36:23 kubewatch [9 TMainThread] 1.4.2 DEBUG: IngressClass check got 404
2020-05-07 09:36:23 kubewatch [9 TMainThread] 1.4.2 DEBUG: Ambassador does not have permission to read IngressClass resources. To enable IngressClass support, configure RBAC to allow Ambassador to read IngressC
lass resources, then restart the Ambassador pod.
2020-05-07 09:36:23 kubewatch [9 TMainThread] 1.4.2 DEBUG: CRD type definition not found for ambassadorinstallations.getambassador.io
2020-05-07 09:36:23 kubewatch [9 TMainThread] 1.4.2 DEBUG: ambassadorinstallations.getambassador.io CRD not available
2020-05-07 09:36:23 kubewatch [9 TMainThread] 1.4.2 DEBUG: cluster ID is 2a49b558-5f94-528e-b62d-10a0adb6457d (from namespace default)
2020-05-07 09:36:23 AMBASSADOR INFO starting with environment:
2020-05-07 09:36:23 AMBASSADOR INFO ====
AMBASSADOR_ADMIN_PORT=tcp://192.168.171.146:8877
AMBASSADOR_ADMIN_PORT_8877_TCP=tcp://192.168.171.146:8877
AMBASSADOR_ADMIN_PORT_8877_TCP_ADDR=192.168.171.146
AMBASSADOR_ADMIN_PORT_8877_TCP_PORT=8877
AMBASSADOR_ADMIN_PORT_8877_TCP_PROTO=tcp
AMBASSADOR_ADMIN_SERVICE_HOST=192.168.171.146
AMBASSADOR_ADMIN_SERVICE_PORT=8877
AMBASSADOR_ADMIN_SERVICE_PORT_AMBASSADOR_ADMIN=8877
AMBASSADOR_ADMIN_URL=http://127.0.0.1:8877
AMBASSADOR_CLUSTER_ID=2a49b558-5f94-528e-b62d-10a0adb6457d
AMBASSADOR_CONFIG_BASE_DIR=/ambassador
AMBASSADOR_INTERNAL_URL=https://127.0.0.1:8443
AMBASSADOR_NAMESPACE=core
AMBASSADOR_PORT=tcp://192.168.18.40:80
AMBASSADOR_PORT_80_TCP=tcp://192.168.18.40:80
AMBASSADOR_PORT_80_TCP_ADDR=192.168.18.40
AMBASSADOR_PORT_80_TCP_PORT=80
AMBASSADOR_PORT_80_TCP_PROTO=tcp
AMBASSADOR_REDIS_PORT=tcp://192.168.221.1:6379
AMBASSADOR_REDIS_PORT_6379_TCP=tcp://192.168.221.1:6379
AMBASSADOR_REDIS_PORT_6379_TCP_ADDR=192.168.221.1
AMBASSADOR_REDIS_PORT_6379_TCP_PORT=6379
AMBASSADOR_REDIS_PORT_6379_TCP_PROTO=tcp
AMBASSADOR_REDIS_SERVICE_HOST=192.168.221.1
AMBASSADOR_REDIS_SERVICE_PORT=6379
AMBASSADOR_SERVICE_HOST=192.168.18.40
AMBASSADOR_SERVICE_PORT=80
AMBASSADOR_SERVICE_PORT_HTTP=80
AMBASSADOR_URL=https://ambassador.default.svc.cluster.local
2020-05-07 09:36:23 AMBASSADOR INFO ====
2020-05-07 09:36:23 AMBASSADOR INFO launching worker process 'ambex': 'ambex' '-ads' '8003' '/ambassador/envoy'
2020-05-07 09:36:23 AMBASSADOR INFO ambex is PID 195
2020-05-07 09:36:23 AMBASSADOR INFO launching worker process 'diagd': 'diagd' '/ambassador/snapshots' '/ambassador/bootstrap-ads.json' '/ambassador/envoy/envoy.json' '--notices' '/ambassador/notices.json' '--ki
ck' 'kill -HUP 1'
2020-05-07 09:36:23 AMBASSADOR INFO diagd is PID 203
time="2020-05-07T09:36:23Z" level=info msg="Ambex 1.4.2 starting..."
time="2020-05-07T09:36:23Z" level=info msg=Listening port=8003
time="2020-05-07T09:36:23Z" level=info msg="Wrote PID" file=ambex.pid pid=195
time="2020-05-07T09:36:23Z" level=info msg="Pushing snapshot v0"
2020-05-07 09:36:25 diagd 1.4.2 [P203TMainThread] INFO: thread count 17, listening on 0.0.0.0:8877
2020-05-07 09:36:25 diagd 1.4.2 [P203TMainThread] INFO: BOOT: Scout result {"latest_version": "1.0.0", "application": "aes", "cached": false, "timestamp": 1588844185.682892}
[2020-05-07 09:36:25 +0000] [203] [INFO] Starting gunicorn 19.9.0
[2020-05-07 09:36:25 +0000] [203] [INFO] Listening at: http://0.0.0.0:8877 (203)
[2020-05-07 09:36:25 +0000] [203] [INFO] Using worker: threads
[2020-05-07 09:36:25 +0000] [219] [INFO] Booting worker with pid: 219
2020-05-07 09:36:25 diagd 1.4.2 [P219TAEW] INFO: starting Scout checker
2020-05-07 09:36:25 diagd 1.4.2 [P219TAEW] INFO: starting event watcher
2020-05-07 09:36:26 AMBASSADOR INFO diagd running
2020-05-07 09:36:26 AMBASSADOR INFO launching worker process 'watt': 'watt' '--port' '8002' '--notify' 'python /ambassador/post_update.py --watt ' '-s' 'service' '-s' 'ingresses' '-s' 'AuthService' '-s' 'Mappin
g' '-s' 'Module' '-s' 'RateLimitService' '-s' 'TCPMapping' '-s' 'TLSContext' '-s' 'TracingService' '-s' 'ConsulResolver' '-s' 'KubernetesEndpointResolver' '-s' 'KubernetesServiceResolver' '-s' 'Host' '-s' 'LogS
ervice' '--watch' 'python /ambassador/watch_hook.py'
2020-05-07 09:36:26 AMBASSADOR INFO watt is PID 228
2020-05-07 09:36:26 AMBASSADOR INFO launching worker process 'amb-sidecar': '/ambassador/sidecars/amb-sidecar'
2020-05-07 09:36:26 AMBASSADOR INFO amb-sidecar is PID 231
time="2020-05-07 09:36:26" level=info msg="Ambassador Edge Stack configuation loaded" func=github.com/datawire/apro/cmd/amb-sidecar/runner.runE file="github.com/datawire/apro@/cmd/amb-sidecar/runner/main.go:138"
2020/05/07 09:36:27 starting watt...
2020/05/07 09:36:27 kubebootstrap: starting
2020/05/07 09:36:27 consulwatchman: starting
2020/05/07 09:36:27 kubewatchman: starting
2020/05/07 09:36:27 aggregator: starting
2020/05/07 09:36:27 invoker: starting
2020/05/07 09:36:27 api: starting
2020/05/07 09:36:27 kubebootstrap: adding kubernetes watch for "service" in namespace ""
2020/05/07 09:36:27 api: snapshot server listening on: :8002
2020/05/07 09:36:27 kubebootstrap: adding kubernetes watch for "ingresses" in namespace ""
I0507 09:36:27.203538 231 merged_client_builder.go:122] Using in-cluster configuration
I0507 09:36:27.203717 231 merged_client_builder.go:164] Using in-cluster namespace
2020/05/07 09:36:27 kubebootstrap: adding kubernetes watch for "AuthService" in namespace ""
time="2020-05-07 09:36:27" level=info msg="license_secret_watch: installing license secrets watcher..." func=github.com/datawire/apro/cmd/amb-sidecar/runner.runE file="github.com/datawire/apro@/cmd/amb-sidecar/
runner/main.go:242"
time="2020-05-07 09:36:27" level=info msg="Loading content from git repo" func=github.com/datawire/apro/cmd/amb-sidecar/devportal/content.NewContent file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/con
tent/content.go:83" contentBranch=master contentSubdir=/ contentURL="https://github.com/datawire/devportal-content.git" subsystem=content
time="2020-05-07 09:36:27" level=info msg="license_secret_watch: starting the AES secret core/ambassador-edge-stack watcher" func=github.com/datawire/apro/cmd/amb-sidecar/runner.runE.func4 file="github.com/data
wire/apro@/cmd/amb-sidecar/runner/main.go:247" MAIN=license_secret_watch
time="2020-05-07 09:36:27" level=info msg="license_secret_watch: watching license file \"/home/ambassador/.config/ambassador/license-key\"" func=github.com/datawire/apro/cmd/amb-sidecar/runner.runE.func3 file="
github.com/datawire/apro@/cmd/amb-sidecar/runner/main.go:231" MAIN=license_refresh
time="2020-05-07 09:36:27" level=info msg="Creating watch on /home/ambassador/.config/ambassador/" func=github.com/datawire/apro/cmd/amb-sidecar/runner.triggerOnChange file="github.com/datawire/apro@/cmd/amb-si
decar/runner/files.go:51"
time="2020-05-07 09:36:27" level=error msg="Failed to create watch on /home/ambassador/.config/ambassador/: Changes might require a restart: no such file or directory" func=github.com/datawire/apro/cmd/amb-side
car/runner.triggerOnChange file="github.com/datawire/apro@/cmd/amb-sidecar/runner/files.go:54"
2020/05/07 09:36:27 kubebootstrap: adding kubernetes watch for "Mapping" in namespace ""
time="2020-05-07 09:36:27" level=info msg="initial count 0" func="github.com/datawire/apro/cmd/amb-sidecar/ratelimits.(RateLimitController).DoWatch" file="github.com/datawire/apro@/cmd/amb-sidecar/ratelimits/r
ls_watch.go:81" MAIN=ratelimit_controller
2020/05/07 09:36:27 kubebootstrap: adding kubernetes watch for "Module" in namespace ""
2020/05/07 09:36:27 kubebootstrap: adding kubernetes watch for "RateLimitService" in namespace ""
time="2020-05-07 09:36:27" level=info msg="remove /tmp/amb/config: no such file or directory" func="github.com/datawire/apro/cmd/amb-sidecar/ratelimits.(RateLimitController).DoWatch.func1" file="github.com/dat
awire/apro@/cmd/amb-sidecar/ratelimits/rls_watch.go:134" MAIN=ratelimit_controller
I0507 09:36:27.545298 231 reflector.go:120] Starting reflector
Try to update the AKS version to 1.16.7, but the problem still exists.
ambassador-67d64b4b6b-ptfrh 0/1 CrashLoopBackOff 7 11m
...
2020-05-08 11:13:00 diagd 1.4.2 [P219TThreadPoolExecutor-0_1] INFO: C6416A76-81B4-4163-8304-D1ADB05522BF: 127.0.0.1 "GET /ambassador/v0/diag/" START 2020-05-08 11:13:00 diagd 1.4.2 [P219TThreadPoolExecutor-0_1] ERROR: 'NoneType' object has no attribute 'overview' Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 233, in wrapper result = f(*args, reqid=reqid, **kwds) File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 519, in show_overview ov = diag.overview(request, app.estats) AttributeError: 'NoneType' object has no attribute 'overview' 2020-05-08 11:13:00 diagd 1.4.2 [P219TThreadPoolExecutor-0_1] ERROR: C6416A76-81B4-4163-8304-D1ADB05522BF: 127.0.0.1 "GET /ambassador/v0/diag/" 2ms 500 server error
After 12 restart and almost hafl an hour it is in Running state and working.
ambassador-67d64b4b6b-ptfrh 1/1 Running 12 28m
ACCESS [2020-05-08T11:18:53.326Z] "GET /a/b/c/d/e HTTP/1.1" 200 - 0 4161 8 5 "f.g.h.i" "-" "05f09cff-31ac-466e-ba20-baea4833d1a8" "j.k.l.m" "n.o.p.q:80"
Issue is exist in the version 1.5.0 too.
Issue is exist in the version 1.5.2 too.
I'm also seeing this issue in 1.5.2
@attilajanko do you have many Mapping
objects? It seems if I clear that to a small number Ambassador will run successfully.
Hi @grrywlsn for last chance I deployed an new cluster in the Azure simply from the Marketplace nothing terraform, nothing arm template, just click-click create with version 1.15.11. After I following the manual steps
kubectl apply -f https://www.getambassador.io/yaml/aes-crds.yaml && \ kubectl wait --for condition=established --timeout=90s crd -lproduct=aes && \ kubectl apply -f https://www.getambassador.io/yaml/aes.yaml && \ kubectl -n ambassador wait --for condition=available --timeout=90s deploy -lproduct=aes
The log still contains the original error, but start without a lot of restarts
time="2020-06-12 08:10:57" level=warning msg="statsd is not in use" 2020-06-12 08:10:57 diagd 1.3.2 [P111TThreadPoolExecutor-0_0] INFO: 28A91046-46A8-4238-BEE8-8D0D9A05FA83: 127.0.0.1 "GET /ambassador/v0/diag/" START 2020-06-12 08:10:57 diagd 1.3.2 [P111TThreadPoolExecutor-0_0] ERROR: 'NoneType' object has no attribute 'overview' Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 233, in wrapper result = f(*args, reqid=reqid, **kwds) File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 519, in show_overview ov = diag.overview(request, app.estats) AttributeError: 'NoneType' object has no attribute 'overview' 2020-06-12 08:10:57 diagd 1.3.2 [P111TThreadPoolExecutor-0_0] ERROR: 28A91046-46A8-4238-BEE8-8D0D9A05FA83: 127.0.0.1 "GET /ambassador/v0/diag/" 2ms 500 server error time="2020-06-12 08:10:57" level=error msg="Bad HTTP response" status_code=500 subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true" time="2020-06-12 08:10:57" level=error msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true" subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true" time="2020-06-12 08:10:57" level=info msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true"
2020-06-12 08:11:57 diagd 1.3.2 [P111TThreadPoolExecutor-0_3] INFO: 91583D1E-CBD1-4BB6-883F-0AF2D94E16F8: 127.0.0.1 "GET /ambassador/v0/diag/" START 2020-06-12 08:11:57 diagd 1.3.2 [P111TThreadPoolExecutor-0_3] INFO: DEBUG_MODE False status_dict {'Error check': {'status': True, 'specifics': [(True, 'No errors logged')]}, 'TLS': {'status': True, 'specifics': [(True, '1 TLSContext is active')]}, 'Mappings': {'status': True, 'specifics': [(True, '5 Mappings are active')]}}
Regards Attila
And if you think this type of mapping at the moment only these exists in the new cluster
kubectl get mappings --all-namespaces NAMESPACE NAME PREFIX SERVICE STATE REASON ambassador ambassador-devportal /docs/ 127.0.0.1:8500 Running ambassador ambassador-devportal-api /openapi/ 127.0.0.1:8500 Running
But of course we have a lot of mappings in our services (in our existing configuration)
apiVersion: ambassador/v2 kind: Mapping name: abcd_mapping prefix: /api/v1/abcd rewrite: /api/v1/abcd service: : efgh timeout_ms: 60000
Following this: https://github.com/datawire/ambassador/issues/1784
I had the same issue during the start.
curl 127.0.0.1:8877/ambassador/v0/check_alive ambassador liveness check OK (6 seconds) curl 127.0.0.1:8877/ambassador/v0/check_ready ambassador waiting for config
Seems this configuration for liveness and readiness probe was helpful for us
initialDelaySeconds: 90 periodSeconds: 60 timeoutSeconds: 15 failureThreshold: 10 successThreshold: 1
After these settings the 500 error still exists, but no restart and the pod starting around ~5min
ambassador-7788d44cd7-7ndt2 1/1 Running 0 5m16s ambassador-86dbc79c74-64bmk 1/1 Running 0 4m2s
Describe the bug After the AES deployment on an Azure AKS cluster aks... Ready agent v1.13.12 4.15.0-1082-azure docker://3.0.10+azure
during the ambassador-... pods starting there is a python error in the logs and get 500 internal server error from the /diag/.
2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] INFO: 7C716B56-786E-426C-B03C-96554F086837: 127.0.0.1 "GET /ambassador/v0/diag/" START 2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] ERROR: 'NoneType' object has no attribute 'overview' Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 233, in wrapper result = f(*args, reqid=reqid, **kwds) File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 519, in show_overview ov = diag.overview(request, app.estats) AttributeError: 'NoneType' object has no attribute 'overview' 2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] ERROR: 7C716B56-786E-426C-B03C-96554F086837: 127.0.0.1 "GET /ambassador/v0/diag/" 1ms 500 server error time="2020-05-07 09:36:28" level=error msg="Bad HTTP response" func=github.com/datawire/apro/cmd/amb-sidecar/devportal/server.HTTPGet.func1 file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetch er.go:165" status_code=500 subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true" time="2020-05-07 09:36:28" level=error msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true" func=github.com/datawire/apro/cmd/amb-sidecar/devportal/server.HTTPGet file="github.com/dataw ire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:172" subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true" time="2020-05-07 09:36:28" level=info msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true" func="github.com/datawire/apro/cmd/amb-sidecar/devportal/server.(*fetcher)._retrieve" file="gi thub.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:195"
To Reproduce Steps to reproduce the behavior:
Expected behavior A running, stable deployment, and pods which starting ~1 min.
Versions (please complete the following information):
Additional context The problem is persist, not helpful the "delete the pod", persist if I scale up and down the pods. Every new pod restart needs 7-8-10 restart before reach the running state.