SUSE / stratos

SUSE Stratos: Web-based Console UI for Cloud Foundry and Kubernetes
Apache License 2.0

Upgrade from 4.0.0 to 4.0.1-RC1 fails #454

Closed prabalsharma closed 4 years ago

prabalsharma commented 4 years ago

The stratos-0 pod never becomes ready after an upgrade from Stratos 4.0.0 on CAP 2.0.1-rc1.

Steps to reproduce:

  1. `helm repo add stratos http://opensource.suse.com/stratos/`
  2. `kubectl create ns stratos`
  3. `helm install stratos stratos/console --namespace stratos --set "console.service.type=LoadBalancer"`
  4. Create a DNS record

Helm release after install:

```
NAME     NAMESPACE  REVISION  UPDATED                                 STATUS    CHART          APP VERSION
stratos  stratos    1         2020-08-21 17:48:18.87153408 +0000 UTC  deployed  console-4.0.0  2.0.0
```

Pods:

```
stratos  stratos-0                            2/2  Running    0  8m38s
stratos  stratos-chartstore-847c94d857-q5sgk  2/2  Running    0  8m38s
stratos  stratos-chartsync-7c64b49999-xx2pd   1/1  Running    0  8m38s
stratos  stratos-config-init-1-kp4xh          0/1  Completed  0  8m38s
stratos  stratos-db-75469847b9-drw9k          1/1  Running    0  8m38s
```

Service:

```
stratos-ui-ext   LoadBalancer   10.39.251.182   35.204.149.6   443:31398/TCP   9m42s
```
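Step 4's DNS record points at the load balancer's external IP. A minimal sketch of pulling that value out of the captured service listing (the `svc_line` sample is the output above; on a live cluster `kubectl get svc -n stratos stratos-ui-ext -o jsonpath='{.status.loadBalancer.ingress[0].ip}'` would return it directly):

```shell
# Extract the EXTERNAL-IP column (4th field) from the captured
# `kubectl get svc` line; the sample line is taken from the output above
svc_line='stratos-ui-ext   LoadBalancer   10.39.251.182   35.204.149.6   443:31398/TCP   9m42s'
external_ip=$(echo "$svc_line" | awk '{print $4}')
echo "$external_ip"   # → 35.204.149.6
```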


  5. `helm upgrade stratos stratos/console --devel --namespace stratos --set "console.service.type=LoadBalancer"`

Helm release after upgrade:

```
NAME     NAMESPACE  REVISION  UPDATED                                  STATUS    CHART               APP VERSION
stratos  stratos    2         2020-08-21 18:15:23.319335256 +0000 UTC  deployed  console-4.0.1-rc.1  2.0.0
```

Service and pods:

```
stratos-ui-ext   LoadBalancer   10.39.251.182   35.204.149.6   443:30878/TCP   37m
```

```
081c0585-80b1-4b71-6837-f5161e1d4da0:/tmp/build/33f79dee # kubectl get pods -n stratos
NAME                                 READY  STATUS     RESTARTS  AGE
stratos-0                            1/2    Running    0         37m
stratos-chartstore-8478c679f-mpqnh   2/2    Running    0         10m
stratos-chartsync-7b586cb784-tbt8t   1/1    Running    0         10m
stratos-config-init-2-xdthn          0/1    Completed  0         10m
stratos-db-78c95864d-77rcd           1/1    Running    0         10m
```

**stratos-0  1/2  Running**
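The stuck pod can also be found mechanically by comparing the two halves of the READY column. A minimal sketch (the awk filter is generic, not part of the report; the sample input is the pod listing captured above):

```shell
# Print pods whose ready container count (e.g. 1/2) is below the
# declared count, skipping Completed job pods
not_ready() {
  awk 'NR > 1 && $3 != "Completed" { split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}

# Run against the pod listing captured after the upgrade
not_ready <<'EOF'
NAME                                 READY  STATUS     RESTARTS  AGE
stratos-0                            1/2    Running    0         37m
stratos-chartstore-8478c679f-mpqnh   2/2    Running    0         10m
stratos-chartsync-7b586cb784-tbt8t   1/1    Running    0         10m
stratos-config-init-2-xdthn          0/1    Completed  0         10m
stratos-db-78c95864d-77rcd           1/1    Running    0         10m
EOF
# → stratos-0
```

On a live cluster the same filter would be fed from `kubectl get pods -n stratos | not_ready`.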

`kubectl logs -n stratos stratos-0 proxy`:

```
INFO[Fri Aug 21 18:13:24 UTC 2020] CloudFoundry Register...
Request: [2020-08-21T18:13:24Z] Remote-IP:"10.164.0.72" Method:"POST" Path:"/pp/v1/register/cf" Status:201 Latency:68.607678ms Bytes-In:711 Bytes-Out:574
Request: [2020-08-21T18:13:24Z] Remote-IP:"10.164.0.72" Method:"GET" Path:"/pp/v1/info" Status:200 Latency:3.695567ms Bytes-In:0 Bytes-Out:3724
INFO[Fri Aug 21 18:13:36 UTC 2020] CloudFoundry Connect...
Request: [2020-08-21T18:13:36Z] Remote-IP:"10.164.0.95" Method:"POST" Path:"/pp/v1/auth/login/cnsi" Status:200 Latency:803.145077ms Bytes-In:247 Bytes-Out:607
Request: [2020-08-21T18:13:37Z] Remote-IP:"10.164.0.95" Method:"GET" Path:"/pp/v1/info" Status:200 Latency:4.801692ms Bytes-In:0 Bytes-Out:4051
Request: [2020-08-21T18:13:37Z] Remote-IP:"10.164.0.72" Method:"GET" Path:"/pp/v1/proxy/v2/info" Status:200 Latency:170.264073ms Bytes-In:0 Bytes-Out:745
Request: [2020-08-21T18:13:42Z] Remote-IP:"10.164.0.72" Method:"GET" Path:"/pp/v1/proxy/v2/organizations" Status:200 Latency:45.950589ms Bytes-In:0 Bytes-Out:3530
WARN[Fri Aug 21 18:13:42 UTC 2020] Passthrough response: URL: https://autoscaler.prabal-gke.ci.kubecf.charmedquarks.me/v1/info, Status Code: 404, Status: 404 Not Found, Content Type: text/plain; charset=utf-8, Length: 100
WARN[Fri Aug 21 18:13:42 UTC 2020] 404 Not Found: Requested route ('autoscaler.prabal-gke.ci.kubecf.charmedquarks.me') does not exist.
Request: [2020-08-21T18:13:42Z] Remote-IP:"10.164.0.95" Method:"GET" Path:"/pp/v1/autoscaler/info" Status:404 Latency:113.063777ms Bytes-In:0 Bytes-Out:100
Request: [2020-08-21T18:13:43Z] Remote-IP:"10.164.0.72" Method:"GET" Path:"/pp/v1/proxy/v2/apps" Status:200 Latency:236.040624ms Bytes-In:0 Bytes-Out:115
Request: [2020-08-21T18:13:43Z] Remote-IP:"10.164.0.72" Method:"GET" Path:"/pp/v1/proxy/v2/users" Status:200 Latency:387.35667ms Bytes-In:0 Bytes-Out:868
[mysql] 2020/08/21 18:15:27 packets.go:122: closing bad idle connection: unexpected read from socket
[mysql] 2020/08/21 18:15:27 packets.go:122: closing bad idle connection: unexpected read from socket
ERRO[Fri Aug 21 18:15:27 UTC 2020] Error trying to get current database version
{"time":"2020-08-21T18:15:27.321519319Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1071","message":"code=503, message=Error trying to get current database version"}
ERRO[Fri Aug 21 18:15:38 UTC 2020] Error trying to get current database version
{"time":"2020-08-21T18:15:38.3389611Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1071","message":"code=503, message=Error trying to get current database version"}
ERRO[Fri Aug 21 18:15:48 UTC 2020] Error trying to get current database version
{"time":"2020-08-21T18:15:48.326902632Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1071","message":"code=503, message=Error trying to get current database version"}
ERRO[Fri Aug 21 18:15:58 UTC 2020] Error trying to get current database version
{"time":"2020-08-21T18:15:58.338935439Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1071","message":"code=503, message=Error trying to get current database version"}
ERRO[Fri Aug 21 18:16:08 UTC 2020] Error trying to get current database version
{"time":"2020-08-21T18:16:08.323041607Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1071","message":"code=503, message=Error trying to get current database version"}
ERRO[Fri Aug 21 18:16:18 UTC 2020] Error trying to get current database version
{"time":"2020-08-21T18:16:18.338985224Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1071","message":"code=503, message=Error trying to get current database version"}
INFO[Fri Aug 21 18:16:19 UTC 2020] SessionDataRepository: unable to delete expired sessions: dial tcp 10.39.252.54:3306: connect: connection refused
2020/08/21 18:16:19 mysqlstore: unable to delete expired sessions: dial tcp 10.39.252.54:3306: connect: connection refused
ERRO[Fri Aug 21 18:16:27 UTC 2020] Error trying to get current database version
{"time":"2020-08-21T18:16:27.324800668Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1071","message":"code=503, message=Error trying to get current database version"}
```
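The repeated 503s all stem from the same DB version check; grepping the proxy log for `ERRO`-level lines isolates them quickly. A sketch over a short captured excerpt (on a live cluster the input would be piped from `kubectl logs -n stratos stratos-0 proxy`; the excerpt lines are taken from the output above):

```shell
# Write a short excerpt of the proxy log and count the ERRO-level entries
cat > /tmp/proxy-excerpt.log <<'EOF'
INFO[Fri Aug 21 18:13:24 UTC 2020] CloudFoundry Register...
ERRO[Fri Aug 21 18:15:27 UTC 2020] Error trying to get current database version
ERRO[Fri Aug 21 18:15:38 UTC 2020] Error trying to get current database version
INFO[Fri Aug 21 18:16:19 UTC 2020] SessionDataRepository: unable to delete expired sessions: dial tcp 10.39.252.54:3306: connect: connection refused
EOF

err_count=$(grep -c '^ERRO' /tmp/proxy-excerpt.log)
echo "$err_count"   # → 2
```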

nwmac commented 4 years ago

I have not been able to reproduce this on either a local minikube or multi-node CaaSP system in engineering cloud.

Can you share the full logs for the proxy container? (the one for the upgraded deployment that is not ready)

The log shown seems to be from the proxy container of the first install. At 18:15:27 it starts having problems connecting to the DB, but I'd expect that: it is after the time the upgrade was started, so a new proxy container should be coming up and the db pod will also be recreated. This log is just showing the first proxy container as it gets killed off.

nwmac commented 4 years ago

I was able to reproduce this on your system and it is now fixed - there is now an RC2.

Closing - please reopen if you still encounter issues.