cloudfoundry / stratos

Stratos: Web-based Management UI for Cloud Foundry and Kubernetes
Apache License 2.0

App log streaming breaks with modern cf-deployment #5037

Open ionphractal opened 1 year ago


Stratos Version

4.4.0 (the master branch is also affected)

Frontend Deployment type

Backend (Jet Stream) Deployment type

Expected behaviour

When opening an app's "log stream", Stratos should show the app's recent logs (ideally) and then tail new log output.

Actual behaviour

The page remains empty, and Stratos logs the error "Failed to get recent messages for App ... on CNSI ... [unknown issue when making HTTP request to Loggregator]".

Steps to reproduce the behavior

Deploy CF with cf-deployment v24.3.0 or later, deploy Stratos, and open the log stream for any app.

Log output covering before error and any error statements

2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT DEBU[Thu Jan 12 12:02:12 UTC 2023] Decrypting Refresh Token
2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT DEBU[Thu Jan 12 12:02:12 UTC 2023] decryptToken
2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT DEBU[Thu Jan 12 12:02:12 UTC 2023] Decrypt
2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT DEBU[Thu Jan 12 12:02:12 UTC 2023] Creating Noaa consumer for Doppler endpoint wss://doppler.REDACTED:443
2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT DEBU[Thu Jan 12 12:02:12 UTC 2023] Upgrading request to the WebSocket protocol...
2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT DEBU[Thu Jan 12 12:02:12 UTC 2023] Successfully upgraded to a WebSocket connection
2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT INFO[Thu Jan 12 12:02:12 UTC 2023] Received request for log stream for App ID: REDACTED - in CNSI: REDACTED
2023-01-12T13:02:12.04+0100 [APP/PROC/WEB/1] OUT DEBU[Thu Jan 12 12:02:12 UTC 2023] getRecentLogs
2023-01-12T13:02:12.07+0100 [APP/PROC/WEB/1] OUT echo: http: response.WriteHeader on hijacked connection from github.com/labstack/echo/v4.(*Response).WriteHeader (response.go:63)
2023-01-12T13:02:12.07+0100 [APP/PROC/WEB/1] OUT echo: http: response.Write on hijacked connection from github.com/labstack/echo/v4.(*Response).Write (response.go:75)
2023-01-12T13:02:12.07+0100 [APP/PROC/WEB/1] OUT {"time":"2023-01-12T12:02:12.07051961Z","level":"ERROR","prefix":"echo","file":"main.go","line":"1235","message":"Failed to get recent messages for App REDACTED on CNSI REDACTED [unknown issue when making HTTP request to Loggregator]"}
2023-01-12T13:02:12.07+0100 [APP/PROC/WEB/1] OUT Request: [2023-01-12T12:02:12Z] Remote-IP:"REDACTED" Method:"GET" Path:"/pp/v1/REDACTED/apps/REDACTED/stream" Status:500 Latency:24.756129ms Bytes-In:0 Bytes-Out:0

Detailed Description

CF loggregator-release recently removed the RecentLogsHandler from the Traffic Controller (https://github.com/cloudfoundry/loggregator-release/releases/tag/v107.0.0). This change is part of CF as of cf-deployment v24.3.0. Noaa was also recently updated (https://github.com/cloudfoundry/noaa/commit/f0749146decfd357a5583c29bd3ad2b15b322c85) to reflect the change.

Context

Possible Implementation

I'm not sure whether this can be fixed by using the Firehose v2 API, but switching to Log Cache would at least resolve it, in my opinion (it would also fix https://github.com/cloudfoundry/stratos/issues/4832).
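As a rough illustration of the Log Cache route: Log Cache serves an app's buffered envelopes over HTTP via `GET /api/v1/read/<source-id>`, authenticated with the user's UAA bearer token. The sketch below only builds that request. The `https://log-cache.<system-domain>` base URL, the use of the app GUID as the source ID, and the helper names `recentLogsURL`/`newRecentLogsRequest` are assumptions for illustration, not existing Stratos code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// recentLogsURL builds a Log Cache read URL for an app's most recent log envelopes.
// Assumptions: logCacheBase looks like https://log-cache.<system-domain>,
// and the app GUID is used as the Log Cache source ID.
func recentLogsURL(logCacheBase, appGUID string, limit int) (string, error) {
	u, err := url.Parse(logCacheBase)
	if err != nil {
		return "", err
	}
	// GET /api/v1/read/<source-id>; url.URL.String() escapes the path as needed.
	u.Path = "/api/v1/read/" + appGUID
	q := url.Values{}
	q.Set("envelope_types", "LOG") // log envelopes only, no metrics
	q.Set("descending", "true")    // newest envelopes first
	q.Set("limit", strconv.Itoa(limit))
	u.RawQuery = q.Encode() // Encode sorts the keys alphabetically
	return u.String(), nil
}

// newRecentLogsRequest wraps the URL in a GET request carrying the user's
// UAA access token (hypothetical helper, not part of Stratos).
func newRecentLogsRequest(logCacheBase, appGUID, token string, limit int) (*http.Request, error) {
	target, err := recentLogsURL(logCacheBase, appGUID, limit)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "bearer "+token)
	return req, nil
}

func main() {
	req, err := newRecentLogsRequest("https://log-cache.example.com", "1234-abcd", "TOKEN", 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Because `descending=true` returns the newest entries first, a caller would reverse the returned batch before displaying it in chronological order.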

I expect this would be a larger change, so in the meantime it would help limit the impact if Stratos continued tailing the logs even when retrieving the recent logs fails with an (unknown) error: https://github.com/cloudfoundry/stratos/blob/master/src/jetstream/plugins/cloudfoundry/cf_websocket_streams.go#L188-L208