Open jellyfish-bot opened 3 years ago
[nitish] This issue has attached support thread https://jel.ly.fish/a9fb650b-9495-4fe7-ae28-5a3383a3c76e
Another issue where the user reported similar error but not related to actual issue: https://jel.ly.fish/12187ba5-bdcb-4bdc-a124-7afa947469c5
Context: in this instance, the Supervisor is reporting that it was unable to establish a connection to the balena logging server while setting up a request stream, see here. The Supervisor will attempt to re-establish the connection after waiting at least COOLDOWN_PERIOD milliseconds. In the above logs, this error only appeared once, so it's possible that the connection to the logging server succeeded after COOLDOWN_PERIOD ms. This indicates a possible race condition.
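To make the "at least COOLDOWN_PERIOD" behaviour concrete, here is a minimal sketch of a cooldown gate. It is not the Supervisor's actual implementation: the `CooldownGate` class and its method names are hypothetical, and the value given to `COOLDOWN_PERIOD` is assumed purely for illustration.

```typescript
// Assumed value for illustration only; the real constant lives in the Supervisor source.
const COOLDOWN_PERIOD = 5 * 1000;

// Hypothetical helper, not part of the Supervisor: tracks when a retry is allowed.
class CooldownGate {
  private lastFailureAt: number | null = null;

  // Record the timestamp (in ms) of a failed connection attempt.
  markFailure(now: number): void {
    this.lastFailureAt = now;
  }

  // Milliseconds left before another attempt is allowed (0 if allowed now).
  msUntilNextAttempt(now: number): number {
    if (this.lastFailureAt === null) {
      return 0;
    }
    return Math.max(0, this.lastFailureAt + COOLDOWN_PERIOD - now);
  }

  // True once at least COOLDOWN_PERIOD ms have passed since the last failure.
  canAttempt(now: number): boolean {
    return this.msUntilNextAttempt(now) === 0;
  }
}
```

Under this sketch a single logged error is consistent with one failed attempt followed by a successful reconnect once the cooldown elapsed.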
The balena API endpoint for logging that is being called in this case is ${apiEndpoint}/device/v2/${uuid}/log-stream, where uuid is the device uuid. Here and here are the code blocks where this request is handled in open-balena-api. You'll notice that in the storeStream handler, no 504 errors are sent. This indicates that the error might originate from the api.resin.clone call, meaning the error is passed along from @balena/pinejs. In pine-js you can find the 504 error as an sbvr-api error here.
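For reference, the endpoint template above expands as in the following sketch. The `logStreamUrl` helper is hypothetical and only illustrates the URL shape; it is not how the Supervisor builds the request.

```typescript
// Hypothetical helper: builds the log-stream URL from the template quoted above.
function logStreamUrl(apiEndpoint: string, uuid: string): string {
  // Drop trailing slashes so the join never produces "//device".
  const base = apiEndpoint.replace(/\/+$/, '');
  return `${base}/device/v2/${encodeURIComponent(uuid)}/log-stream`;
}
```

The host and uuid passed in are whatever the device is configured with; nothing here is specific to a particular balena environment.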
All this means that the Supervisor is not the likely origin of this 504 error, but is simply passing it along from the balena API backend. The real origin of this error is unknown to me. One thing the Supervisor can do in this instance to communicate the error more clearly is to add the response message in this code block, if it exists, assuming a response message is passed along from the backend (uncertain if this is the case).
Like so (reference):
// Since we haven't sent the request body yet, and never will, the
// only reason for the server to prematurely respond is to
// communicate an error. So teardown the connection immediately
this.req.on('response', (res) => {
    log.error(
        'LogBackend: server responded with status code:',
        res.statusCode,
        ' - message:',
        res.statusMessage,
    );
    this.teardown();
});
For "Another issue where the user reported similar error but not related to actual issue" as reported by @nitishagar above, it's unclear whether the LogBackend error persisted after the user fixed the main issue. The JF ticket (https://jel.ly.fish/12187ba5-bdcb-4bdc-a124-7afa947469c5) is also from 2019 so it's unlikely the user will have context or remember, if I were to ping them in JF. Let's move forward disregarding this second ticket, as there's not enough context in that ticket to be sure that it's the same pattern that this error is showing.
@cywang117 the problem is the Supervisor tried to communicate with an HTTP service and received a 504. The status code itself is enough to know what's happening so including the status message won't be helpful.
To resolve this issue we should reproduce the error, but have the Supervisor report it more clearly, something like...
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 06:31:09 7c21238 resin-supervisor[1497]: [info] Internet Connectivity: OK
Nov 27 06:31:50 7c21238 resin-supervisor[1497]: [error] Logging backend returned a {statusCode} response!
Nov 27 06:31:50 7c21238 resin-supervisor[1497]: [error] Unable to establish connection for streaming logs...retrying in {interval} seconds.
Nov 27 06:31:50 7c21238 resin-supervisor[1497]: [info] Streaming device logs to logging backend.
^^^ new logging behaviour
Nov 27 06:35:59 7c21238 resin-supervisor[1497]: [api] GET /v1/healthy 200 - 3.511 ms
Nov 27 06:36:47 7c21238 resin-supervisor[1497]: [event] Event: Update notification {}
Nov 27 06:36:47 7c21238 resin-supervisor[1497]: [api] POST /v1/update 204 - 5.163 ms
Substitute {statusCode} with the response status code if the response is not 2xx, and {interval} with the time until the next attempt.
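A rough sketch of how the two proposed error lines could be produced is below. The function name `describeLogBackendFailure` and its parameters are hypothetical, and rounding the interval up to whole seconds is an assumption; wiring this into the Supervisor's logger is left out.

```typescript
// Hypothetical helper, not part of the Supervisor: formats the proposed messages.
function describeLogBackendFailure(statusCode: number, retryInMs: number): string[] {
  // Only non-2xx responses are reported as failures.
  const is2xx = statusCode >= 200 && statusCode < 300;
  if (is2xx) {
    return [];
  }
  // Assumption: report the wait in whole seconds, rounded up.
  const seconds = Math.ceil(retryInMs / 1000);
  return [
    `Logging backend returned a ${statusCode} response!`,
    `Unable to establish connection for streaming logs...retrying in ${seconds} seconds.`,
  ];
}
```

Each returned string would be passed to the error logger in order; an empty array means there is nothing to report.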
[nitish] Users are seeing the following error: [error] LogBackend: server responded with status code: 504
Full log from another thread: