balena-os / balena-supervisor

Balena Supervisor: balena's agent on devices.
https://balena.io
Other
147 stars 63 forks source link

[error] LogBackend: server responded with status code: 504 #1685

Open jellyfish-bot opened 3 years ago

jellyfish-bot commented 3 years ago

[nitish] Users are seeing the following error: [error] LogBackend: server responded with status code: 504

Full log from another thread:


-- Logs begin at Wed 2019-11-27 06:30:30 UTC, end at Wed 2019-11-27 08:56:02 UTC. --
Nov 27 06:30:58 7c21238 resin-supervisor[1452]: resin_supervisor
Nov 27 06:30:58 7c21238 resin-supervisor[1496]: active
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: Container config has not changed
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: Starting system message bus: dbus.
Nov 27 06:30:59 7c21238 resin-supervisor[1497]:  * Starting Avahi mDNS/DNS-SD Daemon: avahi-daemon
Nov 27 06:30:59 7c21238 resin-supervisor[1497]:    ...done.
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [info]    Supervisor v10.3.7 starting up...
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: (node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Starting event tracker
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Starting api binder
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Starting logging infrastructure
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [event]   Event: Supervisor start {}
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Performing database cleanup for container log timestamps
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [info]    Previous engine snapshot was not stored. Skipping cleanup.
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Handling of local mode switch is completed
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Connectivity check enabled: true
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Starting periodic check for IP addresses
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [info]    Reporting initial state, supervisor version and API info
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   VPN status path exists.
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [info]    Waiting for connectivity...
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Skipping preloading
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [info]    Starting API server
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [info]    Applying target state
Nov 27 06:30:59 7c21238 resin-supervisor[1497]: [debug]   Ensuring device is provisioned
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [debug]   Starting current state report
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [debug]   Starting target state poll
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [debug]   Supervisor API listening on allowed interfaces only
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [debug]   Finished applying target state
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [info]    Applying target state
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [debug]   Finished applying target state
Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 06:31:09 7c21238 resin-supervisor[1497]: [info]    Internet Connectivity: OK
Nov 27 06:31:50 7c21238 resin-supervisor[1497]: [error]   LogBackend: server responded with status code: 504
Nov 27 06:35:59 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 3.511 ms
Nov 27 06:36:47 7c21238 resin-supervisor[1497]: [event]   Event: Update notification {}
Nov 27 06:36:47 7c21238 resin-supervisor[1497]: [api]     POST /v1/update 204 - 5.163 ms
Nov 27 06:36:48 7c21238 resin-supervisor[1497]: [info]    Applying target state
Nov 27 06:36:48 7c21238 resin-supervisor[1497]: [debug]   Finished applying target state
Nov 27 06:36:48 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 06:40:59 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.682 ms
Nov 27 06:40:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 06:40:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 06:45:59 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.875 ms
Nov 27 06:50:59 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.832 ms
Nov 27 06:50:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 06:50:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 06:55:59 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.831 ms
Nov 27 07:00:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 07:00:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 07:00:59 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.873 ms
Nov 27 07:06:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.159 ms
Nov 27 07:10:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 07:10:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 07:11:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.860 ms
Nov 27 07:16:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.030 ms
Nov 27 07:20:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 07:20:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 07:21:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.116 ms
Nov 27 07:22:08 7c21238 resin-supervisor[1497]: [event]   Event: Update notification {}
Nov 27 07:22:08 7c21238 resin-supervisor[1497]: [api]     POST /v1/update 204 - 6.214 ms
Nov 27 07:22:08 7c21238 resin-supervisor[1497]: [info]    Applying target state
Nov 27 07:22:08 7c21238 resin-supervisor[1497]: [debug]   Finished applying target state
Nov 27 07:22:08 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 07:22:38 7c21238 resin-supervisor[1497]: [event]   Event: Update notification {}
Nov 27 07:22:38 7c21238 resin-supervisor[1497]: [api]     POST /v1/update 204 - 5.381 ms
Nov 27 07:22:38 7c21238 resin-supervisor[1497]: [info]    Applying target state
Nov 27 07:22:38 7c21238 resin-supervisor[1497]: [debug]   Finished applying target state
Nov 27 07:22:38 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 07:26:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.796 ms
Nov 27 07:30:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 07:30:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 07:31:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.755 ms
Nov 27 07:36:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.781 ms
Nov 27 07:40:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 07:40:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 07:41:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.856 ms
Nov 27 07:46:00 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.764 ms
Nov 27 07:50:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 07:50:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 07:51:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.769 ms
Nov 27 07:56:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.397 ms
Nov 27 08:00:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 08:00:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 08:01:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.067 ms
Nov 27 08:06:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.783 ms
Nov 27 08:10:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 08:10:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 08:11:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.417 ms
Nov 27 08:16:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.180 ms
Nov 27 08:20:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 08:20:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 08:21:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.722 ms
Nov 27 08:26:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.740 ms
Nov 27 08:30:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 08:30:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 08:31:01 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.739 ms
Nov 27 08:36:02 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.787 ms
Nov 27 08:40:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 08:40:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 08:41:02 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.934 ms
Nov 27 08:44:17 7c21238 resin-supervisor[1497]: [event]   Event: Update notification {}
Nov 27 08:44:17 7c21238 resin-supervisor[1497]: [api]     POST /v1/update 204 - 5.746 ms
Nov 27 08:44:18 7c21238 resin-supervisor[1497]: [info]    Applying target state
Nov 27 08:44:18 7c21238 resin-supervisor[1497]: [debug]   Finished applying target state
Nov 27 08:44:18 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 08:46:02 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.743 ms
Nov 27 08:50:59 7c21238 resin-supervisor[1497]: [debug]   Attempting container log timestamp flush...
Nov 27 08:50:59 7c21238 resin-supervisor[1497]: [debug]   Container log timestamp flush complete
Nov 27 08:51:02 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 1.119 ms
Nov 27 08:56:02 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 0.728 ms
root@7c21238:~# 
jellyfish-bot commented 3 years ago

[nitish] This issue has attached support thread https://jel.ly.fish/a9fb650b-9495-4fe7-ae28-5a3383a3c76e

nitishagar commented 3 years ago

Another issue where the user reported similar error but not related to actual issue: https://jel.ly.fish/12187ba5-bdcb-4bdc-a124-7afa947469c5

cywang117 commented 3 years ago

Context: in this instance, Supervisor is communicating that it was unable to establish a connection to the balena logging server while setting up a request stream, see here. Supervisor will attempt to re-establish a connection with the balena logging server in at least a COOLDOWN_PERIOD millisecond interval. In the above logs, this error only appeared once, so it's possible that after COOLDOWN_PERIOD ms, the logging server connection succeeded. This indicates a possible race condition.

The balena API endpoint for logging that is being called in this case is ${apiEndpoint}/device/v2/${uuid}/log-stream, where uuid is the device uuid. Here and here are the code blocks where this request is handled in open-balena-api. You'll notice that in the storeStream handler, no 504 errors are sent. This indicates that the error might originate from the api.resin.clone call, meaning the error is passed along from @balena/pinejs. In pine-js you can find the 504 error as an sbvr-api error here.

All this means that Supervisor is not the likely origin of this 504 error, but is simply passing it along from the balena API backend. The real origin of this error is unknown to me. One thing Supervisor can do in this instance is to communicate the error more clearly is to add a response message in this code block, if it exists, assuming a response message is passed along from the backend (uncertain if this is the case.

Like so (reference):

// Since we haven't sent the request body yet, and never will,the
// only reason for the server to prematurely respond is to
// communicate an error. So teardown the connection immediately
this.req.on('response', (res) => {
    log.error(
        'LogBackend: server responded with status code:',
        res.statusCode,
        ' - message:'
        res.statusMessage
    );
    this.teardown();
});
cywang117 commented 3 years ago

For "Another issue where the user reported similar error but not related to actual issue" as reported by @nitishagar above, it's unclear whether the LogBackend error persisted after the user fixed the main issue. The JF ticket (https://jel.ly.fish/12187ba5-bdcb-4bdc-a124-7afa947469c5) is also from 2019 so it's unlikely the user will have context or remember, if I were to ping them in JF. Let's move forward disregarding this second ticket, as there's not enough context in that ticket to be sure that it's the same pattern that this error is showing.

20k-ultra commented 3 years ago

@cywang117 the problem is the Supervisor tried to communicate with an HTTP service and received a 504. The status code itself is enough to know what's happening so including the status message won't be helpful.

To resolve this issue we should reproduce the error but have the Supervisor better report something like...

Nov 27 06:31:00 7c21238 resin-supervisor[1497]: [success] Device state apply success
Nov 27 06:31:09 7c21238 resin-supervisor[1497]: [info]    Internet Connectivity: OK

Nov 27 06:31:50 7c21238 resin-supervisor[1497]: [error]   Logging backend returned a {statusCode} response!
Nov 27 06:31:50 7c21238 resin-supervisor[1497]: [error]   Unable to establish connection for streaming logs...retrying in {interval} seconds.
Nov 27 06:31:50 7c21238 resin-supervisor[1497]: [info]   Streaming device logs to logging backend.
^^^ new logging behaviour

Nov 27 06:35:59 7c21238 resin-supervisor[1497]: [api]     GET /v1/healthy 200 - 3.511 ms
Nov 27 06:36:47 7c21238 resin-supervisor[1497]: [event]   Event: Update notification {}
Nov 27 06:36:47 7c21238 resin-supervisor[1497]: [api]     POST /v1/update 204 - 5.163 ms

Substitute statusCode if response is not 2xx and interval with time until next attempt