FlowFuse / nr-project-nodes

A set of Node-RED nodes for inter-project communication within the FlowFuse platform
Apache License 2.0

Project Nodes do not keep MQTT connection open #14

Closed sammachin closed 1 year ago

sammachin commented 2 years ago

Current Behavior

Running a flow on a device with the Project In node I see a lot of these messages in the device-agent log:

[NR] 10 Aug 17:06:30 - [info] Project Link nodes connection closed
[NR] 10 Aug 17:06:36 - [info] Project Link nodes connected
[NR] 10 Aug 17:07:36 - [info] Project Link nodes connection closed
[NR] 10 Aug 17:07:41 - [info] Project Link nodes connected
[NR] 10 Aug 17:09:41 - [info] Project Link nodes connection closed
[NR] 10 Aug 17:09:46 - [info] Project Link nodes connected
[NR] 10 Aug 17:10:46 - [info] Project Link nodes connection closed
[NR] 10 Aug 17:10:51 - [info] Project Link nodes connected
[NR] 10 Aug 17:12:52 - [info] Project Link nodes connection closed
[NR] 10 Aug 17:12:57 - [info] Project Link nodes connected
[NR] 10 Aug 17:13:57 - [info] Project Link nodes connection closed
[NR] 10 Aug 17:14:03 - [info] Project Link nodes connected
[NR] 10 Aug 17:15:03 - [info] Project Link nodes connection closed

This looks like the connection is being closed every 60, 120 or 180 seconds and then taking a few seconds to reconnect.
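The intervals can be read straight off the timestamps. The following is a minimal sketch that measures them from the excerpt above, assuming the lines strictly alternate between "closed" and "connected"; the helper names `toSeconds` and `analyse` are illustrative and not part of the project.

```javascript
// Log excerpt copied from the report above.
const logLines = [
  '[NR] 10 Aug 17:06:30 - [info] Project Link nodes connection closed',
  '[NR] 10 Aug 17:06:36 - [info] Project Link nodes connected',
  '[NR] 10 Aug 17:07:36 - [info] Project Link nodes connection closed',
  '[NR] 10 Aug 17:07:41 - [info] Project Link nodes connected',
  '[NR] 10 Aug 17:09:41 - [info] Project Link nodes connection closed',
  '[NR] 10 Aug 17:09:46 - [info] Project Link nodes connected',
  '[NR] 10 Aug 17:10:46 - [info] Project Link nodes connection closed',
  '[NR] 10 Aug 17:10:51 - [info] Project Link nodes connected',
  '[NR] 10 Aug 17:12:52 - [info] Project Link nodes connection closed',
  '[NR] 10 Aug 17:12:57 - [info] Project Link nodes connected',
  '[NR] 10 Aug 17:13:57 - [info] Project Link nodes connection closed',
  '[NR] 10 Aug 17:14:03 - [info] Project Link nodes connected',
  '[NR] 10 Aug 17:15:03 - [info] Project Link nodes connection closed',
];

// Convert the first HH:MM:SS timestamp in a line to seconds since midnight.
function toSeconds(line) {
  const match = line.match(/(\d{2}):(\d{2}):(\d{2})/);
  if (!match) throw new Error(`no timestamp in: ${line}`);
  const [, h, m, s] = match.map(Number);
  return h * 3600 + m * 60 + s;
}

// Split the gaps between consecutive lines into connection lifetimes
// (connected -> closed) and reconnect delays (closed -> connected).
function analyse(lines) {
  const lifetimes = [];
  const reconnectDelays = [];
  for (let i = 1; i < lines.length; i++) {
    const gap = toSeconds(lines[i]) - toSeconds(lines[i - 1]);
    if (lines[i].endsWith('connection closed')) lifetimes.push(gap);
    else reconnectDelays.push(gap);
  }
  return { lifetimes, reconnectDelays };
}

const result = analyse(logLines);
console.log('connection lifetimes (s):', result.lifetimes);
console.log('reconnect delays (s):', result.reconnectDelays);
```

For this excerpt the lifetimes come out at 60 s or roughly 120 s, with 5 to 6 s to reconnect each time.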

My project doesn't send much data over the link node so it can go for hours without sending a message.

If I implement a ping message to be sent over the node every 20s then the connection messages aren't shown.

Expected Behavior

Connection should be maintained without events in the log.

Need to clarify the timeout configuration between nodes and the broker in order to have the optimal experience but reduce data overheads.

Steps To Reproduce

Deploy a project to a device with link nodes, do not send any data over the nodes for ~5 minutes, then observe the log.

Environment

hardillb commented 2 years ago

I've been looking at the logs to see what's going on and they are not as helpful as I'd like.

It looks like mosquitto is logging the IP address of the ALB proxy and not looking for the X-Forwarded-For header to get the real remote client IP address.

But I'm going to guess that this is down to the ALB closing connections that don't send data.

The default MQTT.js keepalive is 60 seconds and I can't see anywhere we override that in the node at the moment.

hardillb commented 2 years ago

I have found an idle timeout setting on the ALB that defaults to 60 seconds. I have bumped this to 120 and things look more stable.
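One way to reason about why a 60 second keepalive and a 60 second idle timeout clash is the simplified model below. It is only a sketch: it assumes the client's keepalive ping is the sole traffic on an otherwise quiet connection, and the function name `keepaliveBeatsIdleTimeout` and the 5 second safety margin are hypothetical, not taken from MQTT.js or the ALB.

```javascript
// Simplified model (an assumption, not the actual ALB or MQTT.js logic):
// an idle connection survives only if a keepalive ping is sent
// comfortably before the load balancer's idle timeout expires.
function keepaliveBeatsIdleTimeout(keepaliveSeconds, idleTimeoutSeconds, marginSeconds = 5) {
  return keepaliveSeconds + marginSeconds <= idleTimeoutSeconds;
}

// The two configurations described in this thread.
const scenarios = [
  { label: 'keepalive 60s, ALB idle timeout 60s', keepalive: 60, idleTimeout: 60 },
  { label: 'keepalive 60s, ALB idle timeout 120s', keepalive: 60, idleTimeout: 120 },
];

for (const s of scenarios) {
  const ok = keepaliveBeatsIdleTimeout(s.keepalive, s.idleTimeout);
  console.log(`${s.label}: ${ok ? 'should stay open' : 'at risk of being closed'}`);
}
```

Under this model, equal values leave the ping racing the timeout, which would be consistent with connections sometimes lasting 60 seconds and sometimes about 120.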

hardillb commented 2 years ago

I can't find a way to set this automatically from the k8s ingress settings.

hardillb commented 2 years ago

Can we move this to review now?

Steve-Mcl commented 1 year ago

> I have found an idle timeout setting on the ALB that defaults to 60 seconds. I have bumped this to 120 and things look more stable.

Looking at my logs and the recent discussion in Slack, I suspect this is still 60s.

I will shortly raise a PR that ensures we set the keepalive option in the mqtt.connect options to below 60 seconds, as my testing reveals this makes the connection stable.

Proposal confirmed by Ben as "probably right for now" - thanks Ben. PR incoming.
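As an illustration of the proposal, the sketch below builds connect options with a keepalive below the idle timeout. It is not the code from the PR: the helper `buildConnectOptions`, the choice of half the idle timeout, the client ID, and the broker URL in the comment are all assumptions made for the example. The `keepalive` option itself is a standard MQTT.js connect option, expressed in seconds.

```javascript
// Hypothetical helper: merge a keepalive that sits well below the load
// balancer's idle timeout into a set of MQTT.js connect options.
// Halving the idle timeout is an illustrative choice, not a project rule.
function buildConnectOptions(baseOptions = {}, idleTimeoutSeconds = 60) {
  const keepalive = Math.max(1, Math.floor(idleTimeoutSeconds / 2));
  return { ...baseOptions, keepalive };
}

const options = buildConnectOptions({ clientId: 'example-client' });
console.log(options);

// With the mqtt package installed, the options would be passed like this
// (placeholder broker URL):
//   const mqtt = require('mqtt');
//   const client = mqtt.connect('mqtts://broker.example.com', options);
```

With a 60 second idle timeout this yields a 30 second keepalive, so a ping should be sent twice within each idle window.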