aws / aws-iot-device-sdk-cpp-v2

Next generation AWS IoT Client SDK for C++ using the AWS Common Runtime
Apache License 2.0
183 stars 108 forks source link

Expose TCP socket file handler for MQTT session #232

Closed psiphi75 closed 1 week ago

psiphi75 commented 3 years ago

Is your feature request related to a problem? Please describe. We have an application that needs a continuous real-time connection with AWS IoT, any drop in the network needs to be quickly detected and a connection re-established. We are also on a cellular data plan, so every byte we send costs. The TCP Keepalive feature is significantly more data efficient than the MQTT "Keepalive", hence the MQTT method is not an option.

The TCP Keepalive feature does not work as expected in Linux, see #220 for details. In Linux (maybe applies to other platforms too) the TCP Keepalive also needs the "TCP_USERTIMEOUT" option set for the socket if you want the TCP connection killed. In our case, we a publishing data over the MQTT connection, this publishing of data causes the TCP Keepalive to not drop the connection, which may keep the socket alive for up-to 30 minutes.

Describe the solution you'd like It would be great to have the file descriptor for the underlying TCP socket to be available in the MQTT class. This general solution would allow the user of the SDK to customise the network behaviour by using setsockopt().

Describe alternatives you've considered

JonathanHenson commented 3 years ago

As a general feature this makes sense. I don’t think you need access to the fd though. Couldn’t we just expose the socket options and set them for you?

psiphi75 commented 3 years ago

Couldn’t we just expose the socket options and set them for you?

Yes, that would work too.

github-actions[bot] commented 2 years ago

Greetings! Sorry to say but this is a very old issue that is probably not getting as much attention as it deservers. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.

jmklix commented 3 months ago

We aren't planning on exposing the TCP socket file handler. But I did want to ask if there are any other suggestions you have for improving KeepAlive when using this sdk?

psiphi75 commented 3 months ago

Adding a Keep-Alive time option would be great. We have a use case where our customers connect to the cloud via a cellular modem with a roaming SIM card. The per-MB cost of this is a lot and we need a real-time connect when one is available. We have found that some cellular providers silently timeout a TCP connection after 60 seconds. This means we need to keep the TCP connection alive and an MQTT ping takes many more bytes than a TCP keep alive packet.

However, we have manually changed the code and keep a patch going for this option. But in a nutshell, I don't really need this change, we have our own work-around and it works for us.

bretambrose commented 1 week ago

We revisited keep alive recently. I tested across all supported platforms and the timeout was triggered as expected. I used settings comparable to

    struct aws_socket_options socket_options = {
        .type = AWS_SOCKET_STREAM,
        .connect_timeout_ms = (uint32_t)app_ctx.connect_timeout,
        .keep_alive_timeout_sec = 5,
        .keepalive = true,
        .keep_alive_interval_sec = 2,
        .keep_alive_max_failed_probes = 1,
    };

verified the probes in a packet capture and expected socket timeout timing after pulling the plug on the router.

Given the documentation around TCP_USER_TIMEOUT (override behavior) I wonder if you accidentally left off one of the keep alive settings (they all need to be set).

There is some divergent behavior cross-platform (on Mac, keep_alive_interval_sec is the delay between probes, on Windows/Linux, keep_alive_timeout_sec is the delay between probes) but we cannot change that behavior at this point.

github-actions[bot] commented 1 week ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.