aws-samples / aws-iot-securetunneling-localproxy

AWS IoT Secure Tunneling local proxy reference C++ implementation
https://docs.aws.amazon.com/iot/latest/developerguide/what-is-secure-tunneling.html
Apache License 2.0

Program crash - boost assertion error #90

Closed mwaltec closed 1 year ago

mwaltec commented 2 years ago

Describe the bug

localproxy crashes with a Boost assertion error after continuous use of the tunnel for an indeterminate amount of time. This occurs fairly consistently, after anywhere from 0 to 15 minutes of use, when running on an ARM7(hf) host with a cellular network connection. We cannot reproduce the issue in a 32-bit ARM Docker container running on an ARM virtual machine hosted within AWS.

To Reproduce

Steps to reproduce the behavior:

  1. Compile and run localproxy on an ARM7(hf) host with a less-than-ideal network connection (like cellular).
  2. Start sending data over the tunnel.
  3. Periodically check the condition of the tunnel.
  4. Observe the assertion error.

Expected behavior

I expect localproxy to be stable on a 32-bit cellular-connected ARM host.

Actual behavior

localproxy crashes with a Boost assertion error, usually within 0 to 15 minutes of use.

Logs

data/boost/1.69.0/_/_/package/3f0386992eb6e9227fb274d10d2020b80d7e5571/include/boost/beast/websocket/detail/stream_base.hpp:105: bool boost::beast::websocket::detail::soft_mutex::try_lock(const T*) [with T = boost::beast::websocket::stream<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp> > >::write_some_op<boost::asio::const_buffer, aws::iot::securedtunneling::tcp_adapter_proxy::async_send_message_to_web_socket(aws::iot::securedtunneling::{anonymous}::tcp_adapter_context&, const std::shared_ptr<boost::beast::basic_flat_buffer<std::allocator<char> > >&, const string&)::<lambda(const boost::system::error_code&, std::size_t)> >]: Assertion `id_ != T::id' failed.

Environment (please complete the following information):

Additional context

Here is a bit of detail from boost related to this error:

https://stackoverflow.com/a/55800147

From the Stack Overflow answer, it appears that some kind of concurrency issue is being encountered. The issue may occur frequently on the ARM host but not on the AWS virtual machine because differences in computing speed or network connectivity make it less likely to be triggered on the virtual machine.
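To illustrate the kind of problem suspected here: the assertion in the log fires inside `soft_mutex::try_lock` for a `write_some_op`, which is consistent with a second write being started on the same WebSocket stream before the previous one has completed. Beast allows only one outstanding write per stream. The following is a minimal sketch of the usual remedy, a queue that keeps at most one write in flight. It is not the localproxy code and not the fix that was applied upstream; `FakeStream` and `SerializedWriter` are hypothetical names, and `FakeStream` is a standard-library stand-in for the real stream so the sketch compiles without Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a WebSocket stream: like Beast, it permits only
// one outstanding write and asserts if a second one is started too early.
class FakeStream {
public:
    using Handler = std::function<void()>;

    void async_write(const std::string& data, Handler on_done) {
        assert(!pending_ && "overlapping write on the same stream");
        sent_.push_back(data);
        pending_ = std::move(on_done);
    }

    // Simulates the I/O loop completing the outstanding write.
    // Returns false if no write was in flight.
    bool complete_one() {
        if (!pending_) return false;
        Handler handler = std::move(pending_);
        pending_ = nullptr;  // the write is finished before the handler runs
        handler();
        return true;
    }

    const std::vector<std::string>& sent() const { return sent_; }

private:
    Handler pending_;
    std::vector<std::string> sent_;
};

// Queues outgoing messages so that at most one write is ever in flight.
class SerializedWriter {
public:
    explicit SerializedWriter(FakeStream& stream) : stream_(stream) {}

    void send(std::string message) {
        queue_.push_back(std::move(message));
        // Start a write only if none is running; otherwise the completion
        // handler of the running write picks up the next message.
        if (queue_.size() == 1) start_next();
    }

    std::size_t queued() const { return queue_.size(); }

private:
    void start_next() {
        // The front message stays queued until its write has completed.
        stream_.async_write(queue_.front(), [this] {
            queue_.pop_front();
            if (!queue_.empty()) start_next();
        });
    }

    FakeStream& stream_;
    std::deque<std::string> queue_;
};
```

Calling `send` three times in a row starts only the first write; each later message goes out when `complete_one` runs the previous completion handler. The sketch is single-threaded. With a real Beast stream driven by several threads, the queue and the stream would also need to be accessed from a single strand.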

HarshGandhi-AWS commented 2 years ago

Hello @mwaltec, thank you for reaching out to us about this issue. Since it is reproducible only on the device and only under certain network conditions, please give me some time to recreate it locally.

Regards, Harsh Gandhi

mwaltec commented 2 years ago

Hello Harsh:

Have you had any luck duplicating the issue?

HarshGandhi-AWS commented 2 years ago

Hey @mwaltec, I was not able to. I will keep working on it and will try a few different things this week to reproduce the issue.

HarshGandhi-AWS commented 2 years ago

Hello @mwaltec, I had no luck reproducing the issue. Are you still seeing it? Is it possible that it was an intermittent issue with some other cause, such as a network problem or a system crash?

mwaltec commented 2 years ago

It wasn't intermittent. We were able to get consistent results over several days. Other applications continued to function normally. Could you help me understand your setup? What are your test conditions? Were you able to make some attempts to reproduce on ARM 32-bit hardware? Were you able to connect that hardware to a mobile network?

HarshGandhi-AWS commented 2 years ago

I am running it on a 32-bit Raspberry Pi device connected to my Wi-Fi. I tried manually switching my internet connection off and on. If possible, can you share the binary file you are using?

HarshGandhi-AWS commented 2 years ago

Hey @mwaltec, I noticed that you are using a localproxy commit that was released in November 2020. Can you please use the latest localproxy commit? I think the issue you were facing is already resolved there.

mwaltec commented 2 years ago

Harsh:

I apologize for the delay. We're working on getting the issue reproduced on a Raspberry Pi. We've run into a bit of trouble reproducing the issue outside of using the localproxy for our application. We're trying to spin up a better way for you to reproduce the issue.

In the meantime, it would be great to know which Raspberry Pi model you're using. Are you using a Raspberry Pi 4?

HarshGandhi-AWS commented 2 years ago

I think you are using an older version of the localproxy binary. Can you try upgrading localproxy? Latest version: https://github.com/aws-samples/aws-iot-securetunneling-localproxy/releases/tag/v2.3.1

HarshGandhi-AWS commented 1 year ago

Closing this issue now. Please feel free to reopen it if you still see the same problem after updating the localproxy binary.