chriskohlhoff / asio

Asio C++ Library
http://think-async.com/Asio
4.91k stars 1.22k forks source link

Boost.Asio.SSL write_some cause 'decryption failed or bad record mac' on large (~1MB) transmissions #271

Open borisovandrey opened 6 years ago

borisovandrey commented 6 years ago

When we send large amount of data (1MB) using boost::asio::ssl::stream the boost closes the connection after some transmission and reports encryption error : 'decryption failed or bad record mac'.

The reason of it is a write_some function.

Shortly it uses boost::asio::write to send encrypted data to TCP socket (boost_1_57_0\boost\asio\ssl\detail\io.hpp:60, see details below), but the amount of really sent data is ignored. Instead of it, the amount of handled by OpenSSL user application buffer is returned to the user application. In case of 'would block' operation the amount of really sent data is less than reported to user application.

Details: If the socket is unblocking, the write_some has to try to send something over the socket and has to return the amount of the data it is able to send. The application has to retry next time for the rest of the output data, taking in account the amount of just sent data.

The SSL implementation uses paired BIO of OpenSSL to allow sending encrypted data not directly to a TCP socket but in a memory buffers. The Boost uses this buffers to integrate SSL in its engine implementing asynchronous socket operation.

In other words, boost internally works with TCP sockets using its mechanisms of direct or asynchronous data operation getting data from OpenSSL buffer.

For the write operation the sequence is the following:

User application calls boost::asio::ssl::stream::write_some operation and gives it the buffer with initial data. This operation calls the engine detail::io function. This function has to encrypt data using OpenSSL, read the encrypted data from the coupled OpenSSL BIO buffer and then send it out using TCP socket. First the engine io function calls the detail::write_op. This operation (in subroutines) calls the ::SSL_write. This function uses a BIO buffer to store encrypted data and returns the amount of handled initial data. The current buffer size is 17KB. In case of large outgoing message, the size will be about 16+KB. The size of the handled initial data is stored to bytes_transfered and then is used as a return value of sent amount to the user application. After this operation the SSL_get_error returns what OpenSSL wants – send the encrypted data through real socket or read raw data from the socket. In our case normally the Open SSL just want to send the portion of encrypted data. The detail::io execute the case engine::want_output:. It calls boost::asio::write(nextlayer, core.engine.get_output(core.outputbuffer), ec); This function read the encrypted data from OpenSSL buffer and transmit it using usual boost write operation. The problem is that the return of this function is not checked. The amount of the send data by this function can be less than required (16KB). But the user application is notified that that write_some function have sent 16+K. This leads to the unsynchronized state of SSL on both sides. The sender consider it sends +16KB, but receiver has gotten say only 4KB. Other data are lost. The user application makes next send skipping the 16KB. And after some send operation OpenSSL recognize the error state. It reports the error and boost closes the socket. The reason of why write operation can send not the entire encrypted portion (16KB) is the following: It tries sending the full buffer, but in case of large data stream the corresponding socket can get stacked, and the output buffer can overflow. In this case the socket operation has the following options Block in send operation in case of blocking socked Return “would block” error in case of nonblocking socket. In our case we have nonblocking socket and write operation stops execution in this case and returns the amount of really sent data. But this value is just ignored. Workaround: don't use write_some with ssl.

vinniefalco commented 6 years ago

You expect ssl::stream::write_some to work with a socket that is set to non-blocking mode? I don't think that can ever work. You need to use async_write_some, or if you want to use synchronous I/O do not set the non-blocking mode on the socket.

Workaround: don't use write_some with ssl.

write_some works perfectly fine if you don't set non-blocking mode.

pceccato commented 5 years ago

I see this error too with asio standalone 1.12.2 and openssl 1.1.1f on windows 7.

The test case is I send a 200000 byte buffer to a remote endpoint which echoes it back. I get intermittent failures with the error "decryption failed or bad record mac" in my async_read_some read handler.

The exact same test case works fine with this windows async openssl code: https://famellee.wordpress.com/category/openssl/

I was hoping to replace it with asio, but random read errors are a dealbreaker. Of course, I could be doing it wrong...

illera88 commented 5 years ago

Hi,

I'm running into a similar issue. Was any solution for this?

vinniefalco commented 5 years ago

Are you using non-blocking I/O?

rogueresistor commented 5 years ago

We are using a third-party library, Websocket++. We aren't touching the underlying socket it creates, but are seeing the exact issue whereby suddenly all communication stops with an error from OpenSSL that says decryption failed or bad record mac.

I don't believe that websocket++ is doing non-blocking I/O, other than the standard Asio async callback functionality.

So far we (@illera88 and I) have been tearing our hair out trying to figure out what is going on.

vinniefalco commented 5 years ago

Does the same problem happen when you use a beast websocket?

rogueresistor commented 5 years ago

Beast web sockets can't be used with the standalone Asio and requires pulling in more Boost dependencies.

pceccato commented 5 years ago

I manged to resolve my issue by wrapping the completion functions in a strand.

illera88 commented 5 years ago

We tried using strand but it still fails with decryption failed or bad record mac :(

vinniefalco commented 5 years ago

Are you trying to perform two simultaneous reads or two simultaneous writes?

illera88 commented 5 years ago

We are reading from a regular TCP/IP socket and writing that result from a read to the websocketpp.

Here is the entire .h that handles such reading and writing from both sockets.

Thanks for taking the time to help us!! 🥇

vinniefalco commented 5 years ago

I still think you'd be better off with Beast WebSocket, I don't understand what the big deal about using Boost is. The Boost library collection is very popular, well maintained, and has high quality code. You're handicapping yourself needlessly.

illera88 commented 5 years ago

We don't want to rewrite our code to use Beast at the moment. We would like to know if there is something wrong with our code or is asio's fault.

Can you find anything bad with our code?

vinniefalco commented 5 years ago

I would be extremely surprised to discover a bug in asio. A lot of people use it, and it is very well tested. It is 99% likely that the bug exists in your code.

vinniefalco commented 5 years ago

Where is your code?

illera88 commented 5 years ago

https://gist.github.com/illera88/e78407c4b36705f67522756b8a63f648

vinniefalco commented 5 years ago

I looked at the code, nothing jumps out at me as obviously incorrect.

illera88 commented 5 years ago

The problem arises when there is a lot of read/write traffic through the bridge. If the traffic is not high it does not fail...

At this point we will rewrite the code to use Beast. We will let you know if there is any problem with it.

Cheers

pceccato commented 4 years ago

Just revisiting this. I thought I had solved this issue by wrapping all my completion function for reading and writing in a strand. And this worked ok from all versions of asio from 1.12.2 - 1.16.1.

But I tried my test with 1.18.0 and got the dreaded:

error: (20,143,281) error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac

mabrarov commented 4 years ago

Have you tried Handler Tracking to analyze what's going on when issue happens?

pceccato commented 4 years ago

I finally managed to resolve my issue with 1.18.0. I discovered issue did not occur when I used an implicit strand (io_context running on a single thread), which made me think perhaps my completion functions were not getting executed on the strand properly.

On a hunch I tried using asio:bind_executor to bind the completion functions for async_write() and async_read_some() to the strand, instead of calling strand.wrap(), and this resolved the issue.

But my code was working fine in 1.16.1 using strand.wrap(). So I am completely at a loss to explain how this fixed things?

This worked:

asio::async_write(socket_, buf, asio::bind_executor(mStrand, 
            [this, packet](const asio::error_code& ec, size_t bytes_transferred) { handleWrite(ec, bytes_transferred); }
            ));

This didn't:

asio::async_write(socket_, buf, mStrand.wrap( 
            [this, packet](const asio::error_code& ec, size_t bytes_transferred) { handleWrite(ec, bytes_transferred); }
            ));
igsol commented 4 years ago

... my code was working fine in 1.16.1 using strand.wrap(). ... loss to explain how this fixed things?

Welcome to the common confusion described in #278 :)

mabrarov commented 4 years ago

@igsol, I may be wrong but it doesn't look like the issue described by @pceccato is related to #278, because there is no additional wrapper (like std::function) used around handler created using asio::strand::wrap or asio::bind_executor.

@pceccato, both things you described in the last post should work if I understand source code of asio::strand correctly. I checked Boost.Asio from Boost 1.74.0 which comes with Asio 1.18.0. Is it possible to provide working code reproducing the issue (including preprocessor definitions and compiler options you use to build this code)? I'm afraid that code you provided is simplified version of your production code and this simplification may miss important details.

pceccato commented 4 years ago

Yeah, I don't understand. All I know is the same code compiled against asio standalone 1.16.1 broke with asio 1.18.0. Took me a day to figure it out. The examples in the documentation are lagging behind the current way asio does things, which doesn't help.

If I get a chance I will knock up a simple standalone example. In this case it was just a test case with a server that echoed packets back to the client. And I was able to 100% reproduce the issue if the client set 2000000 bytes to the server.

mabrarov commented 4 years ago

@pceccato, FYI I created test for asio::strand::wrap method here (mabrarov/asio_test). It just posts handlers directly, i.e. doesn't use any socket or SSL stream (yet). Test works well on my environment (Windows 10 x64).

mabrarov commented 4 years ago

I may be wrong but there is a chance that issue is located somewhere in intermediate handlers which should use the same executor (strand) as user defined executor, but they don't do that in cases of usage of asio::strand::wrap. There is a chance that #542 is about the same as this issue is. Maybe asio::detail::wrapped_handler class lacks get_executor method and Asio sockets / streams rely on it instead of deprecated asio_handler_invoke function?