chriskohlhoff / asio

Asio C++ Library
http://think-async.com/Asio

async_read_until seems to read at least buffer size if delimiter occurs before #972

Open Thomas1664 opened 2 years ago

Thomas1664 commented 2 years ago

I use async_read_until with asio::streambuf as the buffer. According to the documentation, this function reads everything up to the delimiter. However, if the delimiter is the 3rd byte, the function reads 4 bytes, i.e. 1 byte past the delimiter. Interestingly, the handler never executes if I use std::vector<char> as the buffer. This suggests that async_read_until first tries to fill the buffer and only then searches for the delimiter. But this behaviour is not documented and is not how the function is supposed to work.

Furthermore, I use async_read_until in combination with an SSL socket to read a header that contains the length of the body, because the size of the messages is variable. How do I read the rest of the contents? Haven't they already been "read" by async_read_until?

Maximsiv1410 commented 2 years ago

Hi. async_read_until is a composed asynchronous operation that makes multiple calls to async_read_some. I don't know for sure, but I assume that it first reads 'some' available bytes from the socket (e.g. 4 bytes) and then scans the bytes it just read for your delimiter. There is no sense in reading the input stream byte by byte and checking for the delimiter each time, since that would drastically decrease performance. I'll try to check my assumption in the source code later.
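To make the assumption above concrete, here is a minimal standard-library model of a chunk-wise "read until" loop. It is not Asio's implementation: ChunkedSource, its read_some member and this read_until function are hypothetical stand-ins invented for illustration, and the chunk contents are made up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a socket: each element is what one
// read_some call would deliver.
struct ChunkedSource {
    std::vector<std::string> chunks;
    std::size_t next = 0;

    // Appends the next chunk to `out` and returns the number of bytes
    // appended (0 once the input is exhausted).
    std::size_t read_some(std::string& out) {
        if (next == chunks.size()) return 0;
        out += chunks[next];
        return chunks[next++].size();
    }
};

// Reads whole chunks into `buf` until `delim` shows up.
// Returns the number of bytes up to and including the delimiter,
// or 0 if the source ran dry before a delimiter was seen.
std::size_t read_until(ChunkedSource& src, std::string& buf, char delim) {
    std::size_t searched = 0;  // bytes already scanned for the delimiter
    for (;;) {
        std::size_t pos = buf.find(delim, searched);
        if (pos != std::string::npos) return pos + 1;
        searched = buf.size();
        if (src.read_some(buf) == 0) return 0;
    }
}
```

If the first chunk is "ab\nX", the function returns 3 while buf holds 4 bytes: the byte after the delimiter arrived in the same chunk and stays in the buffer, which matches the "reads 4 bytes" observation in the original report.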

mabrarov commented 2 years ago

Hi @Thomas1664,

According to documentation, this function reads everything until the delimiter occurs. However, if the delimiter is the 3rd byte, this function reads 4 bytes and until 1 byte after the delimiter. ... This leads to the conclusion that async_read_until tries to fill the buffer in the first place and after that it tries to find the delimiter. But this behaviour is not documented and not how this function is supposed to work.

Isn't the behavior you reported documented?

https://www.boost.org/doc/libs/1_78_0/doc/html/boost_asio/reference/async_read_until/overload5.html#boost_asio.reference.async_read_until.overload5.remarks

Remarks

After a successful async_read_until operation, the streambuf may contain additional data beyond the delimiter. An application will typically leave that data in the streambuf for a subsequent async_read_until operation to examine.
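For the header-plus-body case raised in the question, the remark implies some simple bookkeeping: whatever sits in the buffer after the header already belongs to the body, so only the remainder still has to be read. The sketch below models this with a std::string in place of the streambuf. BodyPlan and plan_body_read are hypothetical names, and the header format (a decimal length followed by a single delimiter byte) is an assumption, not something stated in the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

struct BodyPlan {
    std::size_t body_length;       // length declared in the header
    std::size_t already_buffered;  // body bytes that arrived with the header
    std::size_t still_to_read;     // bytes a follow-up read must fetch
};

// `buf` holds everything read so far; `header_size` is the byte count
// reported by the read-until step (header including its delimiter).
// Assumes the header is a decimal number followed by one delimiter byte.
BodyPlan plan_body_read(const std::string& buf, std::size_t header_size) {
    std::size_t body_length = std::stoul(buf.substr(0, header_size - 1));
    std::size_t surplus = buf.size() - header_size;
    std::size_t buffered = std::min(surplus, body_length);
    return {body_length, buffered, body_length - buffered};
}
```

With Asio itself, the equivalent would presumably be to consume the header bytes from the streambuf and then issue a read for the remaining count into the same streambuf (for example with a transfer_exactly completion condition), so that the surplus and the newly read bytes end up together.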

Regarding:

Interestingly, the handler never executes if I use std::vector<char> as buffer.

Could you please provide a (non-)working example?

Thank you.

Thomas1664 commented 2 years ago

Regarding the first part: This piece of documentation is hard to find in the Boost-documentation and completely missing in the non-Boost documentation.

Could you please provide a (non-)working example?

Using your SSL example: Change buffer to dynamic_buffer and remove the length parameter https://github.com/chriskohlhoff/asio/blob/f70f65ae54351c209c3a24704624144bfe8e70a3/asio/src/examples/cpp11/ssl/client.cpp#L116

Change the type of reply_ to std::vector<char> or std::string https://github.com/chriskohlhoff/asio/blob/f70f65ae54351c209c3a24704624144bfe8e70a3/asio/src/examples/cpp11/ssl/client.cpp#L134

Place a breakpoint inside the handler, e.g. at this line: https://github.com/chriskohlhoff/asio/blob/f70f65ae54351c209c3a24704624144bfe8e70a3/asio/src/examples/cpp11/ssl/client.cpp#L119

Notice that the breakpoint is never hit after typing in a message. This seems to be independent of the length of the message. I think the bug occurs because both types are optimised to always allocate a bit more space than actually needed when they resize. If I remember correctly, async_read fills the buffer until it is full if no maximum size is given, but this never happens because the buffer keeps being resized.

mabrarov commented 2 years ago

Hi @Thomas1664,

Regarding:

This piece of documentation is hard to find in the Boost-documentation

What about Boost.Asio Overview: Line-Based Operations?

The streambuf data member serves as a place to store the data that has been read from the socket before it is searched for the delimiter. It is important to remember that there may be additional data after the delimiter. This surplus data should be left in the streambuf so that it may be inspected by a subsequent call to read_until() or async_read_until().

Regarding:

This piece of documentation is ... completely missing in the non-Boost documentation.

I was assured that the Boost.Asio documentation and the non-Boost Asio documentation are built from the same sources:

https://think-async.com/Asio/asio-1.20.0/doc/asio/reference/async_read_until/overload6.html#asio.reference.async_read_until.overload6.remarks:

Remarks

After a successful async_read_until operation, the streambuf may contain additional data beyond the delimiter. An application will typically leave that data in the streambuf for a subsequent async_read_until operation to examine.

I just wanted to understand what's wrong with the documentation and whether there is a need for additional clarification (highlighting).

Regarding your example with dynamic_buffer: it looks interesting. At the very least, this case could be documented (and it really does seem to be a bug, albeit one with workarounds). What if the overload template<typename Elem, typename Allocator> dynamic_vector_buffer<Elem, Allocator> dynamic_buffer(std::vector<Elem, Allocator> & data, std::size_t max_size) is used?

Thomas1664 commented 2 years ago

Adding the information about the behaviour of async_read_until to the function's main page, and not just to a single overload, would make it easier to find.

Thomas1664 commented 2 years ago

Regarding your example with dynamic_buffer - it seems to be interesting. At least, this case could be documented (and this case really seems to be a bug having some workarounds). What if template<typename Elem, typename Allocator> dynamic_vector_buffer<Elem, Allocator> dynamic_buffer(std::vector<Elem, Allocator> & data, std::size_t max_size) is used?

If max_size is no bigger than the size of the actual message (in other words, if the message is at least max_size bytes long), this solves the issue, but you lose the dynamic resizing as well, which defeats the purpose of using dynamic_buffer. You could simply have used asio::buffer.
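The trade-off can be shown with a small standard-library model of a "read until the buffer is full" loop. This is only a sketch of the behaviour as recalled in this thread, not Asio code: CappedBuffer, ReadResult and read_all are hypothetical names, and the cap plays the role of max_size.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical growable buffer with a size cap (the role of max_size).
struct CappedBuffer {
    std::vector<char> data;
    std::size_t max_size;
};

enum class ReadResult { buffer_full, source_exhausted };

// "Read until full": keeps appending chunks until the cap is reached
// or the source has nothing more to give.
ReadResult read_all(const std::vector<std::string>& chunks, CappedBuffer& buf) {
    for (const std::string& chunk : chunks) {
        std::size_t room = buf.max_size - buf.data.size();
        std::size_t n = std::min(room, chunk.size());
        buf.data.insert(buf.data.end(), chunk.begin(),
                        chunk.begin() + static_cast<std::ptrdiff_t>(n));
        if (buf.data.size() == buf.max_size) return ReadResult::buffer_full;
    }
    return ReadResult::source_exhausted;
}
```

With a cap of 4 and a 5-byte message the loop ends with buffer_full, which corresponds to the handler being called. With a cap far larger than the message it only ends with source_exhausted, and on a live connection that corresponds to waiting for more data or for the peer to close, which would be consistent with the breakpoint never being hit.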