MortezaBashsiz / nipovpn

Powerful HTTP proxy
GNU General Public License v3.0

Find a reliable way to read from socket till end of message #62

Open MortezaBashsiz opened 2 months ago

MortezaBashsiz commented 2 months ago

Subject

In an HTTPS call, the client detects the end of a message using the Content-Length header. The Content-Length header specifies the exact byte length of the HTTP message body, and the client reads the response until it has received that number of bytes.

As a proxy server, we cannot decrypt the traffic and therefore cannot see the Content-Length header, so we need to find another way to detect when the message is over.
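As a rough illustration of the client-side rule described above, here is a minimal sketch in standard C++. It assumes the header block is available as plaintext, which is exactly what the proxy does not have for encrypted traffic. The names `contentLength` and `bodyBytesRemaining` are hypothetical and are not part of nipovpn.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of nipovpn): returns the Content-Length value
// from a plaintext HTTP header block, or -1 if it is absent or malformed.
long long contentLength(const std::string& headers) {
  const std::string name = "content-length:";
  std::size_t lineStart = 0;
  while (lineStart < headers.size()) {
    std::size_t lineEnd = headers.find("\r\n", lineStart);
    if (lineEnd == std::string::npos) lineEnd = headers.size();
    const std::string line = headers.substr(lineStart, lineEnd - lineStart);
    lineStart = lineEnd + 2;

    // Header names are case-insensitive, so compare in lower case.
    if (line.size() < name.size()) continue;
    bool match = true;
    for (std::size_t i = 0; i < name.size() && match; ++i) {
      match = std::tolower(static_cast<unsigned char>(line[i])) == name[i];
    }
    if (!match) continue;

    // Trim optional whitespace around the value, then parse decimal digits.
    const std::size_t begin = line.find_first_not_of(" \t", name.size());
    if (begin == std::string::npos) return -1;
    const std::size_t end = line.find_last_not_of(" \t") + 1;
    long long value = 0;
    for (std::size_t i = begin; i < end; ++i) {
      if (!std::isdigit(static_cast<unsigned char>(line[i]))) return -1;
      value = value * 10 + (line[i] - '0');
    }
    return value;
  }
  return -1;
}

// Number of body bytes the reader still has to wait for.
long long bodyBytesRemaining(long long contentLen, long long received) {
  assert(contentLen >= 0 && received >= 0);
  return received >= contentLen ? 0 : contentLen - received;
}
```

A reader that knows the header can keep reading until `bodyBytesRemaining` reaches zero; a proxy that only sees ciphertext has no such counter.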

Current solution

I implemented a way to read until the end of the message, shown in the following code, but it is not stable. You can check the function TCPClient::doRead, which reads the socket in a loop and checks the bytes available on the socket. At the end of each round of the loop I also wait, to allow for data that may still arrive on the socket. Both repeatWait and timeWait used in this function are configurable from the config file.

    // Poll the socket for repeatWait + 1 rounds, pausing timeWait ms after
    // each round in case more data arrives.
    for (auto i = 0; i <= config_->general().repeatWait; i++) {
      while (true) {
        if (!socket_.is_open()) {
          log_->write("[TCPClient doRead] Socket is not OPEN",
                      Log::Level::DEBUG);
          socket_.close();
          return;
        }
        // Nothing buffered right now: end this round and wait below.
        if (socket_.available() == 0) break;
        // Read one byte at a time into readBuffer_.
        boost::asio::read(socket_, readBuffer_,
                          boost::asio::transfer_exactly(1), error);
        if (error == boost::asio::error::eof) {
          log_->write("[TCPClient doRead] [EOF] Connection closed by peer.",
                      Log::Level::TRACE);
          socket_.close();
          return;  // Exit after closing the socket
        } else if (error) {
          log_->write(
              std::string("[TCPClient doRead] [error] ") + error.message(),
              Log::Level::ERROR);
          socket_.close();
          return;  // Exit after closing the socket
        }
      }
      // Blocking pause before checking the socket again.
      timer.expires_from_now(
          boost::posix_time::milliseconds(config_->general().timeWait));
      timer.wait();
    }

Current Issues

We have two basic issues with the solution above:

  1. This method is not reliable, since data may arrive on the socket with a delay and our loop will then end before all of the data has been read. So socket_.available() is not a reliable signal in this situation.

  2. This method reads from the socket for as long as there is something to read, then packs all of the read data and sends it to the nipoAgent. So if I want to download a 100 MB file, the whole file is read and packed before anything is sent on.

This is not good: the client needs to get the data chunk by chunk, not all at once.

SirzechsLucifer666 commented 2 months ago

Just for clarification: is there any reason we are not using Beast for networking?

SirzechsLucifer666 commented 2 months ago

Okay, I watched your YouTube video, and I understand now why you need a custom implementation. I had a slightly different understanding of the design in mind.

SirzechsLucifer666 commented 2 months ago

To fix all the issues, we need to make some minor design changes.

We should use WebSocket for agent-server communication instead of HTTP. Streaming through multiple HTTP requests has significant overhead, especially with large files, unless we buffer the data to keep the request count low, which is inefficient. If you prefer to use HTTP, we have a couple of options: either choose dynamically between WebSocket and HTTP based on request size, or make it optional through configuration and let the user decide.

For each request in the agent, we must read the header first. A function like async_read_until(s, buffer, "\r\n\r\n", callback) can help with this. We then populate a Metadata object for the request, including the information that the server needs to forward the data, such as the destination IP and request size. Then we establish a WebSocket connection to the server and send the header and metadata to initiate the connection. From that point on, we continuously read data from the socket and send it to the server immediately, until the transfer is done, to avoid buffering and reduce latency.

I haven't fully considered this yet, but I'd like to hear your thoughts.

MortezaBashsiz commented 2 months ago

The whole idea behind this design is to hide the original request as much as possible and make the traffic look the same as an ordinary HTTP request and response.

WebSocket is easily detectable, so it is not a good idea to use it. Making it optional for the user is a good idea, but let's focus on getting the current method working first; we can implement that as a new feature later.

Using async_read_until will not work, since HTTPS traffic, unlike HTTP and FTP, does not end with "\r\n\r\n". I used this method at the beginning between nipoServer and nipoAgent, and we have no problem between them since we control everything in the middle. Our problem with reading data is between nipoServer and the origin, which is a mystery to us since it is encrypted. For more information, you can check this question that I asked months ago.

TBH, I am not sure whether we will succeed in the end, but I would always like to try all I can.

SirzechsLucifer666 commented 2 months ago

I ran the example you gave in your SO question as a reference. As expected, it got stuck, but I checked the buffer.

CONNECT www.google.com:443 HTTP/1.1\r\nHost: www.google.com:443\r\nUser-Agent: curl/8.5.0\r\nProxy-Connection: Keep-Alive\r\n\r\n

The \r\n\r\n terminator is there. Am I missing something?
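For reference, the plaintext CONNECT request above can be split at its "\r\n\r\n" terminator and parsed for the destination. The following is only a sketch in standard C++; `ConnectTarget` and `parseConnect` are hypothetical names that do not come from nipovpn, and the parser handles only the simple `host:port` form shown in the buffer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct ConnectTarget {
  std::string host;
  unsigned port = 0;
};

// Returns true and fills `out` if `buffer` holds a complete CONNECT request
// (terminated by "\r\n\r\n") whose target has the form host:port.
bool parseConnect(const std::string& buffer, ConnectTarget& out) {
  // The header block is only complete once the blank line has arrived.
  if (buffer.find("\r\n\r\n") == std::string::npos) return false;

  const std::string method = "CONNECT ";
  if (buffer.compare(0, method.size(), method) != 0) return false;

  // The authority sits between the method and the next space on line one.
  const std::size_t lineEnd = buffer.find("\r\n");
  const std::size_t authEnd = buffer.find(' ', method.size());
  if (authEnd == std::string::npos || authEnd > lineEnd) return false;
  const std::string authority =
      buffer.substr(method.size(), authEnd - method.size());

  const std::size_t colon = authority.rfind(':');
  if (colon == std::string::npos || colon == 0 ||
      colon + 1 == authority.size()) {
    return false;
  }

  unsigned port = 0;
  for (std::size_t i = colon + 1; i < authority.size(); ++i) {
    if (!std::isdigit(static_cast<unsigned char>(authority[i]))) return false;
    port = port * 10 + static_cast<unsigned>(authority[i] - '0');
    if (port > 65535) return false;
  }

  out.host = authority.substr(0, colon);
  out.port = port;
  assert(!out.host.empty());
  return true;
}
```

For the buffer shown above, this yields host `www.google.com` and port 443. Everything the client sends after the proxy's reply is TLS, so the same terminator search no longer applies from that point on.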

SirzechsLucifer666 commented 2 months ago

Man, I hate networking. I was avoiding this topic for a long time.

I managed to get past the connection phase, and at last I ran into your issue at the ClientHello. :)

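    boost::system::error_code error;
    // Read the CONNECT request up to the blank line that ends the headers.
    auto size = boost::asio::read_until(socket_, readBuffer_, "\r\n\r\n", error);
    // Tell the client that the tunnel is established.
    std::ostream o(&writeBuffer_);
    o << "HTTP/1.1 200 Connection established \r\n\r\n";
    boost::asio::write(socket_, writeBuffer_);
    // The next bytes from the client are the TLS ClientHello.
    doRead_client_hello();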

Well, to address your problem: bytes 3 and 4 of the handshake (counting from zero) hold the length of the handshake message.

These are the first few bytes of the handshake:

[Screenshot: first bytes of the handshake]

If you look at bytes 3 and 4, they are 0x0200, which is 512 in decimal, though I don't know why the length I get is 518. I think it is supposed to be 517, or I'm tripping.

MortezaBashsiz commented 2 months ago

If you run the program with loglevel DEBUG, it will show you all the bytes as a hexadecimal string, along with some useful information:

2024-08-30_08:45:46 [AGENT] [TRACE] [Read from] [SRC 127.0.0.1:41078] [Bytes 517] 
2024-08-30_08:45:46 [AGENT] [DEBUG] [AgentHandler handle] [Token Valid]
2024-08-30_08:45:46 [AGENT] [DEBUG] [AgentHandler handle] [Request] : 
TLS Type : TLSHandshake
SNI : speed.cloudflare.com
Body Size : 1034
Body : 1603010200010001fc03035ee3a44685cd20b00d1a200f8330979ca47bcb5f92ee67bde284c1e2fe85a057206a8a8db412f99673811ba3c5542fe9c682cd420a3424a3523fd9ab737cbd1675003e130213031301c02cc030009fcca9cca8ccaac02bc02f009ec024c028006bc023c0270067c00ac0140039c009c0130033009d009c003d003c0035002f00ff0100017500000019001700001473706565642e636c6f7564666c6172652e636f6d000b000403000102000a00160014001d0017001e00190018010001010102010301040010000e000c02683208687474702f312e31001600000017000000310000000d0030002e04030503060308070808081a081b081c0809080a080b080408050806040105010601030303010302040205020602002b00050403040303002d00020101003300260024001d002029cec1605db1d492f60a78d9dce170db73dfc2f38b6aee1d836f319e7d80df69001500a70000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

About the size difference that you see: there are 5 bytes in the header, which are 16 03 01 02 00. The rest belongs to the ClientHello, which is 512 bytes. So in total you read 517 bytes (5 + 512). To be honest, I don't know why it is 518 in your case.
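The 5 + 512 arithmetic can be checked with a small sketch that decodes the 5-byte TLS record header (content type, two version bytes, and a big-endian 16-bit length). `TlsRecordHeader` and `parseTlsRecordHeader` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of the 5-byte TLS record header.
struct TlsRecordHeader {
  std::uint8_t contentType;  // 0x16 means handshake
  std::uint16_t version;     // e.g. 0x0301
  std::uint16_t length;      // payload bytes that follow the header
};

// Decodes the header from the first 5 bytes of `data`.
// Returns false if fewer than 5 bytes are available.
bool parseTlsRecordHeader(const std::uint8_t* data, std::size_t size,
                          TlsRecordHeader& out) {
  assert(data != nullptr);
  if (size < 5) return false;
  out.contentType = data[0];
  out.version = static_cast<std::uint16_t>((data[1] << 8) | data[2]);
  // Bytes 3 and 4 carry the payload length in network (big-endian) order.
  out.length = static_cast<std::uint16_t>((data[3] << 8) | data[4]);
  return true;
}
```

For the bytes 16 03 01 02 00, the decoded length is 0x0200 = 512, so the whole record occupies 5 + 512 = 517 bytes on the wire, which matches the `Bytes 517` line in the log.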

SirzechsLucifer666 commented 2 months ago

Doesn't knowing the size of the incoming packet solve the problem?

MortezaBashsiz commented 2 months ago

You can parse it and know the size only for that one message; it does not tell us how many bytes are coming in total. The problem is that we don't know how many messages will come later. Imagine that you want to download a picture. In this case:

  1. You send the ClientHello.
  2. You receive the ServerHello and the rest.
  3. You send the rest and the handshake is done.
  4. You send the request.
  5. You receive the response data.
  6. You continue to read until you reach the Content-Length.

And this is the point: we don't know how many bytes, in how many messages, we are supposed to receive.

On the other hand, when downloading big files we see packets that are not in SSL record format; they are continuation data.
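To make the limitation concrete, here is a sketch of framing a receive buffer by TLS record length fields. It can tell which records are complete and how many bytes they occupy, but nothing in it can say whether more records will follow. `countCompleteRecords` is a hypothetical helper and not part of nipovpn.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the complete TLS records at the front of
// `buffer` and reports the bytes they occupy through `consumed`.
// Any bytes after `consumed` belong to a record that has not fully arrived.
std::size_t countCompleteRecords(const std::vector<std::uint8_t>& buffer,
                                 std::size_t& consumed) {
  const std::size_t kHeaderSize = 5;
  std::size_t count = 0;
  consumed = 0;
  while (buffer.size() - consumed >= kHeaderSize) {
    // Big-endian payload length from bytes 3 and 4 of the record header.
    const std::size_t length =
        (static_cast<std::size_t>(buffer[consumed + 3]) << 8) |
        buffer[consumed + 4];
    // Stop at the first record whose payload is still incomplete.
    if (buffer.size() - consumed - kHeaderSize < length) break;
    consumed += kHeaderSize + length;
    ++count;
  }
  assert(consumed <= buffer.size());
  return count;
}
```

The complete records can be forwarded as soon as they are framed, with the remainder kept until more bytes arrive. The end of the HTTP message itself stays invisible at this level.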

image

AkramiPro commented 1 month ago

Hello, good day. Unfortunately I don't have enough expertise or experience in networking, so my question may be a silly one. If your program is a layer 4 TCP proxy, is there any need at all to know the size of the packets? Looking at how other programs such as haproxy handle this might help. As far as I understand, you can use an event loop and poll to monitor the socket, so that you can tell whether new data has arrived on it and then forward that data.

MortezaBashsiz commented 1 month ago

@AkramiPro

If I got your idea correctly, you are talking about working at the TCP/IP level (a layer 4 application). Something like HAProxy works like this:

  1. For HTTP/HTTPS (Layer 7), HAProxy detects the end of the message using HTTP headers like Content-Length, Transfer-Encoding: chunked, or by detecting a connection close.
  2. For TCP (Layer 4), HAProxy detects the end of a message based on the TCP connection state (e.g., by receiving a FIN packet indicating that the connection is closed).
  3. If handling HTTPS directly (e.g., for SSL termination), HAProxy decrypts the traffic and uses standard HTTP message parsing to detect the end of the message.
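The layer 4 case in item 2 can be sketched as a relay that forwards each chunk as soon as it is read and stops only when the peer closes the connection. This is a blocking POSIX sketch that uses plain file descriptors instead of Boost.Asio, purely to stay self-contained; `relayUntilEof` is a hypothetical name, and a real proxy would run this asynchronously and in both directions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: forwards bytes from `fromFd` to `toFd` chunk by chunk
// until read() reports end of stream (the peer closed its side).
// Returns the number of bytes forwarded, or -1 on a read/write error.
long long relayUntilEof(int fromFd, int toFd) {
  assert(fromFd >= 0 && toFd >= 0);
  char chunk[4096];
  long long total = 0;
  for (;;) {
    const ssize_t n = ::read(fromFd, chunk, sizeof chunk);
    if (n == 0) return total;  // End of stream: nothing more will arrive.
    if (n < 0) {
      if (errno == EINTR) continue;  // Interrupted by a signal, retry.
      return -1;
    }
    // Forward this chunk immediately; write() may accept only part of it.
    ssize_t off = 0;
    while (off < n) {
      const ssize_t w =
          ::write(toFd, chunk + off, static_cast<std::size_t>(n - off));
      if (w < 0) {
        if (errno == EINTR) continue;
        return -1;
      }
      off += w;
    }
    total += n;
  }
}
```

No message length is needed here: the only end marker is the connection close. That also means that on a keep-alive connection, which stays open between messages, this approach alone cannot tell where one message ends.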

There is a good approach, which is to read by chunk size, and I am working on this method. In chunked transfer encoding, the message body is sent as a series of chunks, with each chunk preceded by its size. HAProxy reads the chunks until it encounters the terminating "0" chunk, which signals the end of the message.
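A minimal sketch of decoding a chunked body is shown here, assuming the body is available as plaintext and has been fully received. `decodeChunked` is a hypothetical name; the sketch skips chunk extensions and does not parse trailer fields.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes a chunked transfer-encoded body from `in`
// into `out`. Returns true once the terminating zero-size chunk is seen,
// false if the input is malformed or ends before that chunk.
bool decodeChunked(const std::string& in, std::string& out) {
  std::size_t pos = 0;
  out.clear();
  for (;;) {
    // Each chunk starts with a hexadecimal size line ending in CRLF.
    const std::size_t lineEnd = in.find("\r\n", pos);
    if (lineEnd == std::string::npos) return false;

    std::size_t size = 0;
    std::size_t i = pos;
    for (; i < lineEnd; ++i) {
      const unsigned char c = static_cast<unsigned char>(in[i]);
      if (!std::isxdigit(c)) break;
      const int digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
      size = size * 16 + static_cast<std::size_t>(digit);
    }
    if (i == pos) return false;                         // No size digits.
    if (i < lineEnd && in[i] != ';') return false;      // Only extensions may follow.

    pos = lineEnd + 2;
    assert(pos <= in.size());
    if (size == 0) return true;  // Terminating chunk: end of the message.

    // The chunk data must be followed by CRLF.
    const std::size_t available = in.size() - pos;
    if (size > available || available - size < 2) return false;
    out.append(in, pos, size);
    pos += size;
    if (in.compare(pos, 2, "\r\n") != 0) return false;
    pos += 2;
  }
}
```

A false return on truncated input is the useful signal for a reader: it means the terminating chunk has not arrived yet and more data should be read.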

AkramiPro commented 1 month ago

Everything you said is exactly right. My question is: is it feasible to rely on the TCP state as much as possible? Otherwise you would have to implement each protocol one by one and add separate support for it, right? For example ws, ftp, ssh and so on each have their own challenges. If I'm not mistaken, the main problem would be with UDP, which is stateless, and I think the solutions you are working on now would be more useful there, right?

Another point that came to mind, which I saw in this issue, is resistance to replay attacks. It may be a bit early, but I thought I would mention it now so that you can put it on the todo list and work on it later.

MortezaBashsiz commented 1 month ago

@AkramiPro

What do you mean by TCP state? How could it help us detect the end of a message on the socket?

We only want to work on the HTTP protocol and nothing else (ws, ftp, ssh).

MortezaBashsiz commented 3 weeks ago

The second part, which is about reading data chunk by chunk, has been split out into a separate issue, #114.