Open MortezaBashsiz opened 2 months ago
Just for clarification. Is there any reason we are not using Beast for network?
Okay, I watched your YouTube video, and I understand now why you need a custom implementation. I had a slightly different understanding of the design in mind.
To fix all the issues, we need to make some minor design changes.
We should use WebSocket for agent-server communication instead of HTTP.
Streaming through multiple HTTP requests has significant overhead, especially with large files, unless we buffer the data to keep the request count low, which is inefficient.
If you prefer to use HTTP, we have a couple of options: either make it a dynamic choice between ws and HTTP based on request size, or make it optional through configuration and let the user decide.
For each request in the agent, we must read the header first.
A function like `async_read_until(s, buffer, "\r\n\r\n", callback)` can help with this.
We then populate a Metadata object for the request, including information that the server needs to forward the data, such as the destination IP and request size.
Then we establish a ws connection to the server and send the header and metadata to initiate the connection.
From that point on, we continuously read data from the socket and forward it to the server immediately, avoiding buffering and reducing latency, until the transfer is done.
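As a rough sketch of the "read the header, then populate metadata" step above: the struct and field names below are hypothetical, not from the project, and the split at `"\r\n\r\n"` mirrors where `async_read_until` would stop.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical metadata the agent would send to the server alongside the raw
// header. Field names are illustrative only.
struct Metadata {
    std::string destHost;   // e.g. taken from the Host: header
    std::size_t headerSize; // bytes consumed by the header block
};

// Split a raw request at the first "\r\n\r\n" (the point async_read_until
// would stop at) and fill in the metadata. Returns nothing if the header
// terminator has not arrived yet.
std::optional<Metadata> parseHeader(const std::string& raw) {
    const std::size_t pos = raw.find("\r\n\r\n");
    if (pos == std::string::npos) return std::nullopt;

    Metadata meta;
    meta.headerSize = pos + 4; // include the terminator itself

    // Very small Host: extractor, just enough for the sketch.
    const std::size_t h = raw.find("Host: ");
    if (h != std::string::npos && h < pos) {
        const std::size_t end = raw.find("\r\n", h);
        meta.destHost = raw.substr(h + 6, end - (h + 6));
    }
    return meta;
}
```

The real agent would of course keep reading body bytes after `headerSize` and stream them onward.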
I haven't fully considered this yet, but I'd like to hear your thoughts.
The whole point of this idea is to hide the original request as much as possible and make the traffic look like an ordinary HTTP request and response.
WebSocket is easily detectable, so it is not a good idea to use it. Making it optional for the user is a good idea, but let's focus on the functionality of the current method first; we can implement that as a new feature later.
Using async_read_until will not work, since HTTPS is not like HTTP and FTP, which end with "\r\n\r\n". I used this method at the beginning between nipoServer and nipoAgent, and we have no problem between them, since we control everything in the middle. Our problem with reading data is between nipoServer and the Origin, which is a mystery for us since it is encrypted. For more information, you can check THIS question that I asked months ago.
TBH, I am not sure if we will succeed in the end or not, but I would always like to try all I can.
I ran the example you gave in your SO question as a reference. As expected, it got stuck, but I checked the buffer:
```
CONNECT www.google.com:443 HTTP/1.1\r\nHost: www.google.com:443\r\nUser-Agent: curl/8.5.0\r\nProxy-Connection: Keep-Alive\r\n\r\n
```
The \r\n\r\n thingy exists. Am I missing something?
Man, I hate networking. I had been avoiding this topic for a long time.
I managed to get past the connection phase and ran into your ClientHello issue at last. :)
```cpp
boost::system::error_code error;
// Read the CONNECT request up to the end of its header block.
auto size = boost::asio::read_until(socket_, readBuffer_, "\r\n\r\n", error);
// Tell the client the tunnel is established, then start reading the TLS ClientHello.
std::ostream o(&writeBuffer_);
o << "HTTP/1.1 200 Connection established\r\n\r\n";
boost::asio::write(socket_, writeBuffer_);
doRead_client_hello();
```
Well, to address your problem: bytes 3 and 4 of the record (0-indexed) hold the length of the handshake message.
This is the first few bytes of the handshake.
If you look at bytes 3 and 4, they are 0x0200, which is 512 in decimal.
Though I don't know why the length is 518; I think it is supposed to be 517, or I'm tripping.
If you run the program with loglevel DEBUG, it will show you all the data in hexadecimal as a string, plus some useful information:
2024-08-30_08:45:46 [AGENT] [TRACE] [Read from] [SRC 127.0.0.1:41078] [Bytes 517]
2024-08-30_08:45:46 [AGENT] [DEBUG] [AgentHandler handle] [Token Valid]
2024-08-30_08:45:46 [AGENT] [DEBUG] [AgentHandler handle] [Request] :
TLS Type : TLSHandshake
SNI : speed.cloudflare.com
Body Size : 1034
Body : 1603010200010001fc03035ee3a44685cd20b00d1a200f8330979ca47bcb5f92ee67bde284c1e2fe85a057206a8a8db412f99673811ba3c5542fe9c682cd420a3424a3523fd9ab737cbd1675003e130213031301c02cc030009fcca9cca8ccaac02bc02f009ec024c028006bc023c0270067c00ac0140039c009c0130033009d009c003d003c0035002f00ff0100017500000019001700001473706565642e636c6f7564666c6172652e636f6d000b000403000102000a00160014001d0017001e00190018010001010102010301040010000e000c02683208687474702f312e31001600000017000000310000000d0030002e04030503060308070808081a081b081c0809080a080b080408050806040105010601030303010302040205020602002b00050403040303002d00020101003300260024001d002029cec1605db1d492f60a78d9dce170db73dfc2f38b6aee1d836f319e7d80df69001500a70000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
For the size difference that you see: there are 5 bytes in the record header, which are the following:
16 03 01 02 00
And the rest is the ClientHello, which is 512 bytes.
So in total you read 517 bytes (5 + 512).
But to be honest, I don't know why it is 518 in your case.
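The arithmetic above can be checked with a tiny parser for the 5-byte TLS record header (a sketch, not project code; the function name is mine):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Parse the 5-byte TLS record header: content type (1 byte), protocol
// version (2 bytes), payload length (2 bytes, big-endian).
// Returns the payload length, or 0 if this is not a handshake record.
std::size_t tlsHandshakeRecordLength(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 5) return 0;
    if (buf[0] != 0x16) return 0; // 0x16 = handshake record type
    return (static_cast<std::size_t>(buf[3]) << 8) | buf[4];
}
```

For the dump above, `{0x16, 0x03, 0x01, 0x02, 0x00}` gives a payload length of 512, so the whole record on the wire is 5 + 512 = 517 bytes.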
Doesn't this solve the problem? Knowing the size of the incoming packet.
You can parse it, but that gives you the size of that one message only; it does not tell us how many bytes are coming in total. The problem is that we don't know how many messages will come later. Imagine you want to download a picture: in plain HTTP the client would rely on the
Content-Length
header, and here is the point — we don't know how many bytes, in how many messages, we are supposed to receive. On the other hand, when downloading big files we have a situation where the packets are not in SSL record format; they are continuing data.
Greetings. Unfortunately, I don't have enough knowledge and experience in networking, so my question may be naive: if your program is a layer-4 TCP proxy, do you even need to know the size of the packets at all? It might help to look at how other programs such as haproxy handle this. As far as I understand, you can use an event loop and poll to monitor the socket, so you can tell whether new data has arrived on the socket and then forward it.
@AkramiPro
If I understood your idea correctly, you are talking about the TCP/IP system (layer 4). As for how something like HAProxy works, it is like this:
There is a good approach, which is reading by chunk size, and I am working on this method. In chunked transfer encoding, the message body is sent as a series of chunks, each chunk preceded by its size. HAProxy reads chunks until it encounters the terminating "0" chunk, which signals the end of the message.
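The chunk framing described above can be shown with a small decoder sketch (a simplified model I wrote for illustration; it ignores chunk extensions and trailers):

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Decode an HTTP/1.1 chunked body: each chunk is "<hex size>\r\n<data>\r\n",
// terminated by a zero-size chunk. Returns the reassembled body, or nothing
// if the input is truncated.
std::optional<std::string> decodeChunked(const std::string& in) {
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string::npos) return std::nullopt;
        const std::size_t len =
            std::stoul(in.substr(pos, eol - pos), nullptr, 16); // hex size line
        pos = eol + 2;
        if (len == 0) return out;                 // "0" chunk: end of message
        if (pos + len + 2 > in.size()) return std::nullopt;
        out.append(in, pos, len);                 // chunk payload
        pos += len + 2;                           // skip data and trailing CRLF
    }
}
```

For example, the wire form `"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"` decodes to `"Wikipedia"`.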
Exactly, everything you said is correct. My question is: is it feasible to rely on TCP state as much as possible? Because otherwise you would have to implement every protocol individually and build separate support for each one, right? For example ws, ftp, ssh, and so on each have their own challenges. If I'm not mistaken, the main problem is with UDP, which is stateless, and I think the solutions you are working on now would be more useful there, right?
Another point that came to my mind, and that I saw in this issue, is resistance against replay attacks. Maybe it's a bit early, but I wanted to mention it now so you can put it in the todo and work on it later.
@AkramiPro
What do you mean by TCP state? How could it help us detect the end of a message on the socket?
We only want to work on the HTTP protocol and nothing else (ws, ftp, ssh).
The second part, about reading data chunk by chunk, is split out into another issue: #114.
Subjective
In an HTTPS call, the client detects the end of a message using the Content-Length header. The Content-Length header specifies the exact byte length of the HTTP message body; the client reads the response until it has received that many bytes.
We, as a proxy server, are not able to decrypt the traffic and read the Content-Length header, so we need another way to detect when the message is over.
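To illustrate exactly what TLS hides from us: with plaintext HTTP, a client (or proxy) can simply parse the header for the length. A minimal sketch, with a helper name of my own choosing:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Extract Content-Length from a raw plaintext HTTP response header.
// Under TLS the proxy only sees ciphertext, so this lookup is impossible.
std::optional<std::size_t> contentLength(const std::string& rawResponse) {
    const std::size_t h = rawResponse.find("Content-Length:");
    const std::size_t headerEnd = rawResponse.find("\r\n\r\n");
    if (h == std::string::npos || headerEnd == std::string::npos || h > headerEnd)
        return std::nullopt;
    // std::stoul skips the leading space and stops at the "\r".
    return std::stoul(rawResponse.substr(h + 15));
}
```

A plaintext client would read exactly that many body bytes after the header and know the message is complete; that is the signal we lose on the encrypted leg.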
Current solution
I implemented a way to read until the end of the message, as follows, but it is not stable. You can check the function TCPClient::doRead, which reads the socket in a loop and checks the available bytes on the socket. I also wait at the end of each round of the loop to allow for data that may still be in flight on the socket. Both
repeatWait
and timeWait
used in this function are configurable from the config file.
Current Issues
We have two basic issues with the solution above
1. This method is not reliable, since data may arrive on the socket with a delay, in which case our loop ends before all the data has been read. So the value of
socket_.available()
cannot be trusted in this situation.
2. This method reads from the socket while there is something to read, then packs all the read data and sends it to the nipoAgent at once. So imagine I want to download a file of 100 MB: the whole file would be read and forwarded in one shot. This is not good; the client needs to get the data chunk by chunk, not all at once.
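The failure mode in issue 1 can be demonstrated with a small standalone simulation. Everything here is illustrative: the real code polls Boost.Asio's `socket_.available()` and sleeps `timeWait` between rounds, while this model replays a scripted sequence of `available()` values, where a run of zeros stands for packets still in flight.

```cpp
#include <cstddef>
#include <vector>

// Fake socket that replays scripted available() readings, one per poll.
// A 0 in the script models a moment when data is delayed on the network.
struct FakeSocket {
    std::vector<std::size_t> script;
    std::size_t poll = 0;
    std::size_t available() {
        return poll < script.size() ? script[poll++] : 0;
    }
};

// Model of the doRead loop: keep reading while bytes are available; after an
// empty poll, retry up to repeatWait times (real code sleeps timeWait between
// retries) before concluding the message is over. Returns total bytes "read".
std::size_t readAll(FakeSocket& s, int repeatWait) {
    std::size_t total = 0;
    int idle = 0;
    while (idle <= repeatWait) {
        const std::size_t n = s.available();
        if (n == 0) { ++idle; continue; } // real code: sleep timeWait here
        idle = 0;
        total += n;                       // real code: read n bytes from socket
    }
    return total;
}
```

With `repeatWait = 2`, a script like `{100, 0, 0, 0, 100}` loses the final 100 bytes: three empty polls in a row end the loop even though more data was coming. That is exactly why `socket_.available()` alone is unreliable here.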