bol-van / zapret

DPI bypass multi platform

AutoHostsLists --hostlist-auto-fail-threshold Issue #191

Open asaddon opened 1 month ago

asaddon commented 1 month ago

Hello, I've set up NFQWS with the hostlist-auto-fail-threshold=2 setting, but I believe it's not working as expected. See the debug log below:

13.07.2024 19:42:19 : www.porntrex.com : tcp retrans threshold reached
13.07.2024 19:42:19 : www.porntrex.com : fail counter 1/2
13.07.2024 19:42:40 : www.porntrex.com : tcp retrans threshold reached
13.07.2024 19:42:40 : www.porntrex.com : fail counter 1/2
13.07.2024 19:43:12 : www.porntrex.com : tcp retrans threshold reached
13.07.2024 19:43:12 : www.porntrex.com : fail counter 1/2
13.07.2024 19:43:32 : www.porntrex.com : tcp retrans threshold reached
13.07.2024 19:43:32 : www.porntrex.com : fail counter 1/2
13.07.2024 19:43:56 : www.porntrex.com : tcp retrans threshold reached
13.07.2024 19:43:56 : www.porntrex.com : fail counter 1/2
13.07.2024 19:44:16 : www.porntrex.com : tcp retrans threshold reached
13.07.2024 19:44:16 : www.porntrex.com : fail counter 1/2
13.07.2024 19:44:36 : www.porntrex.com : tcp retrans threshold reached
13.07.2024 19:44:36 : www.porntrex.com : fail counter 1/2

Why is the fail counter 1/2 not incrementing? Shouldn't it move to 2/2 on the second "tcp retrans threshold reached" and add the domain to the list?

I'm running NFQWS on AsusWrt Merlin with this command:

nfqws --daemon --uid 1:1 --pidfile=/tmp/NFQWSdpi.txt --dpi-desync-fwmark=0x40000000 --qnum=200 --dpi-desync=split2 --wssize 1:6 --hostlist-exclude=/opt/zapret/HostsExclude.txt --hostlist-auto=/opt/zapret/AutoHosts.txt --hostlist-auto-fail-threshold=2 --hostlist-auto-fail-time=5 --hostlist-auto-retrans-threshold=2 --hostlist-auto-debug=/opt/zapret/AutoHostsDebug.log

bol-van commented 1 month ago

The 2 fails must occur within 5 seconds, as you specified with the fail-time option.

bol-van commented 1 month ago

Also, you specified the --wssize option. It is not controlled by the hostlist, because wssizing starts during the TCP handshake, when the host is not available yet. The wssize option is undesirable and should be avoided if other strategies are available.

asaddon commented 1 month ago

It's still doing this even with these settings:

nfqws --daemon --uid 1:1 --pidfile=/tmp/NFQWSdpi.txt --dpi-desync-fwmark=0x40000000 --qnum=200 --dpi-desync=split2 --hostlist-exclude=/opt/zapret/HostsExclude.txt --hostlist-auto=/opt/zapret/AutoHosts.txt --hostlist-auto-fail-threshold=2 --hostlist-auto-fail-time=1 --hostlist-auto-retrans-threshold=2 --hostlist-auto-debug=/opt/zapret/AutoHostsDebug.log

14.07.2024 21:10:25 : www.porntrex.com : tcp retrans threshold reached
14.07.2024 21:10:25 : www.porntrex.com : fail counter 1/2
14.07.2024 21:10:48 : www.porntrex.com : tcp retrans threshold reached
14.07.2024 21:10:48 : www.porntrex.com : fail counter 1/2
14.07.2024 21:11:08 : www.porntrex.com : tcp retrans threshold reached
14.07.2024 21:11:08 : www.porntrex.com : fail counter 1/2
14.07.2024 21:12:10 : www.porntrex.com : tcp retrans threshold reached
14.07.2024 21:12:10 : www.porntrex.com : fail counter 1/2
14.07.2024 21:12:31 : www.porntrex.com : tcp retrans threshold reached
14.07.2024 21:12:31 : www.porntrex.com : fail counter 1/2

bol-van commented 1 month ago

Now you have set the timeout even lower: one second (--hostlist-auto-fail-time=1). What do you expect?

bol-van commented 1 month ago

fail-time means all fails must be within the specified time to trigger hostlist addition. You should increase this time, not decrease it.
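For illustration, the spacing of the fail events in the log above can be checked with a few lines of Python. This is only a sketch: `fail_gaps` is a hypothetical helper, and the line format ("DD.MM.YYYY HH:MM:SS : host : message") is assumed from the log excerpts in this thread.

```python
from datetime import datetime

# Fail events copied from the debug log above (one per "fail counter" line).
FAIL_LOG = """\
14.07.2024 21:10:25 : www.porntrex.com : fail counter 1/2
14.07.2024 21:10:48 : www.porntrex.com : fail counter 1/2
14.07.2024 21:11:08 : www.porntrex.com : fail counter 1/2
14.07.2024 21:12:10 : www.porntrex.com : fail counter 1/2
14.07.2024 21:12:31 : www.porntrex.com : fail counter 1/2
"""


def fail_gaps(log_text):
    """Return the seconds between consecutive fail events in the log."""
    stamps = []
    for line in log_text.splitlines():
        # Assumed line layout: "DD.MM.YYYY HH:MM:SS : host : message".
        stamp, _host, message = line.split(" : ", 2)
        if message.startswith("fail counter"):
            stamps.append(datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


gaps = fail_gaps(FAIL_LOG)
print(gaps)       # seconds between consecutive fails
print(min(gaps))  # the closest two fails ever got
```

Every gap in this log is 20 seconds or more, so no two fails can fall inside a 1-second or 5-second window.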

asaddon commented 1 month ago

fail-time means all fails must be within specified time to trigger hostlist addition

you should increase not decrease this time

Oh, so if I want it to add the hosts within one second of a retransmission fail, what value do I need to set?

asaddon commented 1 month ago

One more thing I don't understand is why the fail counter is not incrementing after the first fail. Why is there another entry with 1/2 failed attempts?

bol-van commented 1 month ago

Oh, so if I want it to add the hosts within one second of a retransmission fail, what value do I need to set?

You didn't understand how it works. A retransmission fail happens when the retrans counter reaches retrans-threshold. It is the fail event that counts, not every retransmission. A fail can happen because a website is experiencing temporary trouble. That's why a single fail does not trigger hostlist addition. You specify how many fails must happen to trigger it. But imagine one fail happens today and the second next week. That will not count as 2 fails because fail-time is exceeded. Our purpose is to detect when fails are happening continuously: you press refresh, and it fails again and again. A normal value for fail-time is 60 seconds, not 1 or 5 seconds, because a refresh usually takes longer. See your debug log: in your example it's 20-30 seconds.

One more thing I don't understand is why the fail counter is not incrementing after the first fail. Why is there another entry with 1/2 failed attempts?

Because fail-time is exceeded and the previous attempt is cleared.
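The behaviour described above can be modelled roughly as follows. This is a simplified sketch, not the nfqws source: `count_fails` is a hypothetical name, and it assumes the time window opens at the first fail of a series and that the series is forgotten once fail-time seconds have passed. The timestamps are the fail events from the first debug log in this thread.

```python
from datetime import datetime

# Fail events from the first debug log (13.07.2024, www.porntrex.com).
STAMPS = [
    "13.07.2024 19:42:19",
    "13.07.2024 19:42:40",
    "13.07.2024 19:43:12",
    "13.07.2024 19:43:32",
    "13.07.2024 19:43:56",
    "13.07.2024 19:44:16",
    "13.07.2024 19:44:36",
]


def count_fails(stamps, threshold, fail_time):
    """Replay fail events and return the counter value after each one.

    Assumption: a series starts at its first fail and expires
    fail_time seconds later, after which counting starts over.
    """
    counters = []
    window_start = None
    counter = 0
    for stamp in stamps:
        t = datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")
        if window_start is None or (t - window_start).total_seconds() >= fail_time:
            window_start = t  # previous series expired: start a new one
            counter = 0
        counter += 1
        counters.append(counter)
        if counter >= threshold:
            break  # the host would be added to the list at this point
    return counters


print(count_fails(STAMPS, threshold=2, fail_time=5))
print(count_fails(STAMPS, threshold=2, fail_time=60))
```

Under this model, fail-time=5 leaves the counter at 1 for all seven events, while fail-time=60 lets the second event reach 2/2.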

bol-van commented 1 month ago

See the timestamps:

15.07.2024 02:16:22 : generalfilm.website : tcp retrans threshold reached
15.07.2024 02:16:22 : generalfilm.website : fail counter 1/3
15.07.2024 03:16:41 : generalfilm.website : tcp retrans threshold reached
15.07.2024 03:16:41 : generalfilm.website : fail counter 1/3
15.07.2024 04:17:00 : generalfilm.website : tcp retrans threshold reached
15.07.2024 04:17:00 : generalfilm.website : fail counter 1/3
15.07.2024 05:17:22 : generalfilm.website : tcp retrans threshold reached
15.07.2024 05:17:22 : generalfilm.website : fail counter 1/3
15.07.2024 06:17:41 : generalfilm.website : tcp retrans threshold reached
15.07.2024 06:17:41 : generalfilm.website : fail counter 1/3
15.07.2024 07:18:00 : generalfilm.website : tcp retrans threshold reached
15.07.2024 07:18:00 : generalfilm.website : fail counter 1/3
15.07.2024 08:54:42 : generalfilm.website : tcp retrans threshold reached
15.07.2024 08:54:42 : generalfilm.website : fail counter 1/3
15.07.2024 08:54:44 : generalfilm.website : tcp retrans threshold reached
15.07.2024 08:54:44 : generalfilm.website : fail counter 2/3
15.07.2024 08:54:45 : generalfilm.website : tcp retrans threshold reached
15.07.2024 08:54:45 : generalfilm.website : fail counter 3/3
15.07.2024 08:54:45 : generalfilm.website : adding

asaddon commented 1 month ago

Ahh that makes sense, thank you for the detailed explanation.

bol-van commented 1 month ago

In the latest version I improved this logic. Now it resets the fail counter if the website works. Bad, Bad, Good, Bad within fail-time does not trigger hostlist addition anymore.
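A rough model of this reset rule, again as a sketch rather than the actual nfqws code: `replay` is a hypothetical function, events are given as (seconds, outcome) pairs, and the window handling assumes the series starts at its first fail.

```python
def replay(events, threshold, fail_time):
    """Return True if the host would be added to the list.

    events is a list of (seconds, outcome) pairs, where outcome is
    "bad" (a fail event) or "good" (the website answered).
    """
    window_start = None
    counter = 0
    for t, outcome in events:
        if outcome == "good":
            # The website works: forget the fails seen so far.
            window_start, counter = None, 0
            continue
        if window_start is None or t - window_start >= fail_time:
            window_start, counter = t, 0  # start a new series
        counter += 1
        if counter >= threshold:
            return True
    return False


bad_bad_good_bad = [(0, "bad"), (10, "bad"), (20, "good"), (30, "bad")]
bad_bad_bad = [(0, "bad"), (10, "bad"), (30, "bad")]

print(replay(bad_bad_good_bad, threshold=3, fail_time=60))
print(replay(bad_bad_bad, threshold=3, fail_time=60))
```

In this model the Bad, Bad, Good, Bad sequence never reaches 3/3, whereas three fails with no success in between do.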

asaddon commented 1 month ago

@bol-van One more thing I noticed: my ISP's DPI doesn't always send a RST packet. Sometimes the connection just times out and nothing happens, and in that case the autohostlist addition doesn't work.

Would it be possible for you to add a timeout setting where it just adds the host if the server doesn't respond after x seconds?

curl --connect-timeout 60 --max-time 120 youporn.com
408 Request Time-out
Your browser didn't send a complete request in time.

admin@MyAsus:/tmp/home/root# curl --connect-timeout 60 --max-time 120 youporn.com
curl: (56) Recv failure: Connection timed out

This is how the DPI behaves sometimes instead of sending RST.

bol-van commented 1 month ago

Timeout is already implemented, but not the way you think. For nfqws it's handled by counting retransmissions of the TLS ClientHello or the plain HTTP request. The time between retransmissions is increased progressively by the OS. For tpws it's handled another way, because tpws does not operate at the packet level. Timeout is when the client itself drops the connection without receiving anything from the server. This is a common situation in Russia. One of my ISPs behaves exactly the same way, and it works for me.

I don't understand: is the 408 code in your post what the "server" returns? Is it the DPI's reaction to a plain HTTP request? If it's returned immediately, then it won't work with the current nfqws/tpws, which acts only on 302/307 redirects. If it's returned after some time, it depends on whether the client receives ACKs for the sent data. If it does, retransmission won't happen.

Does the timeout work for HTTPS?

asaddon commented 1 month ago

I'm just testing it via simple curl in SSH, and that 408 error doesn't always come. I'm attaching a picture.

image

I guess I'll need to fire up Wireshark and do some extensive testing on what exactly the DPI is doing.

bol-van commented 1 month ago

Yes, a Wireshark capture will help. Or a debug log from nfqws (--debug). I will be gone for 2 weeks without access to a PC, so I won't be able to read a .cap soon, but I can read plain text.

I guess the 408 case means the DPI blocks the outgoing packet with the "bad request" without corrupting the TCP connection. The web server receives nothing and returns the 408 code. I guess it's not something every server will return. Try some other servers.

asaddon commented 1 month ago

Yeah, not all servers do that, as you can see in the picture, and not even every request to the same server returns 408. I'll do some testing and will update. Thank you.

asaddon commented 1 month ago

@bol-van Here you go: a Wireshark .pcap file which includes the 408 error as well. See if you can find anything useful here. Use the filter "ip.addr == 66.254.114.79" to show only the relevant lines.

408 error.pcap.zip

Also, the new reset-counter change causes the counter to reset because it deems the website to be working even when it returns a 408 error, so it goes into a loop and the site is never added to the hosts list.

24.07.2024 23:04:57 : lobstertube.com : fail counter 2/3
24.07.2024 23:05:02 : lobstertube.com : fail counter reset. website is working.

asaddon commented 1 month ago

After some more testing: the 408 error only occurs if the GET request is made over HTTP. Over HTTPS it neither gives a 408 error nor resets the fail counter with "website is working".

bol-van commented 1 month ago

nfqws does not treat 408 as a DPI block. 408 can be caused by other reasons.

asaddon commented 1 month ago

Yeah, it's not a big deal, and so far I've only experienced 408 error while doing the manual testing with cURL and not in real-world browsing.

Everything else works as expected, you can close this issue now. 🙂

bol-van commented 4 weeks ago

Now I'm back at my PC. From your capture I can see that your DPI silently drops packets containing the "bad" request. The server does not receive anything and thus does not send ACKs. The client keeps retransmitting. Some servers are configured to return a 408 code if the client does not send anything within a specific timeout. This is your case.

nfqws retransmission detection should work for HTTP. The absence of a reaction to the 408 code is normal. 408 means the site works, where "works" means the server sends anything meaningful, even if it's a 500 Internal Server Error. 408 IS NOT a DPI packet. It's generated by the server.
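The rule as stated in this thread can be summarized in a few lines. This is a deliberately simplified model of the description here, not nfqws code: `classify_reply` and its return strings are hypothetical, and any additional checks nfqws may apply to redirects are ignored.

```python
def classify_reply(status_code):
    """Classify an HTTP reply the way the thread describes it.

    Only 302/307 redirects are treated as a possible DPI reaction.
    Any other reply, including 408 and 500, came from a server that
    answered, so the site counts as working.
    """
    if status_code in (302, 307):
        return "possible DPI redirect"
    return "site works"


for code in (200, 302, 307, 408, 500):
    print(code, classify_reply(code))
```

This is why a 408 reply resets the fail counter instead of incrementing it: from the detector's point of view, a server that answers at all is reachable.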

asaddon commented 4 weeks ago

Now i'm back to my PC.

Welcome back :)

From your capture I can see your DPI silently drops packets containing "bad" request.

Can anything be done about it? Maybe add a switch to automatically add all such requests to the hosts list after x seconds.

bol-van commented 4 weeks ago

Like I said, it should work with the normal retransmission detection algorithm. Does it work in your case, for plain HTTP?

Retransmissions are the simplest way to detect "hangs" or "drops" using only outgoing traffic. Capturing incoming traffic is a problem on non-Linux systems. Only Linux has the connbytes limiter in the kernel; BSDs and Windows do not. Capturing incoming data packets there means capturing all incoming packets: gigabytes, for example, while you are watching your favourite 4K movie. Everything goes from the kernel to user mode for analysis.

asaddon commented 4 weeks ago

like I said it should work with normal retransmission detection algorithm

does it work in your case ? for plain http

I'll need to test it again thoroughly. A simple curl test I did earlier for HTTP always throws a 408 error, and it didn't even show anything in the debug log for autohostlist.

I'm on Linux BTW.

bol-van commented 4 weeks ago

You are on an unsupported system, AsusWRT. It means you likely do not use the zapret startup scripts, and instead redirect traffic and run nfqws manually. What are your iptables commands?

asaddon commented 4 weeks ago

iptables -t mangle -A INPUT -p tcp -m multiport --sports 80,443 -m connbytes --connbytes 1:8 --connbytes-mode packets --connbytes-dir reply -j NFQUEUE --queue-num 200 --queue-bypass

iptables -t mangle -A FORWARD -p tcp -m multiport --sports 80,443 -m connbytes --connbytes 1:8 --connbytes-mode packets --connbytes-dir reply -j NFQUEUE --queue-num 200 --queue-bypass

iptables -t mangle -A POSTROUTING -p tcp -m multiport --dports 80,443 -m connbytes --connbytes 1:8 --connbytes-mode packets --connbytes-dir original -m mark ! --mark 0x40000000/0x40000000 -j NFQUEUE --queue-num 200 --queue-bypass

I initially used your blockcheck.sh script and used the recommended settings from there and altered it to match everything manually for my setup.

bol-van commented 3 weeks ago

The iptables rules are OK. However, it is better to use interface names with the -i/-o options to process only the WAN interface.

asaddon commented 3 weeks ago

Sure I'll do that.

asaddon commented 3 weeks ago

Okay, I did multiple tests using cURL over HTTP. It seems zapret is probably not catching HTTP traffic: for all these cURL calls, no entries show up in the debug log.

admin@MyAsus:/tmp/home/root# curl http://tube8.com
curl: (56) Recv failure: Connection timed out
admin@MyAsus:/tmp/home/root# curl http://tube8.com
408 Request Time-out
Your browser didn't send a complete request in time.
admin@MyAsus:/tmp/home/root# curl http://tube8.com
curl: (56) Recv failure: Connection timed out
admin@MyAsus:/tmp/home/root# curl http://tube8.com
408 Request Time-out
Your browser didn't send a complete request in time.
admin@MyAsus:/tmp/home/root# curl http://tube8.com
408 Request Time-out
Your browser didn't send a complete request in time.
admin@MyAsus:/tmp/home/root# curl http://tube8.com
408 Request Time-out
Your browser didn't send a complete request in time.
admin@MyAsus:/tmp/home/root# curl http://tube8.com
408 Request Time-out
Your browser didn't send a complete request in time.
admin@MyAsus:/tmp/home/root# curl http://tube8.com
curl: (56) Recv failure: Connection timed out
admin@MyAsus:/tmp/home/root# curl http://tube8.com
curl: (56) Recv failure: Connection timed out

Also, sometimes I get the 408 error and sometimes "curl: (56) Recv failure: Connection timed out".

For HTTPS traffic it shows this error, and I get entries in the debug log as well.

admin@MyAsus:/tmp/home/root# curl https://tube8.com
curl: (35) Recv failure: Connection timed out
admin@MyAsus:/tmp/home/root# curl https://tube8.com
curl: (35) Recv failure: Connection timed out

30.07.2024 22:09:18 : tube8.com : tcp retrans threshold reached
30.07.2024 22:09:18 : tube8.com : fail counter 1/3
30.07.2024 22:14:49 : tube8.com : tcp retrans threshold reached
30.07.2024 22:14:49 : tube8.com : fail counter 1/3

asaddon commented 3 weeks ago

I'll do one more test with --debug switch.

asaddon commented 3 weeks ago

Okay, the debug switch clears things up. zapret is working for HTTP, but since cURL was timing out before zapret hit the TCP retransmission threshold, the fail counter was never triggered, and that is why I was not seeing anything in the hosts debug log.

The second thing I found out is that iOS is not doing TCP retransmissions properly, even on multiple page loads, and that is why zapret is having trouble adding the hosts to the list. On Windows Chrome it worked perfectly, so that is not a zapret issue.

I'm not sure if I can do anything about the iOS RTO time through my router

bol-van commented 3 weeks ago

cURL was timing out before zapret hits the tcp retransmission threshold

Did you increase the retrans threshold value above the default of 3? 3 retransmissions on Linux take less than 2 seconds.
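To put the "less than 2 seconds" figure in context: TCP stacks generally double the retransmission timeout after each unanswered attempt. The sketch below assumes a starting timeout of 0.25 s, which is an illustrative value only; the real one depends on the measured round-trip time and the OS. `retransmit_times` is a hypothetical helper.

```python
def retransmit_times(initial_rto, count):
    """Seconds after the first send at which each retransmission leaves,
    assuming the timeout doubles after every unanswered attempt."""
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(count):
        elapsed += rto
        times.append(elapsed)
        rto *= 2  # exponential backoff
    return times


# 0.25 s is an assumed starting timeout, not a measured value.
print(retransmit_times(0.25, 3))
print(retransmit_times(0.25, 5))
```

Under that assumption the third retransmission leaves at 1.75 s and the fifth at 7.75 s, so raising the threshold quickly lengthens the time a client has to keep the connection open before a fail is registered.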

Second thing I find out is the iOS is not doing TCP retransmissions properly

Take a capture on the router using tcpdump.

Also note that hardware acceleration (offload) can remove some traffic from the Linux netfilter path. On some stock firmwares it can be disabled.

asaddon commented 3 weeks ago

Did you increase retrans threshold value above default 3 ? 3 retrans on linux is less than 2 seconds.

Yup, I increased it to 5 for testing a few days back. Now I've set it back to 2.

Take capture on router using tcpdump Also notice that hardware acceleration (offload) can remove some traffic from linux netfilter route. On some stocks it can be disabled

I'll check it out, thank You.

asaddon commented 3 weeks ago

Okay, I took three different dumps directly on the router using TShark: iOS, Android, and Windows Chrome. On iOS, per the dump, it does do the TCP retransmissions fine, but for some reason they are not showing in the zapret host debug log, nor is the host added automatically. On both Android and Chrome, it shows in the logs and the host is successfully added to the hostlist.

I'm attaching everything for you to take a close look as I have no idea what iOS is doing differently here.

Pcaps.zip

bol-van commented 3 weeks ago

iOS does not do any retransmissions of the plain HTTP request within the 15-second 408 timeout. That's why nfqws does not detect anything: it cannot detect what is absent. It doesn't look like normal OS behavior, but I'm not an Apple expert.

In the 2 other cases things are also interesting. Port 80 is the standard situation that would happen if the "GET /" packet is dropped by the DPI. But for port 443 things are more interesting. All OSes do normal retransmission of the TLS ClientHello. But the server also retransmits the SYN,ACK segment. It means it received the SYN from the client but did not receive the ACK to its SYN,ACK. The 3-way handshake does not complete from the server's view and does complete from the client's view. I can only guess what makes the DPI also drop the ACK to the SYN,ACK. It's part of the TCP handshake, and the SNI is not known yet. It may indicate an IP block, or that the DPI delays (buffers) more than one packet from the client and starts doing this at an early stage. Maybe it's waiting to reassemble the full TLS ClientHello from multiple TCP segments. If it detects a bad host, it drops the whole buffer: all TCP segments with the TLS ClientHello and also the 3rd packet of the TCP handshake (the ACK to the SYN,ACK).

asaddon commented 3 weeks ago

Yeah, I'm just doing the tests to understand it and report to you in case it may help you in some way. Everything works perfectly as soon as I manually add the hosts to the list, so DPI bypass is working as expected.