Inconsistent / incorrect ROBOT attack results (2nd issue)

famzah commented 2 years ago

I'm experiencing the same inconsistency in the results as in issue #1107, when repeatedly testing the same Debian server.

Here is an example for ports 465 (SSL-only) and 587 (STARTTLS):

# IP address of the server is 192.252.146.33

famzah@vbox64:~/testssl$ for i in {1..20} ; do ./testssl.sh --robot famzah.net:465|grep " ROBOT" ; done
 ROBOT                                     not vulnerable (OK)
 ROBOT                                     VULNERABLE (NOT ok)
 ROBOT                                     VULNERABLE (NOT ok) - weakly vulnerable as the attack would take too long

famzah@vbox64:~/testssl$ for i in {1..20} ; do ./testssl.sh --robot --starttls smtp famzah.net:587|grep " ROBOT" ; done
 ROBOT                                     not vulnerable (OK)
 ROBOT                                     VULNERABLE (NOT ok)
 ROBOT                                     VULNERABLE (NOT ok) - weakly vulnerable as the attack would take too long

I cannot reproduce this on port 25 (STARTTLS) where I consistently get "not vulnerable (OK)". This is the expected result as the server is using the patched OpenSSL by Debian. The server is running Debian Buster and OpenSSL is version "1.1.1d-0+deb10u7". Note: ports 465 & 587 are served by a different server implementation compared to the server for port 25. I cannot disclose more about the implementations.

You should be able to reproduce this easily on your end. It takes a couple of attempts to get different results; usually a different result shows up in fewer than 10 attempts.

If you can't reproduce, I have saved the screen output and the "/tmp" dir content for both "not vulnerable (OK)" and "VULNERABLE (NOT ok)" runs executed with debug=3. If you need the saved data, I will attach it here.

The question is: is "testssl.sh" hitting a bug and reporting inconsistent results, or is the server for ports 465 & 587 doing something weird and really vulnerable?

Thank you.

Here is my environment:

famzah@vbox64:~/testssl$ ./testssl.sh -b 2>/dev/null | grep "from"
    testssl.sh       3.0.6 from https://testssl.sh/

famzah@vbox64:~/testssl$ ./testssl.sh -b 2>/dev/null | grep -A3 OpenSSL
 Using "OpenSSL 1.1.1f  31 Mar 2020" [~79 ciphers]
 on vbox64:/usr/bin/openssl
 (built: "Aug 23 17:02:39 2021", platform: "debian-amd64")

# maybe related to: https://github.com/drwetter/testssl.sh/issues/1119
famzah@vbox64:~/testssl$ ./testssl.sh --robot --starttls smtp famzah.net:587|head -n2|tail -n1
No engine or GOST support via engine with your /usr/bin/openssl

famzah@vbox64:~/testssl$ lsb_release -a
Description:    Ubuntu 20.04.3 LTS

famzah@vbox64:~/testssl$ uname -a
Linux vbox64 5.4.0-92-generic #103-Ubuntu SMP Fri Nov 26 16:13:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

drwetter commented 2 years ago

As far as I am concerned: this check was relatively robust in the past. IIRC #1107 was something different.

Did a few tests on the target you were providing. Here are somewhat intermediate findings.

First of all, for me 85-95% of the tests returned "not vulnerable (OK)". I have never encountered "VULNERABLE (NOT ok) - weakly vulnerable" so far.

Then, when I tested from a North American host I never got "VULNERABLE (NOT ok)", unlike when I checked from Europe. The target IP is located in the US.

Given that and as an educated guess: it could be a networking timing issue. But as said my statistics aren't good yet. It seems that I can't do >21 checks in a row.

@dcooper16 : do you have any idea?

dcooper16 commented 2 years ago

My memory of how this test works isn't very good. (I wrote it more than 4 years ago, and I was just implementing a test method developed by others.) However, I tried testing the server to see if I could figure out what was happening, and got "not vulnerable (OK)" every time. If the inconsistent results don't happen when testing from North America, that would explain why I didn't see them.

Seeing the screen output from "not vulnerable (OK)" and "VULNERABLE (NOT ok)" runs executed with debug=3 may be helpful. It would at least be a starting point.

famzah commented 2 years ago

You can test against the following hosts, which have an identical setup:

195.8.222.36   : located in Europe
192.252.146.33 : located in the USA
116.251.204.34 : located in HK

Today I launched three VPS instances @ Digital Ocean from the following locations:

FRA: Frankfurt, Germany (close to the EU server)
NYC: New York, USA (super close to the US server)
SGP: Singapore (close to the HK server)

Here are the test results:

FRA - 195.8.222.36   ( 28 ms): total=40 ; ok=40 ; weakly= 0 ; not_OK= 0
FRA - 192.252.146.33 ( 84 ms): total=40 ; ok=39 ; weakly= 0 ; not_OK= 1
FRA - 116.251.204.34 (180 ms): total=40 ; ok= 5 ; weakly= 5 ; not_OK=30

NYC - 192.252.146.33 (  8 ms): total=40 ; ok=40 ; weakly= 0 ; not_OK= 0
NYC - 195.8.222.36   (133 ms): total=40 ; ok=11 ; weakly=12 ; not_OK=17
NYC - 116.251.204.34 (210 ms): total=40 ; ok= 8 ; weakly= 8 ; not_OK=24

SGP - 116.251.204.34 ( 35 ms): total=40 ; ok=40 ; weakly= 0 ; not_OK= 0
SGP - 195.8.222.36   (205 ms): total=40 ; ok=11 ; weakly=16 ; not_OK=13
SGP - 192.252.146.33 (294 ms): total=40 ; ok= 6 ; weakly= 8 ; not_OK=26

I've done a total of 40 attempts to port 465 from each VPS location to each server. Each test result line shows the following:

VPS_LOCATION - TESTED_SERVER (ping RTT): total tests; "not vulnerable (OK)"; "VULNERABLE (NOT ok) - weakly vulnerable"; "VULNERABLE (NOT ok)" results

My black-box testing definitely shows that network latency plays a vital role in the consistency of the results.
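
A minimal sketch of how tallies like the ones above could be produced from the grep'd " ROBOT" lines. It is not necessarily the script used for these numbers; the function names are made up for illustration, and the only thing it relies on is the three result strings that testssl.sh prints in the transcripts in this thread.

```python
from collections import Counter


def classify(line: str) -> str:
    """Map one ' ROBOT' output line to a tally bucket."""
    if "not vulnerable (OK)" in line:
        return "ok"
    # The "weakly" line also contains "VULNERABLE (NOT ok)", so test it first.
    if "weakly vulnerable" in line:
        return "weakly"
    if "VULNERABLE (NOT ok)" in line:
        return "not_OK"
    return "other"


def tally_robot_results(lines):
    """Count results per bucket and add a 'total' entry."""
    counts = Counter(classify(line) for line in lines if "ROBOT" in line)
    counts["total"] = sum(counts.values())
    return counts
```

Feeding it the collected output of the `for i in {1..40}` style loop gives the `total` / `ok` / `weakly` / `not_OK` columns for one VPS-to-server pair.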

Here is my environment:

root@debian-s-1vcpu-1gb-fra1-01:~/testssl.sh# ./testssl.sh -b 2>/dev/null | grep "from" # directly "git clone'd"
    testssl.sh       3.1dev from https://testssl.sh/dev/

root@debian-s-1vcpu-1gb-fra1-01:~/testssl.sh# ./testssl.sh -b 2>/dev/null | grep -A3 OpenSSL
 Using "OpenSSL 1.0.2-chacha (1.0.2k-dev)" [~179 ciphers]
 on debian-s-1vcpu-1gb-fra1-01:./bin/openssl.Linux.x86_64
 (built: "Jan 18 17:12:17 2019", platform: "linux-x86_64")

# no warning for "No engine or GOST support via engine with your /usr/bin/openssl"

root@debian-s-1vcpu-1gb-fra1-01:~/testssl.sh# lsb_release -a
Description:    Debian GNU/Linux 11 (bullseye)

root@debian-s-1vcpu-1gb-fra1-01:~/testssl.sh# uname -a
Linux debian-s-1vcpu-1gb-fra1-01 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux

famzah commented 2 years ago

We need to clarify the following:

[ ] Is this a bug in the ROBOT test and is it fixable?
[ ] Is it recommended that we run the tests from a LAN location thus eliminating any network latencies, in order for all "testssl.sh" results to be reliable?
[ ] If the ROBOT test is hard to get fixed, maybe we could run a "ping" test before it, and issue a warning if the latency is bigger than 40 ms?
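
To make the third option concrete, here is a rough sketch of such a pre-check in Python. It is not an existing testssl.sh feature. It times a TCP connect as a stand-in for "ping" (ICMP usually needs extra privileges), the function names are hypothetical, and the 40 ms threshold is just the figure suggested above.

```python
import socket
import time


def tcp_connect_rtt_ms(host: str, port: int, timeout: float = 5.0) -> float:
    """Approximate the round-trip time by timing a TCP connect."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0


def latency_warning(rtt_ms: float, threshold_ms: float = 40.0):
    """Return a warning string if the RTT exceeds the threshold, else None."""
    if rtt_ms > threshold_ms:
        return (f"WARNING: RTT {rtt_ms:.0f} ms exceeds {threshold_ms:.0f} ms; "
                "ROBOT results may be unreliable")
    return None
```

A caller would run `latency_warning(tcp_connect_rtt_ms(host, 465))` before the ROBOT check and print the result if it is not `None`.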

drwetter commented 2 years ago

Oh, great, thanks for the input!

I did 20x runs from Europe against HK with https://github.com/robotattackorg/robot-detect, which was originally used by @dcooper as a basis for porting this to testssl.sh .

The result is most often "Getting inconsistent results, aborting.", and some ",NOT VULNERABLE". Never vulnerable.

What's interesting in Hanno's/Tibor's/... Python code is the segment where "Getting inconsistent results, aborting." is printed:

if (oracle_good != oracle_good_verify) or (oracle_bad1 != oracle_bad_verify1) or (oracle_bad2 != oracle_bad_verify2) or (oracle_bad3 != oracle_bad_verify3) or (oracle_bad4 != oracle_bad_verify4):
    if not args.quiet:
        print("Getting inconsistent results, aborting.")

I probably don't understand either piece of code well enough, but

          # If the server provided the same error message for all tests, then this
          # is an indication that the server is not vulnerable.
          if [[ "${response[0]}" != "${response[1]}" ]] || [[ "${response[1]}" != "${response[2]}" ]] || \
             [[ "${response[2]}" != "${response[3]}" ]] || [[ "${response[3]}" != "${response[4]}" ]]; then
               vulnerable=true

I don't know whether this is the same as above (I doubt it), but the comment here doesn't match the assignment of vulnerable=true. Also, if the responses differ, I would expect a message like "not sure".
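
A sketch of the three-way outcome hinted at here, written in Python for brevity. It mirrors the idea visible in the robot-detect excerpt (compare a first run of probes against a verification run) but is not the actual logic of either tool; `robot_verdict` and its return strings are hypothetical.

```python
def robot_verdict(first_run, verify_run):
    """Classify two runs of the same ordered probes.

    first_run / verify_run: lists of server responses, one per probe.
    """
    # A probe that answers differently on repetition says nothing reliable.
    if first_run != verify_run:
        return "inconclusive"
    # Identical answers to every probe: no usable oracle was observed.
    if len(set(first_run)) == 1:
        return "not vulnerable"
    # Stable but distinguishable answers across probes.
    return "vulnerable"
```

With this shape, a response that changes between repetitions of the same probe leads to "inconclusive" rather than "vulnerable".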

David, can you check?

drwetter commented 2 years ago

robot-detect seems more robust but my statistics weren't good enough. Running robot-detect from Europe --> NY yielded 2x vulnerable, 2x not vulnerable, 16x 'Getting inconsistent results, aborting'. For testssl I got worse results than you (3x vulnerable, 2x weakly vulnerable, 1x OK, then I stopped).

So far I believe the check with both tools has problems, however it looks like testssl.sh can do better.

dcooper16 commented 2 years ago

Hi @drwetter, @famzah,

I don't have much time to look into this today (maybe late next week), but I tried testing against 195.8.222.36, since it seemed to have the most inconsistent results when testing from North America. I ran the test using

testssl.sh --debug 3 --robot --ip 195.8.222.36 famzah.net:465

and used "script" to store the results in a file.

Of the 15 tests I ran, only one came back "not vulnerable (OK)":

> egrep -i "response\[|vulnerable" results.txt 
response[0] = 15030300020214
response[1] = 15030300020214
response[2] = 15030300020214
response[3] = 15030300020214
response[4] = 15030300020214
response[0] = Timeout waiting for alert
response[1] = Timeout waiting for alert
response[2] = Timeout waiting for alert
response[3] = Timeout waiting for alert
response[4] = Timeout waiting for alert
not vulnerable (OK)

The other tests all failed in the first iteration of testing. Some examples are:

> egrep -i "response\[|vulnerable" results.txt
response[0] = 15030300020214
response[1] = 15030300020214
response[2] = 15030300020214
response[3] = 15030300020214
response[4] = 1503030002021434353420544
VULNERABLE (NOT ok) - weakly vulnerable as the attack would take too long

> egrep -i "response\[|vulnerable" robot/test1.txt 
response[0] = 15030300020214
response[1] = 15030300020214
response[2] = 1503030002021434353420544
VULNERABLE (NOT ok)

> egrep -i "response\[|vulnerable" results.txt 
response[0] = 1503030002021434353420544
response[1] = 15030300020214
response[2] = 1503030002021434353420544
VULNERABLE (NOT ok)

> egrep -i "response\[|vulnerable" results.txt
response[0] = 15030300020214
response[1] = 15030300020214
response[2] = 15030300020214
response[3] = 1503030002021434353420544
VULNERABLE (NOT ok)

In every case, the server responded with either "15030300020214" or "1503030002021434353420544". The ROBOT test only checks whether the responses are all the same; it doesn't consider the contents of the responses.

"15030300020214" is the TLS alert message indicating "bad record mac". "3435342054" is "454 T", where "454" is an SMTP authentication failed error. (I cannot account for the "4" at the end of the longer message.)

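The byte-level reading of the two responses can be checked with a short Python sketch. It assumes the response is a single TLS alert record followed by optional extra bytes, and it simply drops a dangling half-byte at the end of the hex string; that is a convenience for decoding, not an explanation of the stray "4". The function and table names are illustrative only.

```python
ALERT_LEVELS = {1: "warning", 2: "fatal"}
ALERT_DESCRIPTIONS = {20: "bad_record_mac"}  # only the value seen in this thread


def decode_response(hex_str: str) -> dict:
    """Split a hex-encoded response into a TLS alert record and trailing bytes."""
    if len(hex_str) % 2:
        hex_str = hex_str[:-1]  # assumption: ignore an incomplete final byte
    data = bytes.fromhex(hex_str)
    record_len = int.from_bytes(data[3:5], "big")  # 2-byte record length
    fragment = data[5:5 + record_len]              # alert level + description
    return {
        "content_type": data[0],                   # 0x15 = alert
        "version": (data[1], data[2]),             # (3, 3) = TLS 1.2
        "level": ALERT_LEVELS.get(fragment[0], "unknown"),
        "description": ALERT_DESCRIPTIONS.get(fragment[1], "unknown"),
        "trailing": data[5 + record_len:],
    }
```

For "1503030002021434353420544" the `trailing` field comes out as `b"454 T"`, and for "15030300020214" it is empty. Non-hex values such as "Timeout waiting for alert" are not handled and would raise `ValueError`.
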
So, it seems that this server is always sending the same TLS response, but it sometimes adds an SMTP error message and sometimes does not. Perhaps it would be acceptable to parse the response and remove anything that comes after the TLS response before comparing, but I'm not certain. I would have to test the other servers mentioned to see whether something similar is happening with them.
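
The trimming idea could look roughly like the following Python sketch. It is only an illustration of "remove anything after the TLS response before comparing", not a patch for testssl.sh, and it does not settle whether doing so is acceptable. The helper names are hypothetical, and non-hex values such as "Timeout waiting for alert" are passed through unchanged.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def first_tls_record(response: str) -> str:
    """Return only the first TLS record of a hex-encoded response."""
    header = response[:10]  # 5-byte record header = 10 hex digits
    if len(header) < 10 or not set(header) <= HEX_DIGITS:
        return response     # not a hex-encoded record; leave as-is
    record_len = int(header[6:10], 16)
    end = 10 + 2 * record_len
    # Leave truncated records untouched rather than guessing.
    return response[:end] if len(response) >= end else response


def all_same_after_trim(responses) -> bool:
    """True if every response has an identical first TLS record."""
    return len({first_tls_record(r) for r in responses}) == 1
```

Applied to the captures above, both "15030300020214" and "1503030002021434353420544" reduce to "15030300020214", so a mixed set of the two would compare as equal.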

drwetter / testssl.sh

Inconsistent / incorrect ROBOT attack results (2nd issue) #2083