coreruleset / fake-bot-plugin

This is a plugin that brings blocking of bots faking User-Agent to CRS.
Apache License 2.0

Inverted block #12

Closed ics closed 2 years ago

ics commented 2 years ago

Fake Bot plugin blocks valid bots and allows fake bots.

Sample googlebot transaction:

{
  "transaction": {
    "client_ip": "66.249.64.219",
    "time_stamp": "Tue Jul 12 15:54:19 2022",
    "server_id": "9e7bc6878f15155f664887f5952d257c0d032745",
    "client_port": 61702,
    "host_ip": "255.255.255.255",
    "host_port": 443,
    "unique_id": "1657641259",
    "request": {
      "method": "GET",
      "http_version": 1.1,
      "uri": "/",
      "headers": {
        "Host": "redacted.com",
        "AMP-Cache-Transform": "google;v=\"1..8\"",
        "Connection": "keep-alive",
        "Accept": "text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8",
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "From": "googlebot(at)googlebot.com",
        "Accept-Encoding": "gzip, deflate, br",
        "If-Modified-Since": "Tue, 24 May 2022 23:58:16 GMT"
      }
    },
    "response": {
      "body": "",
      "http_code": 403,
      "headers": {}
    },
    "producer": {
      "modsecurity": "ModSecurity v3.0.6 (FreeBSD)",
      "connector": "ModSecurity-nginx v1.0.3",
      "secrules_engine": "Enabled",
      "components": [
        "OWASP_CRS/4.0.0-rc1\""
      ]
    },
    "messages": [
      {
        "message": "Fake bot detected: ",
        "details": {
          "match": "Matched \"Operator `Pm' with parameter `applebot bingbot facebookbot facebookcatalog facebookexternalhit googlebot twitterbot' against variable `REQUEST_HEADERS:User-Agent' (Value: `Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chr (100 characters omitted)' )",
          "reference": "o153,9v265,200",
          "ruleId": "9504110",
          "file": "/usr/local/share/modsecurity-crs/plugins/fake-bot-after.conf",
          "lineNumber": "19",
          "data": "Matched Data: if-modified-since found within REQUEST_HEADERS:User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
          "severity": "2",
          "ver": "fake-bot-plugin/1.0.0",
          "rev": "",
          "tags": [
            "application-multi",
            "language-multi",
            "platform-multi",
            "attack-bot",
            "capec/1000/225/22/77/13",
            "PCI/6.5.10",
            "paranoia-level/1"
          ],
          "maturity": "0",
          "accuracy": "0"
        }
      }
    ]
  }
}

Sample fake transaction, made with curl "https://redacted.com/" -H "User-Agent: asd googlebot asd":

{
  "transaction": {
    "client_ip": "1.2.3.4",
    "time_stamp": "Tue Jul 12 15:59:08 2022",
    "server_id": "9e7bc6878f15155f664887f5952d257c0d032745",
    "client_port": 64532,
    "host_ip": "255.255.255.255",
    "host_port": 443,
    "unique_id": "1657641548",
    "request": {
      "method": "GET",
      "http_version": 2,
      "uri": "/",
      "headers": {
        "host": "redacted.com",
        "accept": "*/*",
        "user-agent": "asd googlebot asd"
      }
    },
    "response": {
      "body": "redacted",
      "http_code": 200,
      "headers": {}
    },
    "producer": {
      "modsecurity": "ModSecurity v3.0.6 (FreeBSD)",
      "connector": "ModSecurity-nginx v1.0.3",
      "secrules_engine": "Enabled",
      "components": [
        "OWASP_CRS/4.0.0-rc1\""
      ]
    },
    "messages": [
      {
        "message": "Fake bot detected: Googlebot",
        "details": {
          "match": "Matched \"Operator `InspectFile' with parameter `fake-bot.lua' against variable `TX:0' (Value: `googlebot' )",
          "reference": "o0,9v69,9",
          "ruleId": "9504110",
          "file": "/usr/local/share/modsecurity-crs/plugins/fake-bot-after.conf",
          "lineNumber": "19",
          "data": "Matched Data: googlebot found within REQUEST_HEADERS:User-Agent: Googlebot",
          "severity": "2",
          "ver": "fake-bot-plugin/1.0.0",
          "rev": "",
          "tags": [
            "application-multi",
            "language-multi",
            "platform-multi",
            "attack-bot",
            "capec/1000/225/22/77/13",
            "PCI/6.5.10",
            "paranoia-level/1"
          ],
          "maturity": "0",
          "accuracy": "0"
        }
      }
    ]
  }
}

Using fb1381ffb37e5f2118896e4d4ca717b9efc1687f, modsecurity 3.0.6, modsecurity-nginx 1.0.2, nginx 1.22.0.

lifeforms commented 2 years ago

Thank you for your submission. This looks like a valid Google bot:

# host 66.249.64.219
219.64.249.66.in-addr.arpa domain name pointer crawl-66-249-64-219.googlebot.com.

# host crawl-66-249-64-219.googlebot.com
crawl-66-249-64-219.googlebot.com has address 66.249.64.219

dune73 commented 2 years ago

@azurit, are you getting notifications for this issue?

azurit commented 2 years ago

@dune73 Yes.

azurit commented 2 years ago

@ics Is DNS resolution working on that server? Have you tried with ModSecurity 2.9?

ics commented 2 years ago

Yes, resolving works. I tested it in a Lua context too:

# ldd /usr/local/lib/libmodsecurity.so | grep liblua
        liblua-5.4.so => /usr/local/lib/liblua-5.4.so (0x801584000)
# lua54 -l socket
Lua 5.4.2  Copyright (C) 1994-2020 Lua.org, PUC-Rio
> hosts = socket.dns.getnameinfo("66.249.64.219")
> function dump(o)
>>    if type(o) == 'table' then
>>       local s = '{ '
>>       for k,v in pairs(o) do
>>          if type(k) ~= 'number' then k = '"'..k..'"' end
>>          s = s .. '['..k..'] = ' .. dump(v) .. ','
>>       end
>>       return s .. '} '
>>    else
>>       return tostring(o)
>>    end
>> end
> dump(hosts)
{ [1] = crawl-66-249-64-219.googlebot.com,}
>

I didn't try 2.9.
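
The reverse lookup above, together with the forward lookup shown earlier in the thread, amounts to a forward-confirmed reverse DNS check. Below is a minimal Python sketch of that check; it is not the plugin's Lua code. The function name `is_verified_bot` and the `allowed_suffixes` allow-list are hypothetical, and the resolver functions are injectable so the logic can be exercised without network access. Note that `socket.gethostbyname_ex` only covers IPv4.

```python
import socket


def is_verified_bot(ip, allowed_suffixes,
                    reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a crawler IP.

    `allowed_suffixes` is a hypothetical allow-list such as (".googlebot.com",).
    `reverse` and `forward` default to the standard library resolvers.
    """
    try:
        hostname = reverse(ip)[0]  # PTR lookup: IP -> hostname
    except OSError:
        return False  # no PTR record or resolver failure

    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False  # PTR points outside the crawler's domain

    try:
        addresses = forward(hostname)[2]  # A lookup: hostname -> list of IPs
    except OSError:
        return False

    # The forward lookup must lead back to the original address.
    return ip in addresses
```

For the address in this issue, `is_verified_bot("66.249.64.219", (".googlebot.com",))` would perform the same two lookups as the `host` commands, provided DNS is reachable from the machine running it.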

azurit commented 2 years ago

Willing to do some debug?

ics commented 2 years ago

Willing to do some debug?

Sure. I've already tried running ModSecurity with SecDebugLogLevel 9 and made 2 requests. Both returned 200. Access log:

66.249.70.27 - - [13/Jul/2022:22:18:16 +0000] "GET /aircraft/ALO3 HTTP/1.1" 200 2346 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-""-"
81.136.60.194 - - [13/Jul/2022:22:19:39 +0000] "GET /aircraft/ALO3 HTTP/2.0" 200 12294 "-" "asd googlebot asd" "-""-"

The first request was made using the Google Search Console URL Inspection tool, the second with curl -H.

Debug log: out.log

Please let me know if something stands out and what else I can try.

azurit commented 2 years ago

Only the first request (from Googlebot) is in the debug log. I can see this there:

[1657750696] [/aircraft/ALO3] [4] (Rule: 9504110) Executing operator "Pm" with param "applebot bingbot linkedinbot facebookbot facebookcatalog facebookexternalhit googlebot twitterbot" against REQUEST_HEADERS:User-Agent.
[1657750696] [/aircraft/ALO3] [9] Target value: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (Variable: REQUEST_HEADERS:User-Agent)
[1657750696] [/aircraft/ALO3] [7] Added pm match TX.0: googlebot
[1657750696] [/aircraft/ALO3] [9] Matched vars updated.
[1657750696] [/aircraft/ALO3] [4] Rule returned 1.
[1657750696] [/aircraft/ALO3] [4] Executing chained rule.
[1657750696] [/aircraft/ALO3] [4] (Rule: 0) Executing operator "InspectFile" with param "fake-bot.lua" against TX:0.
[1657750696] [/aircraft/ALO3] [9] Target value: "googlebot" (Variable: TX:0)
[1657750696] [/aircraft/ALO3] [1] googlebot
[1657750696] [/aircraft/ALO3] [9] Returning from lua script: 
[1657750696] [/aircraft/ALO3] [4] Rule returned 0.

The rule wasn't triggered and the request was not blocked (at least not by the Fake Bot plugin). Also, in the access log you provided, the return code was 200.

Can you also provide the debug log for the second request (curl)? Thank you!
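
To make the same check repeatable on a larger debug log, here is a small Python sketch that extracts what the Lua script returned for each request. The line layout is taken only from the excerpt above and may differ between ModSecurity versions; the function name `lua_verdicts` is hypothetical. An empty message corresponds to the script returning nothing, that is, no match.

```python
import re

# Line shape assumed from the excerpt above:
# [unique_id] [uri] [level] Returning from lua script: <message>
LUA_RETURN = re.compile(
    r"^\[(?P<id>\d+)\] \[(?P<uri>[^\]]*)\] \[\d\] "
    r"Returning from lua script: ?(?P<msg>.*)$"
)


def lua_verdicts(lines):
    """Yield (unique_id, uri, message) for every Lua script return line.

    An empty message means the script returned nothing (no match).
    """
    for line in lines:
        match = LUA_RETURN.match(line.rstrip("\n"))
        if match:
            yield match.group("id"), match.group("uri"), match.group("msg").strip()
```

Passing an open file works directly, for example `list(lua_verdicts(open("out.log")))`, since file objects iterate line by line.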

Also, this is strange (from the transaction log above):

"match": "Matched \"Operator `Pm' with parameter `applebot bingbot facebookbot facebookcatalog facebookexternalhit googlebot twitterbot' against variable `REQUEST_HEADERS:User-Agent' (Value: `Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chr (100 characters omitted)' )",
"data": "Matched Data: if-modified-since found within REQUEST_HEADERS:User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",

It matched something (if-modified-since) that was neither in the list nor in the data.

And here:

"user-agent": "asd googlebot asd"
"data": "Matched Data: googlebot found within REQUEST_HEADERS:User-Agent: Googlebot",

The user-agent value in the headers differs from the one in the rule output.

Those transaction logs seem to be mixed up.
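
The two inconsistencies above can also be checked mechanically. The following Python sketch compares the "data" field of each message with the request headers of the same transaction. It assumes the "Matched Data: <token> found within <variable>: <value>" shape seen in the samples in this issue; the function name `audit_entry_problems` is hypothetical, and its argument is the inner "transaction" object of the audit log entry.

```python
import re

# Shape assumed from the "data" fields quoted in this issue:
# "Matched Data: <token> found within <variable>: <value>"
MATCHED_DATA = re.compile(
    r"^Matched Data: (?P<token>.+?) found within (?P<var>\S+): (?P<value>.*)$"
)


def audit_entry_problems(transaction):
    """Return a list of mismatches between request headers and rule output."""
    headers = {k.lower(): v for k, v in transaction["request"]["headers"].items()}
    problems = []
    for message in transaction.get("messages", []):
        match = MATCHED_DATA.match(message["details"].get("data", ""))
        if not match:
            continue
        # "REQUEST_HEADERS:User-Agent" -> "user-agent"
        name = match.group("var").partition(":")[2].lower()
        actual = headers.get(name)
        if actual is None:
            problems.append(f"header {name!r} not present in request")
            continue
        if match.group("token").lower() not in actual.lower():
            problems.append(f"matched token {match.group('token')!r} not found in {name!r}")
        if match.group("value") != actual:
            problems.append(f"logged value for {name!r} differs from request header")
    return problems
```

An empty list means the rule output is at least consistent with the headers logged for the same transaction; any entry suggests the log parts do not belong together or were altered.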

ics commented 2 years ago

I must have mixed them up. Here is an example of a blocked Googlebot request: 1657674084.txt. It looks like it's blocked by something else:

"match": "Matched \"Operator `ValidateByteRange' with parameter `32,34,38,42-59,61,65-90,95,97-122' against variable         `REQUEST_HEADERS:From' (Value: `googlebot(at)googlebot.com' )",

The invalid bot requests made with curl are still going through. Here's a new curl request:

Request:

curl "https://doc8643.com/aircraft/BALL" -H "User-Agent: fake googlebot"

Access log:

81.136.60.194 - - [14/Jul/2022:21:21:44 +0000] "GET /aircraft/BALL HTTP/2.0" 200 10089 "-" "fake googlebot" "-""-"

Debug log: 1657833704.log

dune73 commented 2 years ago

@ics: Thanks for this new attempt, but the log provided confirms that the plugin works just fine:

[1657833704] [/aircraft/BALL] [9] Target value: "fake googlebot" (Variable: REQUEST_HEADERS:user-agent)
[1657833704] [/aircraft/BALL] [7] Added pm match TX.0: googlebot
[1657833704] [/aircraft/BALL] [9] Matched vars updated.
[1657833704] [/aircraft/BALL] [4] Rule returned 1.
[1657833704] [/aircraft/BALL] [4] Executing chained rule.
[1657833704] [/aircraft/BALL] [4] (Rule: 0) Executing operator "InspectFile" with param "fake-bot.lua" against TX:0.
[1657833704] [/aircraft/BALL] [9] Target value: "googlebot" (Variable: TX:0)
[1657833704] [/aircraft/BALL] [1] googlebot
[1657833704] [/aircraft/BALL] [9] Returning from lua script: Fake Bot Plugin: Detected fake Googlebot.

So it's likely your blocking settings are amiss. Is the anomaly threshold too high?
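
To illustrate why a matching rule does not necessarily produce a 403, here is a rough Python sketch of CRS-style anomaly scoring. It assumes the plugin rule adds to the inbound anomaly score rather than denying the request directly, and it uses what I understand to be the CRS default values (critical = 5, inbound threshold = 5); both assumptions should be verified against the plugin rule and crs-setup.conf. The names `SEVERITY_SCORES` and `would_block` are hypothetical.

```python
# Assumed CRS default scores per severity; verify against your crs-setup.conf.
SEVERITY_SCORES = {"CRITICAL": 5, "ERROR": 4, "WARNING": 3, "NOTICE": 2}


def would_block(matched_severities, inbound_threshold=5):
    """Return (score, blocked) for the severities of all matched rules."""
    score = sum(SEVERITY_SCORES[s] for s in matched_severities)
    # The request is blocked once the accumulated score reaches the threshold.
    return score, score >= inbound_threshold
```

Under these assumptions a single critical match is enough at the default threshold, `would_block(["CRITICAL"])` gives `(5, True)`, while a raised threshold such as 10 lets the same request through even though the rule matched.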

Also: Can we close this?

ics commented 2 years ago

Using default anomaly thresholds. Thanks for looking into it. I'll investigate further.