ics closed this issue 2 years ago
Thank you for your submission. This looks like a valid Google bot:
# host 66.249.64.219
219.64.249.66.in-addr.arpa domain name pointer crawl-66-249-64-219.googlebot.com.
# host crawl-66-249-64-219.googlebot.com
crawl-66-249-64-219.googlebot.com has address 66.249.64.219
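The two `host` lookups above amount to a forward-confirmed reverse DNS check: resolve the IP to a name, then resolve that name back and confirm the same IP comes out. A minimal Python sketch of that idea follows. The function name, the injectable lookup parameters and the single accepted suffix `.googlebot.com` are assumptions for illustration and are not taken from the plugin's Lua code; the default lookups handle IPv4 only.

```python
import socket


def is_forward_confirmed(ip, allowed_suffixes=(".googlebot.com",),
                         reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to an allowed host that resolves back to ip."""
    # Default to the system resolver; the lookups are injectable for offline testing.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        # Step 1: PTR lookup, normalised (drop trailing dot, lowercase).
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(tuple(allowed_suffixes)):
            return False
        # Step 2: forward lookup must contain the original address.
        return ip in forward_lookup(hostname)
    except OSError:
        # Resolver failures (socket.herror, socket.gaierror) count as "not verified".
        return False
```

Calling `is_forward_confirmed("66.249.64.219")` on a machine with working DNS performs the same two lookups as the `host` commands above.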
@azurit, are you getting notifications for this issue?
@dune73 Yes.
@ics Is DNS resolving working ok on that server? Have you tried with modsec 2.9?
Yes, resolving works. Tested in lua context too:
# ldd /usr/local/lib/libmodsecurity.so | grep liblua
liblua-5.4.so => /usr/local/lib/liblua-5.4.so (0x801584000)
# lua54 -l socket
Lua 5.4.2 Copyright (C) 1994-2020 Lua.org, PUC-Rio
> hosts = socket.dns.getnameinfo("66.249.64.219")
> function dump(o)
>> if type(o) == 'table' then
>> local s = '{ '
>> for k,v in pairs(o) do
>> if type(k) ~= 'number' then k = '"'..k..'"' end
>> s = s .. '['..k..'] = ' .. dump(v) .. ','
>> end
>> return s .. '} '
>> else
>> return tostring(o)
>> end
>> end
> dump(hosts)
{ [1] = crawl-66-249-64-219.googlebot.com,}
>
I didn't try 2.9.
Willing to do some debug?
Sure.
I've already tried running modsecurity with SecDebugLogLevel 9
and made 2 requests. Both returned 200. Access log:
66.249.70.27 - - [13/Jul/2022:22:18:16 +0000] "GET /aircraft/ALO3 HTTP/1.1" 200 2346 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-""-"
81.136.60.194 - - [13/Jul/2022:22:19:39 +0000] "GET /aircraft/ALO3 HTTP/2.0" 200 12294 "-" "asd googlebot asd" "-""-"
The 1st request was made using the Google Search Console URL Inspection tool, the 2nd with curl -H.
Debug log: out.log
Please let me know if something stands out and what else I can try.
There's only the first request (from Googlebot) in the debug log. I can see this there:
[1657750696] [/aircraft/ALO3] [4] (Rule: 9504110) Executing operator "Pm" with param "applebot bingbot linkedinbot facebookbot facebookcatalog facebookexternalhit googlebot twitterbot" against REQUEST_HEADERS:User-Agent.
[1657750696] [/aircraft/ALO3] [9] Target value: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (Variable: REQUEST_HEADERS:User-Agent)
[1657750696] [/aircraft/ALO3] [7] Added pm match TX.0: googlebot
[1657750696] [/aircraft/ALO3] [9] Matched vars updated.
[1657750696] [/aircraft/ALO3] [4] Rule returned 1.
[1657750696] [/aircraft/ALO3] [4] Executing chained rule.
[1657750696] [/aircraft/ALO3] [4] (Rule: 0) Executing operator "InspectFile" with param "fake-bot.lua" against TX:0.
[1657750696] [/aircraft/ALO3] [9] Target value: "googlebot" (Variable: TX:0)
[1657750696] [/aircraft/ALO3] [1] googlebot
[1657750696] [/aircraft/ALO3] [9] Returning from lua script:
[1657750696] [/aircraft/ALO3] [4] Rule returned 0.
The rule wasn't triggered and the request was not blocked (at least not by the Fake Bot plugin). Also, in the access log you provided, the return code was 200.
Can you also provide the debug log for the second request (curl)? Thank you!
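When comparing several debug logs, it can help to pull out only the Lua result lines. The following Python sketch does that for the `[timestamp] [uri] [level] message` layout seen in the excerpt above. The regular expression and the function name are illustrative, and treating an empty message as "no match" is an assumption based on the `Rule returned 0.` line that follows it in the excerpt.

```python
import re

# Matches lines such as:
# [1657750696] [/aircraft/ALO3] [9] Returning from lua script: <message>
LUA_RETURN = re.compile(
    r"^\[(?P<ts>\d+)\] \[(?P<uri>[^\]]*)\] \[(?P<level>\d)\] "
    r"Returning from lua script:\s*(?P<msg>.*)$"
)


def lua_results(lines):
    """Yield (uri, message) for each Lua return line; message is None when empty."""
    for line in lines:
        m = LUA_RETURN.match(line.strip())
        if m:
            yield m.group("uri"), (m.group("msg").strip() or None)


sample = [
    "[1657750696] [/aircraft/ALO3] [1] googlebot",
    "[1657750696] [/aircraft/ALO3] [9] Returning from lua script:",
]
print(list(lua_results(sample)))
# [('/aircraft/ALO3', None)]
```

A `None` message here corresponds to the chained rule not matching; a non-empty message would be the text the script returned.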
Also, this is strange (from the transaction log above):
"match": "Matched \"Operator `Pm' with parameter `applebot bingbot facebookbot facebookcatalog facebookexternalhit googlebot twitterbot' against variable `REQUEST_HEADERS:User-Agent' (Value: `Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chr (100 characters omitted)' )",
"data": "Matched Data: if-modified-since found within REQUEST_HEADERS:User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
It matched something which was not in the list and not even in the data (if-modified-since).
And here:
"user-agent": "asd googlebot asd"
"data": "Matched Data: googlebot found within REQUEST_HEADERS:User-Agent: Googlebot",
There's a different value for user-agent in the headers and in the rule output.
Those transaction logs seem to be mixed up or something.
I must have mixed them up. An example of a blocked Googlebot request: 1657674084.txt. It looks like it's blocked by something else:
"match": "Matched \"Operator `ValidateByteRange' with parameter `32,34,38,42-59,61,65-90,95,97-122' against variable `REQUEST_HEADERS:From' (Value: `googlebot(at)googlebot.com' )",
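To see why that `From` header trips the byte-range check, the quoted parameter can be expanded and compared against the header value byte by byte. This is a small Python sketch of the check's logic, not ModSecurity's implementation; the helper names are made up for illustration.

```python
def parse_ranges(spec):
    """Turn a spec like '32,34,42-59' into a set of allowed byte values."""
    allowed = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        # A single value such as '32' has no upper bound, so reuse the lower one.
        allowed.update(range(int(lo), int(hi or lo) + 1))
    return allowed


def invalid_bytes(value, spec):
    """Return (char, byte) pairs in value that fall outside the allowed ranges."""
    allowed = parse_ranges(spec)
    return [(chr(b), b) for b in value.encode("utf-8") if b not in allowed]


SPEC = "32,34,38,42-59,61,65-90,95,97-122"
print(invalid_bytes("googlebot(at)googlebot.com", SPEC))
# [('(', 40), (')', 41)]
```

The parentheses (bytes 40 and 41) fall outside the allowed ranges, so this match comes from the header's content and is unrelated to the reverse DNS verification.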
The invalid bot requests made with curl are still going through. Here's a new curl request:
Request:
curl "https://doc8643.com/aircraft/BALL" -H "User-Agent: fake googlebot"
Access log:
81.136.60.194 - - [14/Jul/2022:21:21:44 +0000] "GET /aircraft/BALL HTTP/2.0" 200 10089 "-" "fake googlebot" "-""-"
Debug log: 1657833704.log
@ics: Thanks for this new attempt, but the log provided confirms that the plugin works just fine:
[1657833704] [/aircraft/BALL] [9] Target value: "fake googlebot" (Variable: REQUEST_HEADERS:user-agent)
[1657833704] [/aircraft/BALL] [7] Added pm match TX.0: googlebot
[1657833704] [/aircraft/BALL] [9] Matched vars updated.
[1657833704] [/aircraft/BALL] [4] Rule returned 1.
[1657833704] [/aircraft/BALL] [4] Executing chained rule.
[1657833704] [/aircraft/BALL] [4] (Rule: 0) Executing operator "InspectFile" with param "fake-bot.lua" against TX:0.
[1657833704] [/aircraft/BALL] [9] Target value: "googlebot" (Variable: TX:0)
[1657833704] [/aircraft/BALL] [1] googlebot
[1657833704] [/aircraft/BALL] [9] Returning from lua script: Fake Bot Plugin: Detected fake Googlebot.
So it's likely your blocking settings are amiss. Anomaly threshold too high?
Also: Can we close this?
Using default anomaly thresholds. Thanks for looking into it. I'll investigate further.
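For reference, the threshold comparison discussed above can be sketched in Python as follows. The severity scores and the inbound threshold of 5 are the CRS defaults; which severity the plugin's rule carries is not shown in this thread, so the inputs in the usage lines are hypothetical. The sketch models only the score comparison and ignores other settings that influence blocking, such as whether `SecRuleEngine` is `On` or `DetectionOnly`.

```python
# CRS default anomaly scores per severity and default inbound threshold
# (assumed unchanged on the server in question).
SEVERITY_SCORES = {"CRITICAL": 5, "ERROR": 4, "WARNING": 3, "NOTICE": 2}
INBOUND_THRESHOLD = 5


def is_blocked(matched_severities, threshold=INBOUND_THRESHOLD):
    """Return True when the summed anomaly score reaches the threshold."""
    score = sum(SEVERITY_SCORES[s] for s in matched_severities)
    return score >= threshold


# Hypothetical inputs: one critical match reaches the default threshold,
# a single warning does not.
print(is_blocked(["CRITICAL"]))  # True
print(is_blocked(["WARNING"]))   # False
```

If a detection is logged but the request still returns 200, checking how many points the matching rule adds against the configured threshold is one place to start.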
Fake Bot plugin blocks valid bots and allows fake bots.
Sample googlebot transaction:
Sample fake transaction:
curl "https://redacted.com/" -H "User-Agent: asd googlebot asd"
Using fb1381ffb37e5f2118896e4d4ca717b9efc1687f, modsecurity 3.0.6, modsecurity-nginx 1.0.2, nginx 1.22.0.