ungeahnt closed this issue 1 year ago
I've just checked my logs. The serpstatbot doesn't show up any more since the BadBotBlocker was updated.
In my case the bots from the following domains are in the top 3.
You will still see the requests in your Apache logs. (There is still a response, just a different one.)
I just visited your site, and used my browser's dev tools to change the user-agent header to include "serpstatbot".
As you can see, the response changed from 200 to 406.
So, I think the bad-bot-blocker is working on your site...
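The same check can be scripted instead of done by hand in the browser's dev tools. This is only a minimal sketch using Python's standard library: the function names `build_request` and `fetch_status` are made up for illustration, and it assumes the blocker decides based on the User-Agent header alone.

```python
import urllib.error
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that announces the given User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code the server sends for this User-Agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still
        # available on the exception, and that is the value of interest here.
        return error.code
```

Calling `fetch_status` twice for a page on your own site, once with a normal browser string and once with a string containing "serpstatbot", should show the status change described above if the blocker is active.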
Greg, many thanks for the efforts.
I was confused by the HTTP/1.0" 200 34 "-" "serpstatbot/2.1 entries in the server log files, and by the log showing so many direct calls of different pages (every 5s, over 1-2 days). I assumed that the 200 is the server response (= ok) and that the request had been processed without blocking.
Only now have I noticed that the response is always 34 bytes (all day long). This suggests that no normal page data is transmitted and the same short response is sent every time, which indicates that blocking works. But why does the server respond with 200 (according to the log)?
Perhaps this is the answer (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406):
In practice, this error is very rarely used. Instead of responding using this error code, which would be cryptic for the end user and difficult to fix, servers ignore the relevant header and serve an actual page to the user. It is assumed that even if the user won't be completely happy, they will prefer this to an error code.
Wouldn't a crawler stop after a few pages if it got an invalid response (like 40x)? Search engines also have to conserve their resources. I also cannot find your 406 response in my server logs.
Somehow I don't understand this, but I have blocked the IP via htaccess for now and I see no more serpstat requests. Maybe I'll have another look at it later, when I have time again.
It's odd that you get a 200 in your logs. You saw the 406 in my screendump.
However, your logs also show that the response was only 34 bytes long. So, this is presumably just a short error message - which is what we want.
I saw the 406 and I think the blocking is ok.
I've found your test from today in my logs and again there is only a 200 response and the 34 bytes. It seems that something is not right in my server settings:
- - - [10/Jan/2023:16:01:17 +0100] "GET /tree/proavitus/individual/P7/Karoline-Martha-Buhler HTTP/1.0" 200 34 "https://proavitus.de/tree/proavitus" "serpstatbot/2.1 (Macintosh; Intel Mac OS X 10.15; rv:108.0) Gecko/20100101 Firefox/108.0" proavitus.de
I'll have to ask my hosting provider how this is possible.
When I change the User-Agent to serpstat and load the page, I see a white page saying "Not acceptable". So the blocking is working.
But the response status is first a 406 for the 'page' and then a second one for the favicon, which is 200. I guess in my server log only the last status is shown and therefore I only see the 200.
Is it correct that the favicon is still sent, or should this also be blocked?
But the response status is first a 406 for the 'page' and then a second one for the favicon, which is 200. I guess in my server log only the last status is shown and therefore I only see the 200.
This is strange. If webtrees sends a 406, I would expect to see it in the logs. Here is an example from my own site:
157.90.209.76 - - [31/Dec/2022:03:19:41 +0000] "GET /sitemap HTTP/1.1" 406 34 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
There should be two log entries. One for the page and one for the favicon.
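To see which status codes and response sizes a given bot actually receives, the access log can be tallied per User-Agent. The following is a rough sketch, not a definitive tool: it assumes log lines in an Apache "combined" style like the two quoted in this thread, and the names `LOG_PATTERN`, `parse_line` and `tally` are invented for this example.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of an
# Apache "combined" style log line; anything before the request is ignored.
LOG_PATTERN = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return (status, size, agent) for one log line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    size = 0 if match["size"] == "-" else int(match["size"])
    return int(match["status"]), size, match["agent"]


def tally(lines, agent_fragment: str) -> Counter:
    """Count (status, size) pairs for requests whose User-Agent contains agent_fragment."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and agent_fragment.lower() in parsed[2].lower():
            counts[(parsed[0], parsed[1])] += 1
    return counts


# The two log lines quoted in this thread.
SAMPLE = [
    '157.90.209.76 - - [31/Dec/2022:03:19:41 +0000] "GET /sitemap HTTP/1.1" 406 34 "-" '
    '"Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"',
    '- - - [10/Jan/2023:16:01:17 +0100] '
    '"GET /tree/proavitus/individual/P7/Karoline-Martha-Buhler HTTP/1.0" '
    '200 34 "https://proavitus.de/tree/proavitus" '
    '"serpstatbot/2.1 (Macintosh; Intel Mac OS X 10.15; rv:108.0) '
    'Gecko/20100101 Firefox/108.0" proavitus.de',
]

print(tally(SAMPLE, "serpstat"))
print(tally(SAMPLE, "BLEXBot"))
```

To run it against a real log, pass an open file object to `tally` in place of `SAMPLE`. A bot that only ever shows one small, constant size is most likely getting the short error body, whatever status the log records.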
Is it correct that the favicon is still sent, or should this also be blocked?
Robots and crawlers do not fetch favicon files. Only your browser is doing this.
The favicon is sent by apache, not by webtrees. Since webtrees does not see this request, it cannot block it.
This is strange. If webtrees sends a 406, I would expect to see it in the logs.
No single 406 in my logs and only one line per request.
So it seems to be a problem with my server config, and I'll close the issue again.
Originally posted by @fisharebest in https://github.com/fisharebest/webtrees/issues/4634#issuecomment-1336819547
With webtrees 2.1.15 I still see many serpstat calls in the Apache access log. Could it be that blocking for serpstat is not working yet?
At least 'serpstat' is listed in the BadBotBlocker.php on my server.
Some lines from my Apache access log: