allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Support reverse proxy setup for Apache logs #78

Closed: LeeNX closed this issue 8 years ago.

LeeNX commented 10 years ago

Having a reverse proxy server in front of your web server takes load off the web server by caching static content such as images, but Apache then reports the reverse proxy's IP as the client IP.

If there were an option that supported "X-Forwarded-For", a header in which Apache can record the real client IP, then GoAccess would be able to parse the logs and report real data about the clients instead of the proxy IP, which skews the reports.
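A minimal Python sketch of the idea, assuming the proxy is trusted and appends the client address to X-Forwarded-For: report the left-most header entry when it is present, and fall back to the connecting (proxy) address otherwise. reported_client is a hypothetical name used for illustration; it is not part of GoAccess.

```python
from typing import Optional


def reported_client(remote_addr: str, xff: Optional[str]) -> str:
    """Pick the address to report as the client (hypothetical helper).

    Assumes the proxy in front of Apache is trusted and that the left-most
    X-Forwarded-For entry is the original client. Apache logs a missing
    header as "-", so that value falls back to the proxy's address.
    """
    if xff:
        first = xff.split(",")[0].strip()
        if first and first != "-":
            return first
    return remote_addr
```

For the first sample request below, reported_client("123.123.123.123", "213.213.213.213") would report 213.213.213.213 instead of the Varnish address.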

With that, GoAccess could quite neatly report both the traffic passing through the proxy and detailed statistics on the real client IPs.

allinurl commented 10 years ago

I can add this to the to-do list.

I'm curious to see which Apache LogFormat you are using, and it would be great if you could post a couple of sample requests. Also, are you using conditional logging based on %{X-Forwarded-For}i?

LeeNX commented 10 years ago
;SiteFQDN VarnishIP ClientIP RemoteLogName RemoteUserName TimeRequestReceived FirstLineOfRequest FinalStatus SizeOfResponseBytes Referer UserAgent TimeTaken2ServeRequest_ms
LogFormat "%{Host}i %h X-FF=\"%{X-Forwarded-For}i\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D" leenx

www.sitename.tld 123.123.123.123 X-FF="213.213.213.213" - - [04/Feb/2014:17:10:49 +0200] "GET /news/full-colour-0 HTTP/1.1" 200 15701 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 11698431
www.sitename.tld 123.123.123.123 X-FF="213.213.213.214" - - [04/Feb/2014:17:10:58 +0200] "GET /sites/blankButton.png HTTP/1.1" 200 122 "http://www.sitename.tld/stuff" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36" 735
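To check that lines in the leenx format above can be split into their fields, here is a hedged Python sketch using a regular expression; LEENX_RE and parse_leenx are hypothetical names, and this is not how GoAccess parses logs. Apache's %D records the time taken to serve the request in microseconds, so the last field is named duration_us here. The client is taken to be the left-most X-FF entry, falling back to the proxy address.

```python
import re
from typing import Dict, Optional

# Hypothetical regex mirroring the "leenx" LogFormat:
# %{Host}i %h X-FF="%{X-Forwarded-For}i" %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i" %D
LEENX_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<proxy>\S+) X-FF="(?P<xff>[^"]*)" '
    r'(?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<duration_us>\d+)$'
)


def parse_leenx(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields plus a derived 'client', or None if the line doesn't match."""
    m = LEENX_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Left-most X-FF entry is assumed to be the real client; "-" means the header was absent.
    first = fields["xff"].split(",")[0].strip()
    fields["client"] = first if first and first != "-" else fields["proxy"]
    return fields
```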
allinurl commented 10 years ago

Thanks. I'll look into this.

allinurl commented 9 years ago

A question about this: are you looking to use X-FF="213.213.213.213" as the host reported in GoAccess? I think this should be doable using a custom log format.

LeeNX commented 9 years ago

@allinurl, you might be spot on: a minor custom log format should do it, with one caveat. Sometimes the traffic might pass through multiple proxies, so there might be more than one IP in the X-FF field, and I'm not sure how GoAccess would be able to sort that out. Otherwise, we could close this issue.

gxhllj commented 8 years ago

@allinurl @LeeNX I use nginx, with a log_format like this:

 '$remote_addr - $remote_user [$time_local] "$request" '
 '$status $body_bytes_sent "$http_referer" '
 '"$http_user_agent" "$http_x_forwarded_for" "$request_time"';

I don't care about $remote_addr, but I do care about $http_x_forwarded_for.

In GoAccess, I use a log-format like this:

log-format %^ %^[%d:%t %^] "%r" %s %b "%R" "%u" "%h" "%T"

There might be more than one IP in the $http_x_forwarded_for field, but GoAccess doesn't handle this: it does not analyze the request when $http_x_forwarded_for contains more than one IP.

allinurl commented 8 years ago

@gxhllj you should be able to parse the left-most IP (being the client IP). For instance,

log-format %^[%d:%t %^] "%r" %s %b "%R" "%u" "%h,%^"

For the following log:

127.0.0.1 - - [23/Aug/2013:14:01:26 +0100] "GET /sites/xxx.pt/files/imagecache/64x64/avatar/picture-156409.jpg HTTP/1.1" 200 1768 "http://xxx.pt/forum/pr-saldos-80" "Mozilla/5.0 (Windows NT 5.1; rv:22.0; Avant TriCore) Gecko/20130630 Firefox/22.0" "213.58.193.194, 213.58.193.194"
gxhllj commented 8 years ago

@allinurl That works when $http_x_forwarded_for has more than one IP, but it doesn't work when $http_x_forwarded_for has only one IP.

Both situations occur in my access.log, like this:

123.58.175.149 - - [28/Nov/2015:00:00:07 +0800] "GET /api/mobile/index.php?version=163&charset=utf-8&module=space_thread&uid=570880 HTTP/1.1" 200 391 "-" "iPhone" "11.11.11.11, 22.22.22.2" "0.061" 
123.58.175.148 - - [28/Nov/2015:00:00:07 +0800] "GET /forum.php?mod=post&action=newthread&fid=367 HTTP/1.1" 200 3721 "http://ldxy.16163.com/forum-367-1.html" "Mozilla/5.0 (Linux; Android 4.4.4; LA2-S Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36" "33.33.33.33" "0.045" 
gxhllj commented 8 years ago

@allinurl Hello, I like GoAccess and want to use it on my website to analyze nginx's access.log, but I've run into a problem. Could you help me?

My site gets requests from a cache server, not from users directly. So I don't care about remote_addr; it's always the cache server's IP. I care about $http_x_forwarded_for (we use PHP).

Here's the problem: $http_x_forwarded_for is more complex than remote_addr.

$http_x_forwarded_for may contain one IP or several. If it contains more than one IP, they are separated by commas.

My log_format looks like this:

'$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" "$request_time"';

And my access.log looks like this:

123.58.175.149 - - [28/Nov/2015:00:00:07 +0800] "GET /api/index.php?version=163" 200 391 "-" "iPhone" "11.11.11.11, 22.22.22.2" "0.061" 

123.58.175.148 - - [28/Nov/2015:00:00:07 +0800] "GET /forum.php?fid=367 HTTP/1.1" 200 3721 "-" "Mozilla/5.0 (Linux; Android 4.4.4; LA2-S Build/KTU84P)AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36" "33.33.33.33" "0.045" 

I have tried these log formats in GoAccess:

log-format %^ %^[%d:%t %^] "%r" %s %b "%R" "%u" "%h" "%T"

This one invalidates the request when $http_x_forwarded_for has more than one IP.

log-format %^ %^[%d:%t %^] "%r" %s %b "%R" "%u" "%h,%^" "%T"

This one invalidates the request when $http_x_forwarded_for has only one IP.

Please forgive my poor English.
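The two failures described in this comment can be reproduced with a simplified Python model. Assumption: is_ip and host_until are hypothetical helpers, not GoAccess internals; they model %h as "read up to one fixed delimiter, then require a valid IP", and they receive the text that follows the opening quote of the X-Forwarded-For field.

```python
import ipaddress
from typing import Optional


def is_ip(token: str) -> bool:
    """True if token is a syntactically valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def host_until(field: str, delim: str) -> Optional[str]:
    """Simplified model of %h inside a quoted field: read up to a *fixed* delimiter.

    Returns the host token, or None when the delimiter never appears or the
    token is not a valid IP (i.e. the line would be rejected).
    """
    idx = field.find(delim)
    if idx == -1:
        return None
    token = field[:idx].strip()
    return token if is_ip(token) else None
```

With "%h" the delimiter is the closing quote, so a two-IP field yields "11.11.11.11, 22.22.22.2", which is not a valid IP; with "%h,%^" the delimiter is the comma, which a single-IP field never contains. A fixed delimiter cannot cover both cases.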

allinurl commented 8 years ago

@gxhllj Thanks for clarifying. Adding a specifier to the parser should be able to handle this. I'll look into it.

gxhllj commented 8 years ago

Thanks. If it gets solved, please tell me the solution. @allinurl

allinurl commented 8 years ago

@gxhllj will do.

hyperized commented 8 years ago

We are also using X-Forwarded-For headers, and GoAccess currently fails to parse them. Is there any input we can provide to get this ticket rolling again?

As a head start, here is a section from goaccess.conf:

time-format %H:%M:%S
date-format %d/%b/%Y
log-format "%{X-Forwarded-For}i" %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" # Modified NCSA extended/combined log format

Here are example log entries (IPs redacted for the purposes of this ticket). They're mostly garbage, but they are valid use cases:

unknown, 41.190.*.* - - [16/Aug/2016:00:17:26 +0200] "GET /wp-login.php HTTP/1.1" 404 14064 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1"
- - - [16/Aug/2016:00:17:28 +0200] "GET /server-status?auto HTTP/1.1" 200 412 "-" "Wget/1.15 (linux-gnu)"
151.80.*.* - - [16/Aug/2016:00:17:40 +0200] "GET /artikel/ HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; AhrefsBot/5.1; +http://ahrefs.com/robot/)"
10.10.*.*, 82.148.*.* - - [16/Aug/2016:03:29:18 +0200] "GET /blog HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"
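These entries mix a placeholder ("unknown"), a missing value ("-"), single addresses and two-hop chains. A hedged Python sketch of one way to normalize such a field into (client, proxies): split_chain is a hypothetical helper, and dropping non-IP tokens so that the left-most *valid* address becomes the client is an assumption about the desired policy, not GoAccess behavior.

```python
import ipaddress
from typing import List, Optional, Tuple


def split_chain(field: str) -> Tuple[Optional[str], List[str]]:
    """Split an X-Forwarded-For value into (client, proxies).

    Entries that are not valid IP addresses (e.g. 'unknown' or '-') are
    dropped, so the left-most valid address is treated as the client.
    """
    hops: List[str] = []
    for entry in field.split(","):
        entry = entry.strip()
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            continue  # skip placeholders such as 'unknown' or '-'
        hops.append(entry)
    if not hops:
        return None, []
    return hops[0], hops[1:]
```

For example, split_chain("unknown, 198.51.100.7") returns ("198.51.100.7", []), with 198.51.100.7 standing in for a redacted address, and split_chain("-") returns (None, []).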
allinurl commented 8 years ago

@hyperized if you have a fixed number of IPs then you should be able to parse it with the latest version of goaccess. Otherwise, it will be part of the new feature that I'm working on. I'll bump this up so I can get to it.

lillfredrik commented 8 years ago

@allinurl Any pointers on how to parse it the way you propose when you know the number of IPs (almost always two)? I stumbled upon this problem as well, since all my sites run behind CDNs, but with nginx.

Log line:

88.6.x.x, 185.43.x.x [13/Sep/2016:22:17:50 +0200] "GET /url HTTP/1.1" 200 1586 "http://www.example.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7"

thx

allinurl commented 8 years ago

@lillfredrik do you know which IP should be the remote host?

lillfredrik commented 8 years ago

@allinurl First IP is client, second is the proxy.

Log format set in goaccess.conf: log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"

Log format set in nginx:

log_format main '$http_x_forwarded_for [$time_local] '
                             '"$request" $status $body_bytes_sent '
                             '"$http_referer" "$http_user_agent"' ;
allinurl commented 8 years ago

@lillfredrik Please try the following:

goaccess -f log --log-format='%h, %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%T
lillfredrik commented 8 years ago

@allinurl Thanks, that works when there are exactly 2 IPs, but it breaks when there is only 1. There will always be a mix; I guess regex or conditional parsing will be tricky?

Edit: I fixed this at the CDN layer for now, basically by stripping out any intermediate proxies from the forwarded-for/true-client-IP header at the last proxy, so I only get one IP and not the proxies along the way. If anyone stumbles upon this problem before it is fixed and is using Akamai, let me know and I can share the details.
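Outside GoAccess, the mix of one- and two-IP lines is straightforward for a regular expression. A minimal Python sketch, assuming the nginx log_format main quoted in this thread ($http_x_forwarded_for [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"); LINE_RE and client_ip are hypothetical names, and this is not how GoAccess parses logs.

```python
import re
from typing import Optional

# Hypothetical regex for the nginx format above. The first group accepts
# one IP or a comma-separated list of any length.
LINE_RE = re.compile(
    r'^(?P<xff>[^\s,]+(?:,\s*[^\s,]+)*) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def client_ip(line: str) -> Optional[str]:
    """Return the left-most X-Forwarded-For entry, or None if the line doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("xff").split(",")[0].strip()
```

The repetition (?:,\s*[^\s,]+)* is what makes the IP count variable; a format like %h, %^[ instead hard-codes exactly one comma.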

allinurl commented 8 years ago

OK, it's finally here!!

I've added the ability to parse reverse proxy logs that contain a variable number of IPs (1+). This adds support to parse the X-Forwarded-For field in a reverse proxy setup.

It works by checking whether the given log format contains a character delimiter, followed by a vertical pipe, followed by a second character delimiter, i.e., ,|". When parsing the preceding specifier, the parser stops at whichever of the two delimiters it finds first, e.g., %h,|"
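A hedged Python model of that rule, as described in this comment (the XFF syntax was later changed per #632): scan the input and stop at whichever of the two delimiters appears first. parse_alternation is a hypothetical helper, not GoAccess's C implementation.

```python
from typing import Optional, Tuple


def parse_alternation(s: str, first: str, second: str) -> Optional[Tuple[str, str, str]]:
    """Model of a `first|second` delimiter pair following a specifier.

    Scans s and stops at whichever delimiter appears first. Returns
    (token, delimiter_found, rest), or None if neither delimiter occurs.
    """
    for i, ch in enumerate(s):
        if ch == first or ch == second:
            return s[:i], ch, s[i + 1:]
    return None
```

With %h,| the pair is comma/space: "88.6.2.1, 185.43.4.4 [..." stops at the comma and "89.6.2.1 [..." stops at the space, so both yield the left-most IP. With %h |" the pair is space/quote, which covers space-separated X-FF values.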

If a vertical pipe is a character literal within the log format, then it has to be escaped using a backslash, e.g., %d \\| %t
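To illustrate the escaping rule, here is a hypothetical Python scanner that lists each unescaped delimiter pair around a | in a log-format string and treats a backslash immediately before | as a literal pipe. find_alternations is an illustrative name; how many backslashes a real config or shell command needs depends on its quoting rules, which this sketch does not model.

```python
from typing import List, Tuple


def find_alternations(fmt: str) -> List[Tuple[str, str]]:
    """Report each unescaped `a|b` delimiter pair in a log-format string.

    A backslash before any character (including '|') escapes it, so an
    escaped pipe is skipped rather than treated as an alternation.
    """
    pairs: List[Tuple[str, str]] = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt):
            i += 2  # skip the escaped character, e.g. a literal '|'
            continue
        if ch == "|" and 0 < i < len(fmt) - 1:
            pairs.append((fmt[i - 1], fmt[i + 1]))
        i += 1
    return pairs
```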

EDIT:

As of Jan 25, 2017, the log format for the XFF field has changed. See #632 for more details. The log formats below are no longer relevant.


Examples:

@lillfredrik sample log

88.6.2.1, 185.43.4.4 [13/Sep/2016:22:17:50 +0200] "GET /url HTTP/1.1" 200 1586 "http://www.example.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7"
89.6.2.1 [14/Sep/2016:22:17:50 +0200] "GET /url HTTP/1.1" 200 1586 "http://www.example.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7"
90.6.2.1, 185.43.4.4, 10.10.10.2 [15/Sep/2016:22:17:50 +0200] "GET /url HTTP/1.1" 200 1586 "http://www.example.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7"

Log format

%h,| %^[%d:%t %^] "%r" %s %b "%R" "%u"

@gxhllj sample log

123.58.175.149 - - [28/Nov/2015:00:00:07 +0800] "GET /api/mobile/index.php?version=163&charset=utf-8&module=space_thread&uid=570880 HTTP/1.1" 200 391 "-" "iPhone" "11.11.11.11, 22.22.22.2, 33.33.33.3" "0.061" 
123.58.175.148 - - [28/Nov/2015:00:00:07 +0800] "GET /forum.php?mod=post&action=newthread&fid=367 HTTP/1.1" 200 3721 "http://ldxy.16163.com/forum-367-1.html" "Mozilla/5.0 (Linux; Android 4.4.4; LA2-S Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36" "33.33.33.33" "0.045"
123.58.175.147 - - [28/Nov/2015:00:00:07 +0800] "GET /forum.php?mod=post&action=newthread&fid=367 HTTP/1.1" 200 3721 "http://ldxy.16163.com/forum-367-1.html" "Mozilla/5.0 (Linux; Android 4.4.4; LA2-S Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36" "::1, 33.33.33.33" "0.045"

Log format

%^ %^[%d:%t %^] "%r" %s %b "%R" "%u" "%h,|" "%T"

@LeeNX sample log

www.sitename.com 123.123.123.123 X-FF="213.213.213.213 90.12.12.32" - - [04/Feb/2014:17:10:49 +0200] "GET /news/full-colour-0 HTTP/1.1" 200 15701 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 11698431
www.sitename.com 123.123.123.123 X-FF="213.213.213.214" - - [04/Feb/2014:17:10:58 +0200] "GET /sites/blankButton.png HTTP/1.1" 200 122 "http://www.sitename.tld/stuff" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36" 735
www.sitename.com 123.123.123.123 X-FF="::1 10.10.10.10 20.20.20.20" - - [04/Feb/2014:17:10:58 +0200] "GET /sites/blankButton.png HTTP/1.1" 200 122 "http://www.sitename.tld/stuff" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36" 735

Log format

%v %^"%h |" %^[%d:%t %^] "%r" %s %b "%R" "%u" %D

@hyperized sample log

151.80.1.1 - - [16/Aug/2016:00:17:40 +0200] "GET /artikel/ HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; AhrefsBot/5.1; +http://ahrefs.com/robot/)"
10.10.3.2, 82.148.1.2 - - [16/Aug/2016:03:29:18 +0200] "GET /blog HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"

Log format

%h,| %^[%d:%t %^] "%r" %s %b "%R" "%u"

Feel free to build from development and give it a shot.

LeeNX commented 7 years ago

@allinurl - Thanks! Updated my goaccessrc, and currently parsing logs with X-Forwarded-For included.

But how does this influence the report? Is there a X-Forwarded-For section, or do you link/chain the Visitors count?

Again, awesome, and thanks for all the great work!!

allinurl commented 7 years ago

@LeeNX Glad to hear you are able to parse logs with X-Forwarded-For included.

It does not influence the count. It just uses the IP specified in the log format, so the numbers should be the same.

LeeNX commented 7 years ago

Just trying to understand how it's used: it replaces the %h IP with the first IP in the X-Forwarded-For header/log field, is that correct?

I'm not sure what I could do with the X-Forwarded-For field data, so I can't really say how useful it would be.

Thanks!

allinurl commented 7 years ago

That's right, it sets %h to whatever the client's IP is. So if you have something like X-Forwarded-For: client, proxy1, proxy2, you can tell GoAccess which IP is the client's IP (regardless of its position), and that is the host shown in the report. All other IPs are ignored.
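Choosing the client "regardless of position" raises the question of which entry to trust, since a client can put anything in the left-most slot. A common alternative policy, similar in spirit to nginx's real_ip_recursive or Apache's mod_remoteip, walks the chain from right to left and returns the first address outside a list of trusted proxy networks. This is a hedged Python sketch of that policy, not something GoAccess does; client_from_chain and the trusted network list are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional


def client_from_chain(xff: str, trusted: Iterable[str]) -> Optional[str]:
    """Walk X-Forwarded-For from right to left and return the first address
    that is not inside one of the trusted proxy networks.

    Returns None on an unparseable entry rather than trusting it, and the
    left-most hop if every hop is trusted.
    """
    nets = [ipaddress.ip_network(n) for n in trusted]
    hops: List[str] = [h.strip() for h in xff.split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return None  # e.g. 'unknown': stop instead of guessing
        # Membership is False across IPv4/IPv6, so mixed chains are safe.
        if not any(addr in net for net in nets):
            return hop
    return hops[0] if hops else None
```

With trusted=["10.0.0.0/8"], "198.51.100.1, 203.0.113.5, 10.0.0.3" yields 203.0.113.5: the last untrusted hop, not a possibly spoofed left-most value.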

allinurl commented 7 years ago

Note:

The log format for the XFF field has changed. See #632 for more details. It will be deployed in the upcoming version.