allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

How to hide paths duplicated based on protocol #2659

Closed yagarea closed 2 months ago

yagarea commented 2 months ago

I generate an HTML report, and some files are counted multiple times: [screenshot]

As you can see, feed.xml and / are each counted twice and have two separate sets of stats.

Is this a bug, or did I make a mistake when generating the report?

This is the command I use:

goaccess --log-format=caddy \
           --html-report-title="Report" \
           --tz=Europe/Prague \
           --agent-list \
           --with-output-resolver \
           --ignore-crawlers \
           --real-os \
           --no-ip-validation \
           --log-format='{ "ts": "%x.%^", "request": { "remote_ip": "%h", "proto":"%H", "method": "%m", "host": "%v", "uri": "%U" }, "duration": "%T", "size"    : "%b","status": "%s" }' \
           --time-format="%s" \
           --date-format="%s" \
           --log-file="log_file" \
           --geoip-database=/var/geoip.mmdb \
           --output="out.html"
allinurl commented 2 months ago

GoAccess analyzes requests based on the method and protocol used. For example, a request to feed.xml might come through either a GET or POST method, and could be using either HTTP/1.0 or HTTP/2. If the method and protocol columns are not displayed in the panel, the log format may be incorrect, or those columns may have been disabled in the panel options. Feel free to provide a few sample lines from your log if you need further assistance.

yagarea commented 2 months ago

This is a random sample from my log. I replaced the IP addresses with a placeholder:

{"level":"info","ts":1707393940.3565125,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1555","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/favicon.ico"},"duration":0.001563662,"size":104741,"status":200}
{"level":"info","ts":1707394009.254943,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"43946","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.181330715,"size":1400460,"status":200}
{"level":"info","ts":1707394083.3710835,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"58834","proto":"HTTP/1.1","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.001424409,"size":0,"status":304}
{"level":"info","ts":1707394480.6935828,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"4010","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/index.css"},"duration":0.001374868,"size":0,"status":304}
{"level":"info","ts":1707394480.6936617,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"4010","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/sidebar.css"},"duration":0.001443935,"size":0,"status":304}
{"level":"info","ts":1707394480.7263362,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"4010","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/fontawesome/icons.svg"},"duration":0.000397971,"size":0,"status":304}
{"level":"info","ts":1707394480.745779,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"4010","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/print.css"},"duration":0.000429884,"size":0,"status":304}
{"level":"info","ts":1707394896.4724221,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"17048","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/zaklady-astrofyziky/"},"duration":0.022938013,"size":4945,"status":200}
{"level":"info","ts":1707395029.9238915,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"4375","proto":"HTTP/1.1","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.00003235,"size":0,"status":308}
{"level":"info","ts":1707395030.1019588,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"4398","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.001534572,"size":0,"status":304}
{"level":"info","ts":1707395164.8172207,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/literarni-epochy-smery-proudy-a-hnuti/"},"duration":0.004965027,"size":10509,"status":200}
{"level":"info","ts":1707395165.2376797,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/index.css"},"duration":0.002757116,"size":2823,"status":200}
{"level":"info","ts":1707395165.2397337,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/sidebar.css"},"duration":0.000627287,"size":425,"status":200}
{"level":"info","ts":1707395165.290985,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/print.css"},"duration":0.000662513,"size":111,"status":200}
{"level":"info","ts":1707395165.293138,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/fontawesome/icons.svg"},"duration":0.002773176,"size":2267,"status":200}
{"level":"info","ts":1707395165.8441951,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/meta/logo.png"},"duration":0.00128878,"size":13795,"status":200}
{"level":"info","ts":1707395165.8509634,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/fonts/PTSans-Regular.woff"},"duration":0.002494945,"size":55868,"status":200}
{"level":"info","ts":1707395165.8514028,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"1931","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/fonts/PTSans-Bold.woff"},"duration":0.001381523,"size":56648,"status":200}
{"level":"info","ts":1707395221.895296,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"39004","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.076155649,"size":1400460,"status":200}
{"level":"info","ts":1707395267.9708216,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/literarni-epochy-smery-proudy-a-hnuti/"},"duration":0.003169087,"size":10509,"status":200}
{"level":"info","ts":1707395267.9718764,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/sidebar.css"},"duration":0.001554541,"size":425,"status":200}
{"level":"info","ts":1707395267.9727333,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/index.css"},"duration":0.002423909,"size":2823,"status":200}
{"level":"info","ts":1707395268.1202044,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/css/print.css"},"duration":0.000735573,"size":111,"status":200}
{"level":"info","ts":1707395268.1206877,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/fontawesome/icons.svg"},"duration":0.001209128,"size":2267,"status":200}
{"level":"info","ts":1707395268.32206,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/meta/logo.png"},"duration":0.001377513,"size":13795,"status":200}
{"level":"info","ts":1707395268.4311347,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/favicon.ico"},"duration":0.060851044,"size":104741,"status":200}
{"level":"info","ts":1707395334.0654433,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/fonts/PTSans-Regular.woff"},"duration":0.002669714,"size":55868,"status":200}
{"level":"info","ts":1707395334.0662339,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"47817","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/fonts/PTSans-Bold.woff"},"duration":0.002021236,"size":56648,"status":200}
{"level":"info","ts":1707395351.1507611,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"50460","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/img/physics/tepelne-deje-v-plynech/izobaricky.png"},"duration":0.016652077,"size":48374,"status":200}
{"level":"info","ts":1707395381.7478125,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"50475","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/img/physics/tepelne-deje-v-plynech/izobaricky.png"},"duration":0.002350928,"size":48374,"status":200}
{"level":"info","ts":1707395565.4941285,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"3314","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.001790055,"size":0,"status":304}
{"level":"info","ts":1707396432.6606312,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"49232","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.153709067,"size":1400460,"status":200}
{"level":"info","ts":1707396952.3530242,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/about/"},"duration":0.018412515,"size":2793,"status":200}
{"level":"info","ts":1707396952.681723,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/icons/email-icon.png"},"duration":0.037162497,"size":4468,"status":200}
{"level":"info","ts":1707396952.7016633,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/generated/assets/meta/me-1280-c15b07d76.webp"},"duration":0.056938878,"size":51556,"status":200}
{"level":"info","ts":1707396952.7169287,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/icons/github-icon.png"},"duration":0.067996455,"size":6263,"status":200}
{"level":"info","ts":1707396952.7317922,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/icons/bandcamp-icon.png"},"duration":0.082598761,"size":3885,"status":200}
{"level":"info","ts":1707396952.7431002,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/icons/matrix-icon.png"},"duration":0.09420187,"size":8869,"status":200}
{"level":"info","ts":1707396952.7585325,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/icons/telegram-icon.png"},"duration":0.109555341,"size":9169,"status":200}
{"level":"info","ts":1707396952.773725,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"2384","proto":"HTTP/2.0","method":"GET","host":"blackblog.cz","uri":"/assets/icons/stackoverflow-icon.png"},"duration":0.12458609,"size":4396,"status":200}
{"level":"info","ts":1707397112.9282725,"logger":"http.log.access.log7","msg":"handled request","request":{"remote_ip":"IP-ADDR","remote_port":"4002","proto":"HTTP/1.1","method":"GET","host":"blackblog.cz","uri":"/feed.xml"},"duration":0.00001639,"size":0,"status":308}
allinurl commented 2 months ago

Based on the lines you provided, it seems there are two instances of feed.xml being requested, each using a different protocol.
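
One way to confirm this directly from the raw log is to collect the protocols seen for each URI. The following is a minimal Python sketch and not part of GoAccess: the function name `protocols_by_uri` is hypothetical, and the three records are trimmed copies of sample lines above, keeping only the fields needed here.

```python
import json
from collections import defaultdict

# Trimmed copies of three sample lines from the log above
# (unused fields such as "ts" and "remote_ip" are left out).
SAMPLE = """\
{"request":{"proto":"HTTP/2.0","method":"GET","uri":"/favicon.ico"},"status":200}
{"request":{"proto":"HTTP/2.0","method":"GET","uri":"/feed.xml"},"status":200}
{"request":{"proto":"HTTP/1.1","method":"GET","uri":"/feed.xml"},"status":304}
"""


def protocols_by_uri(lines):
    """Map each URI to the set of HTTP protocols it was requested with."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        request = json.loads(line)["request"]
        seen[request["uri"]].add(request["proto"])
    return dict(seen)


result = protocols_by_uri(SAMPLE.splitlines())
for uri, protos in sorted(result.items()):
    print(uri, sorted(protos))
# /favicon.ico ['HTTP/2.0']
# /feed.xml ['HTTP/1.1', 'HTTP/2.0']
```

Any URI that maps to more than one protocol will show up as more than one row while the protocol is part of the grouping.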

yagarea commented 2 months ago

Is there a way to merge all requests based only on the URL?

allinurl commented 2 months ago

Please take a look at the parse options; the following should do it:

# goaccess access.log --log-format='{ "ts": "%x.%^", "request": { "remote_ip": "%h", "proto":"%H", "method": "%m", "host": "%v", "uri": "%U" }, "duration": "%T", "size"    : "%b","status": "%s" }' --datetime-format=%s -H no -M no --date-spec=min
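
As a rough illustration of what turning off the protocol (`-H no`) and method (`-M no`) fields changes, the Python sketch below counts the same requests under two grouping keys. It models the idea only and is not GoAccess's implementation: the `count_hits` helper and its parameters are hypothetical, and the tuples correspond to four of the `/feed.xml` lines in the sample log.

```python
from collections import Counter

# (method, protocol, uri) for four of the /feed.xml requests in the sample log.
REQUESTS = [
    ("GET", "HTTP/2.0", "/feed.xml"),
    ("GET", "HTTP/1.1", "/feed.xml"),
    ("GET", "HTTP/1.1", "/feed.xml"),
    ("GET", "HTTP/2.0", "/feed.xml"),
]


def count_hits(requests, with_method=True, with_protocol=True):
    """Count hits per key; the key always contains the URI and
    optionally the method and protocol (hypothetical helper)."""
    hits = Counter()
    for method, proto, uri in requests:
        key = (uri,)
        if with_method:
            key += (method,)
        if with_protocol:
            key += (proto,)
        hits[key] += 1
    return hits


split = count_hits(REQUESTS)  # method and protocol are part of the key
merged = count_hits(REQUESTS, with_method=False, with_protocol=False)

print(len(split), len(merged))   # 2 1
print(merged[("/feed.xml",)])    # 4
```

With both fields in the key, `/feed.xml` produces one row per protocol; with both dropped, all four requests collapse into a single row.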

yagarea commented 2 months ago

Thank you very much. This is exactly what I was looking for.

Your projects are awesome and your support is excellent. You are a great developer and a good person.

allinurl commented 2 months ago

@yagarea You're most welcome, and thank you very much for your kind words. Stay tuned for some exciting updates coming down the pipeline.