allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

counter broken on excess of 1 billion lines #799

Closed tsauce closed 7 years ago

tsauce commented 7 years ago

Version: GoAccess 1.2 on CentOS 7.3.1611

When processing one day of combined haproxy logs, the counters broke: -969,826,843 total requests and -977,850,740 valid requests.

What's the upper limit for tabulating results?

Also, is there any way to increase the width of the HTML page? I have long URLs.

allinurl commented 7 years ago

Interesting, the process counter is using an unsigned long which should give you a 4,294,967,295 max. I'm wondering if it's specific to the HTML report or if the terminal output is also giving you these results...

Are you on a 64-bit machine? Also are those two the only odd counters? Would you be able to submit a screenshot? Thanks.
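
For reference, here is a small Python sketch of the integer ranges in play. 4,294,967,295 is the maximum of a 32-bit unsigned integer; on a typical 64-bit Linux build (LP64), `unsigned long` is 64 bits wide. An unsigned counter printed as unsigned cannot show up negative, so a signed 32-bit conversion somewhere between the counter and the report is one possible explanation. That is an assumption, not something confirmed in this thread, and the helper names are illustrative only.

```python
def unsigned_max(bits):
    """Largest value an unsigned integer of the given width can hold."""
    return (1 << bits) - 1


def signed_range(bits):
    """(min, max) of a two's-complement signed integer of the given width."""
    half = 1 << (bits - 1)
    return -half, half - 1


print(f"{unsigned_max(32):,}")   # 4,294,967,295
print(f"{unsigned_max(64):,}")   # 18,446,744,073,709,551,615

int32_min, int32_max = signed_range(32)
reported_total = -969826843      # value quoted in this issue

# The reported value fits in a signed 32-bit integer.
print(int32_min <= reported_total <= int32_max)  # True
```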

tsauce commented 7 years ago

Yes, on a 64-bit machine. There were a lot of hits that day, though.

tsauce commented 7 years ago

[Screenshot: 2017-06-07, 12:44:53 PM]

allinurl commented 7 years ago

Do you get the same output if you run it through the terminal? i.e., w/o -o report.html. As far as increasing the width of your HTML report, you can make use of a custom stylesheet, --html-custom-css=<path.css>, and add something like (assuming you are using the vertical layout):

div.container {
    width: 95%;
    margin-left: 65px;
}

tsauce commented 7 years ago

It would take hours to run, but I can try it again.

Thanks for the custom-css file

allinurl commented 7 years ago

Actually you don't need to run it again, you can simply check the source code of the HTML file. Just look for: var json_data= and you will see "total_requests": and "valid_requests" one or two lines under. Please post them here. Thanks.
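
If digging through the page source by hand is awkward, the lookup above could be scripted. This is a minimal sketch, assuming the counters appear in the report source as `"key": number` pairs like the ones named above. The function name `read_general_counters` and the regex are illustrative and not part of goaccess.

```python
import re


def read_general_counters(html):
    """Pull the request counters out of a report's source text.

    Assumes each counter appears as '"name": <integer>' somewhere in the
    page source; returns only the keys that were found.
    """
    counters = {}
    for key in ("total_requests", "valid_requests", "failed_requests"):
        match = re.search(r'"%s"\s*:\s*(-?\d+)' % re.escape(key), html)
        if match:
            counters[key] = int(match.group(1))
    return counters


# Hypothetical snippet shaped like the embedded report data.
sample = 'var json_data={"general": {"total_requests": 120,"valid_requests": 118,"failed_requests": 2}}'
print(read_general_counters(sample))
```

For a real report, pass the file contents instead, e.g. `read_general_counters(open("report.html").read())`.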

tsauce commented 7 years ago

{"date_time": "2017-06-07 17:14:55 +0000","total_requests": -969826843,"valid_requests": -977850740,"failed_requests": 8023897,"generation_time": 47034,"unique_visitors": 0,"unique_files": 13604509,"excluded_hits": 0,"unique_referrers": 0,"unique_not_found": 708,"unique_static_files": 48,"log_size": 0,"bandwidth": 682024649676,"log_path": ["STDIN"]},"requests": {"metadata": {"avgts": {"avg": 32385},"cumts": {"count": 107424634289000,"max": 735100888000,

allinurl commented 7 years ago

Got it. Can you verify how many requests the log has? e.g., wc -l access.log.

tsauce commented 7 years ago

591,513,479 lines, which is odd.

allinurl commented 7 years ago

Are you loading persisted data, or are you using the on-disk storage in goaccess?

tsauce commented 7 years ago

No on-disk storage, just the hourlybreakdown.*.gz files piped straight from zcat.

tsauce commented 7 years ago

if it matters: I'm using --fifo-in=/tmp/ga2/in.1 --fifo-out=/tmp/ga2/out.1

/usr/bin/zcat /opt/log/haproxy/haproxy.log-`date +"\%Y\%m\%d" --date="1 day ago"`*.gz | /usr/bin/goaccess -p /etc/goaccess.conf --fifo-in=/tmp/ga2/in.1 --fifo-out=/tmp/gc2/out.1 > /opt/stats/haproxy.log-`date +"\%Y\%m\%d" --date="1 day ago"`.html -

allinurl commented 7 years ago

I'm guessing the 969,826,843 is just an overflowed number from the 591M hits. I'll try to reproduce this on my side and I'll post back as soon as I have some news.
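
To make the overflow guess concrete, here is a hedged Python sketch of the arithmetic. It assumes the reported values passed through a signed 32-bit integer at some point, which is not confirmed in this thread; the helper names are illustrative. The reading of `avgts` and `cumts` as average and cumulative time served for the same set of hits is also an assumption.

```python
INT32_MOD = 1 << 32


def wrap_int32(n):
    """Reinterpret n as a two's-complement signed 32-bit value."""
    n %= INT32_MOD
    return n - INT32_MOD if n >= (1 << 31) else n


def smallest_unwrapped(reported):
    """Smallest non-negative count that wraps to `reported` as int32.

    Larger candidates differ by multiples of 2**32.
    """
    return reported % INT32_MOD


# Values quoted from the report in this issue.
total_reported = -969826843
valid_reported = -977850740
failed_reported = 8023897

total = smallest_unwrapped(total_reported)
valid = smallest_unwrapped(valid_reported)
print(total, valid)                      # 3325140453 3317116556
print(total == valid + failed_reported)  # True

# 591,513,479 lines alone stay below 2**31 - 1, so they would not wrap.
print(wrap_int32(591513479))             # 591513479

# Assumption: avgts.avg ~= cumts.count / hits. The implied hit count
# lands close to the unwrapped valid_requests value.
implied_hits = 107424634289000 / 32385
print(abs(implied_hits - valid) / valid < 1e-4)  # True
```

Under these assumptions the unwrapped counters are about 3.3 billion, which is consistent with the issue title but not with the 591M line count, so the sketch does not explain where the extra hits came from.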

tsauce commented 7 years ago

👍 This is fantastic for analysis. I just wish it could use more than 1 CPU at a time :) but that's wishful thinking, other than using more than one fifo.

tsauce commented 7 years ago

Gerardo, is there any way to display all of the URLs instead of just the top 300? Can this be done via the HTML file?

allinurl commented 7 years ago

@tsauce Can you please attach your config file? Also, a couple of things. You mentioned you have long request lines (URLs), so you probably want to build goaccess with --with-getline.

  1. As far as multiple threads, #377 will address this. Still in the works.
  2. If you are not outputting real-time-html, there's no need for using fifo-in/out.
  3. The HTML only allows 366 items. Please take a look at the notes on the man page.

tsauce commented 7 years ago

Interestingly, the job for 6/7 ran and computed correctly. The only difference was that this was a cron job, while the 6/6 run was started manually in a screen session. I'll re-run the 6/6 job with a build configured --with-getline. Thanks for the suggestion.

[Screenshot: 2017-06-08, 7:45:32 AM]

tsauce commented 7 years ago

time-format %H:%M:%S
date-format %d/%b/%Y
log-format %^  %^ %^:%^:%^ %^ %^[%^]: %h:%^ [%d:%t.%^] %^ %^/%^ %^/%^/%^/%^/%L %s %b %^ %^ %^ %^/%^/%^/%^/%^ %^/%^ %^ "%r"
# Ignore parsing and displaying the given panel.
#
ignore-panel VISITORS
#ignore-panel REQUESTS
#ignore-panel REQUESTS_STATIC
#ignore-panel NOT_FOUND
#ignore-panel HOSTS
ignore-panel OS
ignore-panel BROWSERS
ignore-panel VISIT_TIMES
#ignore-panel VIRTUAL_HOSTS
ignore-panel REFERRERS
#ignore-panel REFERRING_SITES
ignore-panel KEYPHRASES
#ignore-panel GEO_LOCATION
#ignore-panel STATUS_CODES

allinurl commented 7 years ago

So far I'm not able to reproduce this on v1.2 on 3.2.0-4-amd64. Are you still able to replicate it? If you are, are you getting the same results in the terminal output? Thanks.

tsauce commented 7 years ago

Gerardo, I'll try it out, but thanks for trying.

tsauce commented 7 years ago

So I ran this for the subsequent day and it turned out to be OK. I think perhaps my dataset has issues.

allinurl commented 7 years ago

@tsauce Thanks for the update. If it happens again, feel free to open a new issue or reopen this one and I can look further.