allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License
18.6k stars 1.11k forks source link

Real-time parsing works as not intended v.1.9.1 #2630

Open H1r0Sh1mA opened 9 months ago

H1r0Sh1mA commented 9 months ago

GoAccess ver 1.9.1 Env: docker container Command:

command: >
      /app/logs/access.log
      -o /app/report/access.log.html
      --real-time-html
      --origin=https://my.domain
      --ws-url=wss://my.domain/ws
      --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"'
      --keep-last=30
      --time-format='%T'
      --date-format='%d/%b/%Y'
      --db-path=/app/internal_db
      --geoip-database=/app/geo_db/GeoIP2-Country.mmdb
      --geoip-database=/app/geo_db/GeoLite2-ASN.mmdb
      --ignore-panel=OS
      --ignore-panel=BROWSERS
      --persist
      --restore

Problem: During the real-time parsing goaccess re-parsing the entire log file every time a new record is added, causing it to count all the requests and stats again, including those that were already counted in previous parsing iteration.

Describing in more details: we have 1 record in access log (total 1 line) -> report returns 1 total/valid request (all stats seems fine) adding 1 record to access log (total 2 lines) -> report returns 3 total/valid/ requests (goaccess read the log file from the beginning and adding the 1st and the 2d records to the previous iteration) adding 1 record to access log (total 3 lines) -> report returns 6 total/valid requests (goaccess read the log file from the beginning again and adding all the 3 records to previous iteration) etc.....

The one shot (no real-time monitoring) execution works as intended.

allinurl commented 9 months ago

Does this occur when you execute it directly on the machine, without using Docker?

H1r0Sh1mA commented 9 months ago

Does this occur when you execute it directly on the machine, without using Docker?

Standalone goaccess working as intended (with minimum workload).

Got some updates:

In my initial question i didnt mention one important detail - my access log rotating every 1 hour. Its inode number does not changing Here are rotation options:

/var/log/docker/*.log {
        hourly
        rotate 336
        copytruncate
        dateext
        datehourago
        dateformat _%Y-%m-%d_%H%M%S
        extension .log
        missingok
        compress
        notifempty
        sharedscripts
        postrotate
                docker exec `docker ps -q -f name=nginx_container` nginx -s reload > /dev/null 2>/dev/null || true
        endscript
}

The 1st hour after goaccess started im getting an accurate stats.

At the moment the second hour ticks and the access.log has been rotated (access.log inode value is still the same) - then the deviations started to appears. Some of them: In Overall Analyzed Requests pane

  1. Unique Visitors counter stops count
  2. Total\Valid requests starting to count with huge deviation (~10x )
  3. Tx Amount starting to count with huge deviation
  4. Requested Files counter stops count
  5. Log Size is correct

On the Time Distribution pane

this stats received in the first 20 mins of a hour and they are NOT correct Just to know the correct values for the request count during the fresh hour: wc -l access.log returns 213618 total requests So im expecting to receive ~ 1,063,131 total requests instead of 6,981,924 in goaccess dashboard

allinurl commented 9 months ago

Are you running this using tail -F? Have you tried passing the log directly, for example?

# goaccess access.log -o /reports/access.log.html --real-time-html --origin=https://my.domain/ --ws-url=wss://my.domain/ws --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"' --time-format='%T' --date-format='%d/%b/%Y' --db-path=/internal_db --geoip-database=/geo_db/GeoIP2-Country.mmdb --persist --restore
H1r0Sh1mA commented 9 months ago

Are you running this using tail -F? Have you tried passing the log directly, for example?

# goaccess access.log -o /reports/access.log.html --real-time-html --origin=https://my.domain/ --ws-url=wss://my.domain/ws --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"' --time-format='%T' --date-format='%d/%b/%Y' --db-path=/internal_db --geoip-database=/geo_db/GeoIP2-Country.mmdb --persist --restore

in my docker container im running goaccess like this (as stated in my docker-compose file):

command: >
      /app/logs/access.log
      -o /app/report/access.log.html
      --real-time-html
      --origin=https://my.domain
      --ws-url=wss://my.domain/ws
      --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"'
      --keep-last=30
      --time-format='%T'
      --date-format='%d/%b/%Y'
      --db-path=/app/internal_db
      --geoip-database=/app/geo_db/GeoIP2-Country.mmdb
      --geoip-database=/app/geo_db/GeoLite2-ASN.mmdb
      --ignore-panel=OS
      --ignore-panel=BROWSERS
      --persist
      --restore

so the log goes directly to goaccess without piping

allinurl commented 9 months ago

does it function properly outside of Docker when passing the log directly?

H1r0Sh1mA commented 9 months ago

does it function properly outside of Docker when passing the log directly?

Yep, standalone goaccess (outside Docker env) working as intended (with minimum workload).

allinurl commented 9 months ago

Got it. I'm just trying to pinpoint the problem as precisely as possible. So, it seems the issue arises when running via Docker or when piping data in using tail -F, right?

If you remove --persist and --restore, does the issue still occur with the real-time counters? Also, are you experiencing the same issues with terminal output? Any additional details would be appreciated so I can attempt to reproduce this on my end. Thanks!

H1r0Sh1mA commented 9 months ago

The issue arises when running via Docker. I did not use tail -F as i read its more advisable to send logs direct to goaccess to avoid potential issues. I will try to remove --persist and --restore options and report as soon as possible. I will also try to check the TUI behavior.

I can provide you with the docker-compose file im using.

Thank you so much for your time.

H1r0Sh1mA commented 9 months ago

Update removing of --persist and --restore options does not help. Here is the docker-compose file

  parser:
    image: allinurl/goaccess
    logging:
      driver: syslog
      options:
        tag: "goaccess_parser"
    expose:
      - "7890"
    command: >
      /app/logs/access.log
      -o /app/report/acces.log.html
      --jobs=2
      --real-time-html
      --origin=https://my.domain
      --ws-url=wss://my.domain/ws
      --log-format='%^ %^ %^ %^ %^ %v %h %^ %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\" \"%^\" \"%^\" \"%T\"'
      --keep-last=30
      --time-format=%H:%M:%S
      --date-format=%d/%b/%Y
      --db-path=/app/internal_db
      --geoip-database=/app/geo_db/GeoIP2-Country.mmdb
      --geoip-database=/app/geo_db/GeoLite2-ASN.mmdb
      --ignore-panel=OS
      --ignore-panel=BROWSERS
    volumes:
      - ./logs:/app/logs:ro
      - ./reports:/app/report
      - ./parser/internal_db:/app/internal_db
      - ./parser/geo_db:/app/geo_db

Unique Visitors Per Day Pane (wrong values): image

CUM. T.S\MAX. T.S values looking odd to me

Time Distribution Pane (15h is ok, 16h is wrong. 16h values received in 11 minutes) image

H1r0Sh1mA commented 9 months ago

Update

Time Distribution Pane:

Hours from different days overlap each other: Say - Yesterday 13th hour has 513k hits. When 13th hour ticks today, then its value starting to add to yesterdays 13th hour (513k for yesterday + XXXk for today)

allinurl commented 9 months ago

@H1r0Sh1mA, just to clarify, are you encountering two separate issues? One regarding the real-time problem with Docker and the other regarding the accuracy of the stats numbers? If so, I suggest opening a new issue for the stats so we can address them separately.

Regarding the discrepancy with the cum/max/avg time served, it seems like there might be an issue with your log format. Could you share the actual lines from your access log? That would help us investigate further. Thanks

H1r0Sh1mA commented 9 months ago

Could you share the actual lines from your access log? That would help us investigate further.

Feb 17 12:00:04 nginx_hostname nginx_container[13971]: my.domain 109.124.233.55 - - [17/Feb/2024:12:00:04 +0000] "POST /orders/event HTTP/1.1" 200 0 "-" "-" "0.001" "10.0.4.163:80" "0.002"

the log format arguments:

--log-format='%^ %^ %^ %^ %^ %v %h %^ %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\" \"%^\" \"%^\" \"%T\"'

escaping trails (like: \"%r\" ) are using due to the docker-compose format style. Now im issuing it like this:

command: [
"/app/logs/accesst.log",
"-o", "/app/report/access.html",
"--jobs=8",
"--keep-last=3",
"--real-time-html",
"--origin=https://my.domain",
"--ws-url=wss://my.domain/ws",
"--log-format='%^ %^ %^ %^ %^ %v %h %^ %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\" \"%^\" \"%^\" \"%T\"'",
"--time-format=%H:%M:%S",
"--date-format=%d/%b/%Y",
"--date-spec=min",
"--db-path=/app/internal_db",
"--geoip-database=/app/geo_db/GeoIP2-Country.mmdb",
"--geoip-database=/app/geo_db/GeoLite2-ASN.mmdb",
"--ignore-panel=OS",
"--ignore-panel=BROWSERS",
"--persist",
"--restore"
]

Didnt have a chance yet to test --datetime-format='%d/%b/%Y:%H:%M:%S %z' instead of --time-format=%H:%M:%S and --date-format=%d/%b/%Y

just to clarify, are you encountering two separate issues? One regarding the real-time problem with Docker and the other regarding the accuracy of the stats numbers?

I am experiencing all these issues in docker environment with real-time parsing mode. That why i have put them all here.

To avoid confusion i will open issues one by one separately.

H1r0Sh1mA commented 8 months ago

no updates so far?

allinurl commented 8 months ago

Some questions here...

  1. It seems like there might be an issue with your logrotate configuration. Can you verify how it's performing?

  2. Can you also share a simple way to reproduce this issue? I need the most basic command that triggers this issue so that we can narrow down the parameters that might not be causing it. e.g.,

 /app/logs/access.log
      -o /app/report/access.log.html
      --real-time-html
      --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"'
      --time-format='%T'
      --date-format='%d/%b/%Y'
  1. Also, have you tried running it as posted on the README.md and see if that also causes issues?
    tail -F access.log | docker run -p 7890:7890 --rm -i -e LANG=$LANG allinurl/goaccess -a -o html --log-format COMBINED --real-time-html - > report.html
  1. Can you replicate this issue on another machine that also performs log rotation?