Closed luchaos closed 4 years ago
Additionally, as soon as the scrambling starts, there are errors in the browser's console:
Error: <g> attribute transform: Expected number, "translate(NaN,0)". o @ prod-connect-integration.html:1
After a complete reset and running goaccess restore/persist for each gunzipped log file, there were no errors and no scrambled browsers or OSs at first. It turns out they are still there, just not as many.
Thanks for sharing this. Interesting, just to confirm, you are not running it in real-time? Are you getting percentages over 100 for single-level panels (i.e. no data within a particular item - requests, time distribution, etc)?
Correct - those are all "static", based on incremental reads, no real-time.
There are other panels within those 5 reports where the totals don't match as can be seen in the screenshots/recordings below.
The labels seem to refer to duplicate series within the charts. Some static files appear in the list multiple times, with percentages too low to ever total 100%. This happens in the Requested Files and Static Requests panels, but not in the Browsers or OS panels (data is still scrambled/duplicated there).
I noticed that the date range in the upper right corner will not be shown if the Time Distribution panel has not been enabled. Might this be related to an unfortunate combination of panels which dictate how data is aggregated in the reports? E.g. the Hosts panel and HTTP Protocol have been disabled in all of them - does it miss particular dimensions to make up a clean data set because of that?
Just to be sure, I ran a restore (with and without all the previously disabled panels enabled):
goaccess --config-file=/srv/stats/conf/goaccess.conf --browsers-file=/srv/stats/conf/browsers/web.list --db-path=/srv/stats/storage/data/prod/web/ /var/log/nginx/web/access.log --restore --enable-panel REQUESTS_STATIC --enable-panel NOT_FOUND --enable-panel HOSTS --enable-panel REFERRERS --enable-panel REFERRING_SITES --enable-panel KEYPHRASES
After removing all db data and reading/persisting each gunzipped file in order from oldest to newest, the browser errors that used to come up right after opening the page are gone, but opening a Browsers dropdown shows duplicates and the errors again. Some of the panels show more accurate data, others less (e.g. HTTP Status Codes is at 132%).
The duplicates persist regardless of HTML or terminal output, so I assume the scrambling happened in the database after all.
@luchaos Did you find a solution to this problem in the meantime?
I have the same issue. It has happened to me twice in the last few days since I started using GoAccess about a week ago. I'm using --restore + --persist for incremental updates. At some point browsers, OS and referrer URLs get scrambled (maybe some other things too, but these are the obvious ones). They show duplicates, and browsers/OS are listed under the wrong category (e.g. Firefox user agents show up under Chrome).
I started the report by reading in all the archived server logs. Then I have a cronjob running every hour to update the HTML report by reading from the current server log file.
The command I'm using for incremental update looks like this:
goaccess /var/log/nginx/example.org.access.log --persist --restore --db-path /var/www/usage-reports/example.org/ --ignore-referer=*example.org -o /var/www/usage-reports/example.org/index.html
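For reference, an hourly cron entry for this command might look like the following sketch; the schedule and output redirection are assumptions, since the comment above only says the job runs every hour:

```sh
# Hypothetical crontab entry: run the incremental update at the top of
# every hour, reusing the exact command from above.
0 * * * * goaccess /var/log/nginx/example.org.access.log --persist --restore --db-path /var/www/usage-reports/example.org/ --ignore-referer=*example.org -o /var/www/usage-reports/example.org/index.html >/dev/null 2>&1
```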
Removing all the *.db files and re-reading all logs from scratch solves the issue for me. Then, incremental updates are working fine, until some point when the problem re-appears.
I've added logging now for the incremental updates, maybe I can see something from the log file the next time the issue occurs.
@scheb unfortunately not :/
I have no idea when exactly the scrambling happens - just as you said - it runs ok for some time after a reset. In the end I didn't need the reports that clean/precise, just wanted to get a rough idea, but I'd still love to have a clean dataset at some point.
I tried a few different variants to see if it's caused by any combination of parameters, to no avail. It has to be something about how the aggregation into the db files works; it might even be a simple off-by-one error. E.g. it happens in the time distribution chart too, where two datapoints are duplicated (05, 06) while two are missing (07, 08):
I haven't found time yet to look at the code. Logging the incremental updates sounds good! I'd be highly interested in your findings :)
Well, the scrambling happened again, but my logs don't show anything other than successful parsings. I guess I'm storing a backlog of logs then and recreate the report from scratch every time.
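That fallback can be sketched as a small shell script, assuming gzipped rotated logs next to the active log and the db/report paths from the command earlier in the thread (all names here are illustrative, not from the original comment):

```sh
#!/bin/sh
# Hypothetical full-rebuild script: discard the on-disk db, replay the
# archived logs oldest-to-newest plus the current log, and persist a
# fresh db so hourly --restore --persist runs can continue from it.
set -e

DB=/var/www/usage-reports/example.org/
OUT=/var/www/usage-reports/example.org/index.html
LOG=/var/log/nginx/example.org.access.log

rm -f "$DB"*.db                      # drop the scrambled state

# ls -tr sorts by mtime, oldest first, so events are replayed in order.
zcat $(ls -tr "$LOG".*.gz) | cat - "$LOG" | \
  goaccess - --persist --db-path "$DB" -o "$OUT"
```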
I have the same problem when using the --persist --restore options.
@jjgr86 @luchaos I deployed a fix that could potentially address this issue when using --persist + --restore. Could you please try building from development and let me know if it fixes the issue on your end?
Please note that unfortunately you may need to parse your log from scratch as it was not persisting to disk a hash table that keeps track of some sequences so you may not be able to recover from your existing db files.
@allinurl I sorted part of it out via my configuration file, /etc/goaccess/goaccess.conf:
real-time-html true
daemonize true
log-file /var/log/apache2/4443.access.log
log-file /var/log/apache2/8181.access.log
log-file /var/log/apache2/80.access.log
output /var/www/report/index.html
db-path /tmp/goaccess
persist true
restore true
Steps that worked for me to fix it:
pkill goaccess
rm /var/www/report/index.html
rm /tmp/goaccess/*.db
Comment out in /etc/goaccess/goaccess.conf:
#daemonize true
#log-file /var/log/apache2/4443.access.log
#log-file /var/log/apache2/8181.access.log
#log-file /var/log/apache2/80.access.log
zcat 80.access.log-20201003.gz 8181.access.log-20201003.gz 4443.access.log-20201003.gz | goaccess -f 80.access.log-20201004 8181.access.log-20201004 4443.access.log-20201004 --persist -o /var/www/report/index.html -
Uncomment:
daemonize true
log-file /var/log/apache2/4443.access.log
log-file /var/log/apache2/8181.access.log
log-file /var/log/apache2/80.access.log
output /var/www/report/index.html
Run:
goaccess
The error occurred when I ran with --persist without commenting out the log-file entries I had configured; when I ran goaccess again, some records were duplicated.
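Condensed into one sketch (the file names and dates are copied from the comment above; the pkill guard and the exact ordering are assumptions):

```sh
#!/bin/sh
# Sketch of the workaround: stop goaccess, wipe the report and db,
# rebuild once from the rotated logs with --persist, then restart.
pkill goaccess || true
rm -f /var/www/report/index.html /tmp/goaccess/*.db

# With daemonize/log-file commented out in goaccess.conf, do a one-off
# rebuild (archived logs via stdin, newer rotated files as arguments):
zcat 80.access.log-20201003.gz 8181.access.log-20201003.gz 4443.access.log-20201003.gz | \
  goaccess -f 80.access.log-20201004 8181.access.log-20201004 4443.access.log-20201004 \
    --persist -o /var/www/report/index.html -

# Re-enable daemonize and the log-file entries in goaccess.conf, then:
goaccess
```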
@jjgr86 thanks for posting these findings. Just to be sure, did you build from development?
No, I did it with the version I already had installed, from the Debian 10.6 backports repository.
@allinurl I also noticed that statistics stop being recorded when the apache2 logs are rotated, so I modified /etc/logrotate.d/apache2
adding the command pkill goaccess in prerotate and goaccess in postrotate, in the apache2 log rotation configuration file /etc/logrotate.d/apache2.
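A sketch of what that logrotate stanza could look like; only the prerotate/postrotate commands come from the comment, while the rest (paths and rotation options) is assumed:

```
/var/log/apache2/*.log {
        daily
        rotate 14
        compress
        delaycompress
        sharedscripts
        prerotate
                pkill goaccess || true
        endscript
        postrotate
                goaccess
        endscript
}
```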
This might belong in another thread, but I'll leave the comment here!
@jjgr86 Thanks and feel free to build from development, it will be very useful to know if the latest changes work fine on your end.
That issue with rotating logs has been fixed upstream. It will be pushed in the upcoming release. Stay tuned!
Thanks @allinurl ! Built goaccess from source and removed existing db data. I'll let it run for a few days and get back to you with results.
@luchaos looking forward to it :)
@allinurl tl;dr: Thank you! Works perfectly now.
Running the project's logrotate script, which runs 5 different --restore --persist routines, several times manually, and having it run twice at logrotate time, has so far resulted in no errors whatsoever. Even after mixing those manual and automated runs there was no desync/scrambling; the aggregations are clean.
An error would have shown itself by now going by the previous version's results.
I'd consider this issue resolved, at the latest once the fix is included in an upcoming release. This fix made me happier than you might have assumed :) Thank you for this great tool!
@luchaos That's great to hear. Thanks for the update. I'll be pushing out the upcoming release in the next few days. Stay tuned.
Closing this, feel free to reopen it if needed.
GoAccess - 1.4.
It looks like data is getting scrambled after a few iterations of running goaccess incrementally. Maybe I'm reading the reports incorrectly, but should it be possible for the total to be over 100%? In the screenshots below you can see Chrome versions attributed to the Unknown and Crawlers lists, Android Nougat escaping the Android list, etc. It happens a lot for the user agents in the custom browsers list too, so I assume some mismatch is happening at a lower level? Maybe it's just the HTML output?
I'm running multiple incremental report generations at nginx logrotate time (daily) to have custom user agents aggregated vertically depending on use case. The assumption has been that there is no other way to distinguish between multiple applications within a single user agent string; that's usually not how user agents are interpreted, but it is relevant in my particular use case's domain. The logs in question aren't even that large, and the script usually completes fairly quickly. db-path is overridden by the individual calls, as can be seen here: