Closed: 0bi-w6n-K3nobi closed this issue 3 years ago.
I found where the problem is: in the procedure perform_tail_follow, in goaccess.c.
...
See the following piece of code:
size2 = file_size (fn);
...
if (!fseeko (fp, *size1, SEEK_SET))
parse_tail_follow (fp);
fclose (fp);
*size1 = size2;
If the LOG file grows quickly, the parse_tail_follow routine can read beyond size2.
On the next loop iteration, it will process again from position size2 onwards.
So it will process the same records more than once.
The correct approach would be:
if (!fseeko (fp, *size1, SEEK_SET))
size3 = parse_tail_follow (fp);
fclose (fp);
*size1 += size3;
Here, the parse_tail_follow routine must return the number of bytes it actually read.
Tip: when implementing this, the same counter could also be used to update the size in the pipe case.
I also found a second problem, which appears when the CPU is overloaded.
Look at main in goaccess.c:
/* main processing event */
time (&start_proc);
if ((ret = parse_log (&glog, NULL, 0))) {
end_spinner ();
goto clean;
}
...
if (conf.process_and_exit) {
}
/* stdout */
else if (conf.output_stdout) {
standard_output ();
}
First, the program processes all LOGs from the beginning to EOF.
Right after, it executes standard_output, which calls process_html.
In process_html:
...
for (i = 0; i < conf.filenames_idx; ++i) {
if (conf.filenames[i][0] == '-' && conf.filenames[i][1] == '\0')
size1[i] = 0;
else
size1[i] = file_size (conf.filenames[i]);
}
This should be OK. But it's not. Consider the case of several large LOGs, which bring the CPU to overload: it takes considerable time for the program to reach this point again for the same LOG after the initial EOF. The program then loses the piece of this LOG written while the others were being processed, before control returns to this one in the TAIL mechanism, shown below:
while (1) {
if (conf.stop_processing)
break;
for (i = 0; i < conf.filenames_idx; ++i)
perform_tail_follow (&size1[i], conf.filenames[i]); /* 0.2 secs */
Here I propose a simple and efficient solution. I tried to make a minimal intervention, in a simple and very obvious way. Please consider this patch, or at least evaluate the result. I have already tested it and put it into production, and it is working correctly.
The problem: basically, it is caused by the large time interval between reading the LOG and checking its size. There is data loss between two points: after the LOGs are read to EOF, and before the TAIL routine starts. And there is reprocessing between two other points: when the LOG size is taken, and when the TAIL routine runs, since the read advances beyond that size.
Solution: minimize these time intervals and make the count of bytes already read more accurate.
Implementation:
Remove the local file variable size1 in the various routines, replacing it with a global variable glog->filesizes;
Add a count of the bytes last read from the current LOG, glog->size.
PS:
This implementation is also useful for the STDIN/PIPE case;
it would need to be implemented in the routine get_log_sizes as well, to be more accurate in real time.
Thank you for reporting this and working out a patch. I'm looking into this. I'll post back asap.
I can merge this. Do you want me to commit the patch from my end, or are you able to submit a PR? Let me know. Thanks
You can apply it; please just add the credits.
Would it be difficult for me to be a collaborator too? Or at least exchange some ideas about the project? Thanks.
Awesome. Are you able to share your email, or at least the GitHub noreply email, so I can add it to the commit for proper credit?
By all means, I'm happy with any collaboration, please feel free to share any ideas either here or shoot me an email to hello [at] goaccess.io! Thank you, sir!
The pleasure is mine. My email is 71027865+0bi-w6n-K3nob [at] users.noreply.github.com
I have used the software extensively. At my company, I am deploying it in real time across 4 large domains, each with roughly 10 ~ 30 million hits per day.
Yes. I will certainly share ideas and solutions. See you later.
Merged. Thanks again.
I would certainly be interested to hear more how goaccess is working for you and the load it's handling :)
Great. I would like to share that with you too, and tell you about my company and my experiences.
So, where should I write about it? At hello [at] goaccess.io?
Yep, hello [at] goaccess.io is perfect!
@0bi-w6n-K3nobi I noticed there's an issue with the patch that was applied for this issue. It happens in perform_tail_follow(), specifically line 807.
For instance, *size1 += glog->bytes; will keep the count from the last iteration and add the new bytes; however, this won't work when the log gets truncated. So before I attempt to fix this, I'm trying to understand a bit more the need for *size1 += glog->bytes; instead of simply assigning the new size, e.g., *size1 = size2;. Please let me know if that makes sense. Thanks
Hello @0bi-w6n-K3nobi. I made some changes to perform_tail_follow(); could you please test them out and let me know how it goes? I want to make sure things are working fine. Thanks!
Hi @allinurl.
Sorry for not replying sooner; I was very busy this week.
This patch was suggested and applied to solve the real-time problem with LOGs.
The problem appears exactly when you simply assign the new size, *size1 = size2, while the LOG is growing suddenly.
To explain better, look carefully at the code below:
size2 = file_size (fn); ** 1 **
...
if (!fseeko (fp, *size1, SEEK_SET))
parse_tail_follow (fp);
fclose (fp); ** 2 **
*size1 += glog->bytes;
In the time interval between the assignment of the value to size2 [at point 1] and the end of the execution of the routine parse_tail_follow [at point 2], the LOG may grow far beyond size2. [We can call this new size size3.]
In this case, the parse_tail_follow routine will process records up to size3, i.e. beyond size2.
And on the next run, if you assign *size1 = size2, it will reprocess the records in the range between size2 and size3.
This is exactly what happened to me: my real-time processing was counting 20% more records than it should. Therefore, this patch did resolve the problem.
I hope I was clear. :)
Thank you sir!
I have some problems with real-time. All of them involve incorrect accounting of hits. I'm using version 1.4 with MaxMindDB and readline.
Preview: I made several attempts to work around the problem, and only via pipe ('-') does real-time behave properly. I believe the problem lies in the TAIL routine for each LOG.
Scope: I have 4 NGinX servers. Each has 14 logs for the site I want to monitor, one for the accesses to each subdomain. This is the project I need to implement.
1st attempt: I ran GoAccess directly on one of the servers, over NGinX, reading the LOGs directly without any filtering.
2nd attempt: I made a script that concatenates the LOGs and sends them via TAIL and SOCAT. I also introduced a filter via AWK and a wait for each batch of records, thus generating a single LOG that grows slowly until it synchronizes in real time.