Open NinnOgTonic opened 9 years ago
I can certainly look into this, sounds like it could give a good performance boost.
I'd argue there are several reasons not to do this - particularly since logs are not processed in sampled or isolated discrete sets. Furthermore, in most production and practical cases, logs will inevitably exceed the machine's available memory (and arguably, beyond a certain size, reasonable processing time too).
Some recent logs have, in retrospect, proven to exceed 20 GB. At even a 25% memory overhead to process, on top of 20 GB of RAM just to hold the file, we can already see the unreasonable memory / space requirements.
Also, with fixed / deterministic parsing, mapping a file into memory can be wasteful for a single process / pass / phase read. Am I correct in thinking that this is currently the case - that logs are in fact parsed in linear order, read from start to end, without ever repositioning for another read or re-read?
The only cases I can imagine this being helpful are where the log's structure cannot be determined up front and can only be extracted through deeper, comparative re-reads / re-parses of the log.
@allinurl is this particular request still under serious consideration for a release before Q3 of 2016? If not, then IMO this ticket ought to be closed.
This issue was simply an enhancement request that fits my use case, where I have a large number of <2 GB files that have to be parsed individually.
I believe you might be right that 20 GB+ logs shouldn't be parsed this way, but perhaps this could be used below a given size threshold, if it proves valuable in some use cases?
So why not just store your log in memory on a mounted ramfs? I am assuming you have the memory for it and are on a Linux / Unix OS?
Great input, I will consider that for further development. Though I am not sure whether having this built into goaccess might be both simpler and a better fit for other use cases.
I implemented a prototype for this request a while back, however, for some reason, performance wasn't great. Though, I have to admit that I did not look into the details of it. As @aphorise mentioned, there are probably better ways of handling this, so it's not top priority right now.
As soon as I have a chance, I'll push a quick implementation of this to a different branch where you can test it and see if it's worth adding it as a build option.
By mmapping files into memory you can greatly optimise I/O in many cases.
I wonder if there is any reason not to do this?