allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Consider opening files by mmapping a read only file into memory #212

Open NinnOgTonic opened 9 years ago

NinnOgTonic commented 9 years ago

By mmapping files into memory you can optimise the IO greatly in many cases.

I wonder if there is any reason not to do this?
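For readers unfamiliar with the idea, here is a minimal sketch of what mapping a log read-only could look like on a POSIX system. It is not GoAccess code: `count_lines_mmap` is a hypothetical helper, and counting newlines merely stands in for real log parsing.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of GoAccess): count '\n' bytes in a file
 * by mapping it read-only. Returns -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    /* Walk the mapping once, jumping from newline to newline. */
    long lines = 0;
    const char *p = data;
    const char *end = data + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;
    }

    munmap(data, len);
    return lines;
}
```

Calling `count_lines_mmap("access.log")` (the file name is only an example) returns the number of newline characters in the file. Pages are faulted in on demand, so whether this beats buffered reads depends on the workload and would need measuring.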

allinurl commented 9 years ago

I can certainly look into this, sounds like it could give a good performance boost.

aphorise commented 9 years ago

I'd say there are more reasons not to do this, or even consider it, particularly as logs are not processed by sample or in isolated discrete sets. Furthermore, logs in most production and practical cases will inevitably exceed the available memory of the machine (arguably time too, beyond a certain entropy).

Some recent logs (in retrospect) have exceeded 20 GB. Even at a 25% memory requirement to process, plus 20 GB of RAM to store the file, we can already see the unreasonable memory / space requirements.

Also, with fixed / deterministic matrices it can become more of a waste to map to memory for a single process / pass / phase read. Am I correct in thinking that this is currently the case, and that logs are in fact parsed in linear order, read from start to end without repositioning for another read or re-read?

The only cases I can imagine this being helpful are those with subjective or non-identified determinants of complex matrices that can only be extrapolated with deeper comparative re-reads / re-parsing of the log.

@allinurl is this particular request still a serious consideration for future releases before Q3 of 2016? - if not then IMO this ticket ought to be closed.

NinnOgTonic commented 9 years ago

This issue was simply an enhancement request which fit my use case, where I have a large array of <2GB files which have to be parsed individually.

I believe that you might be right that, in your case, 20GB+ logs are not to be parsed this way, but perhaps this could be used below a given threshold if it proves to be valuable in some use cases?
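To make the threshold idea concrete, here is a rough sketch, assuming a POSIX system. Everything in it is hypothetical rather than taken from GoAccess: `scan_log`, its `mmap_limit` parameter, and the `used_mmap` flag are illustrative names, and newline counting again stands in for actual parsing.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Count '\n' bytes in a buffer; a stand-in for real log-line parsing. */
static long count_newlines(const char *buf, size_t len)
{
    long n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}

/* Hypothetical dispatcher: map files of up to mmap_limit bytes, and read
 * anything larger (or anything that fails to map) in fixed-size chunks.
 * *used_mmap reports which path ran. Returns the line count or -1. */
long scan_log(const char *path, off_t mmap_limit, int *used_mmap)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }

    long lines = 0;
    *used_mmap = 0;

    if (st.st_size > 0 && st.st_size <= mmap_limit) {
        size_t len = (size_t)st.st_size;
        char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data != MAP_FAILED) {
            *used_mmap = 1;
            lines = count_newlines(data, len);
            munmap(data, len);
            close(fd);
            return lines;
        }
        /* Mapping failed: fall through to the streaming path. */
    }

    char buf[65536];
    ssize_t got;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        lines += count_newlines(buf, (size_t)got);

    close(fd);
    return got < 0 ? -1 : lines;
}
```

With a limit of, say, 2 GB, the <2GB files described above would take the mapped path while a 20 GB log would be streamed. The right cut-off, if any, is something only benchmarks could settle.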

aphorise commented 9 years ago

So why not just store your log in memory as a mounted ramfs ? I am assuming you have the memory for it & are on a Linux / Unix OS?

NinnOgTonic commented 9 years ago

Great input, I will consider that for further development. Though I wonder whether having this in goaccess might be both simpler and a better fit for other use cases.

allinurl commented 9 years ago

I implemented a prototype for this request a while back, however, for some reason, performance wasn't great. Though, I have to admit that I did not look into the details of it. As @aphorise mentioned, there are probably better ways of handling this, so it's not top priority right now.

As soon as I have a chance, I'll push a quick implementation of this to a different branch where you can test it and see if it's worth adding it as a build option.