allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License
18.47k stars, 1.11k forks

Ability to filter dataset by fields or regex #117

Open allinurl opened 10 years ago

allinurl commented 10 years ago

Add the ability to filter the results within the UI (Terminal & HTML) - e.g. filter by fields such as host, request, etc. then display only data matching that filter criteria, or enter a regex to match in the request and restrict display to only those matching entries.

Ideally this would spin up a new thread so multiple datasets can be analyzed at the same time. Each dataset should live on its own dashboard.

aphorise commented 9 years ago

Out of curiosity - is this functionality and those referenced / related intended for the TUI only?

allinurl commented 9 years ago

Good question. The original thought was to make these filters for the terminal; I hadn't thought much about having them available in the HTML output.

Not sure yet how this would work. Perhaps the user could set initial filters in the config file or, since there are plans to make the HTML output real-time, there could be some sort of subset filtering on the client side. Any thoughts?

aphorise commented 9 years ago

A small related note - I think most of the rapid requests that are coming in for additional functionality, aggregation and related UI - would be better grouped in a separate argument. For example:

--rui 'regex,average_files,average_hits,host_servers...'

for Rich-User-Interface. Thereafter and into the future it can be included as part of standard views if it's common to most user expectations or perhaps adaptively enabled based on the log-file and the scheme therein that matches RUI options.

Regarding HTML - if you don't mind using jQuery & DataTables then for the specific purposes of sort / filter I'd recommend: http://datatables.net/examples/api/regex.html

It's a 160 Kbyte addition in javascript but worth it for what it does. This would also give us a footing into other light / efficient jQ based libraries for additional UI and eye-candy as required.

If however you do not wish to have such dependencies - then we have our work cut out :-D

gitanupam commented 7 years ago

Is there a way to see today which IP visited which pages? I know this ticket might help achieve that in the future, but until this feature is added, is there a workaround to do that today? BTW, thanks heaps for this tool.

allinurl commented 7 years ago

@gitanupam until this is implemented, grep or any other filtering tool would be your friend here, e.g.:

For real-time filtering:

# tail -f -n +0 access.log | grep --line-buffered '192.168.3.1' | goaccess --log-format=COMBINED -

or for static filtering, simply

# grep '192.168.3.1' access.log | goaccess --log-format=COMBINED -
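
For the "which IP visited which pages" question specifically, a plain awk pass over the log may already be enough. The sketch below is only an illustration, not part of GoAccess: `pages_for_ip` is a hypothetical helper name, and it assumes the COMBINED layout where field 1 is the client IP and field 7 is the request path. Note also that an unanchored `grep '192.168.3.1'` treats the dots as wildcards and matches `192.168.3.10` as well; the awk comparison below is an exact match on the first field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GoAccess feature): list every path requested by
# one client IP together with its hit count, most-requested first.
# Assumes COMBINED layout: field 1 = client IP, field 7 = request path.
pages_for_ip() {
  local ip=$1 log=$2
  awk -v ip="$ip" '$1 == ip { hits[$7]++ }
                   END { for (p in hits) print hits[p], p }' "$log" \
    | sort -k1,1nr -k2,2
}

# Usage:
#   pages_for_ip 192.168.3.1 access.log
```

The same exact-match idea can replace grep in the pipelines above, e.g. `awk '$1 == "192.168.3.1"' access.log | goaccess --log-format=COMBINED -`.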

rauschma commented 6 years ago

It’ll be great to have this! It would allow one to dig into a single day (to check how traffic varies during that day).

Dadibom commented 6 years ago

+1 for date/time range filtering
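
Until a built-in date filter exists, the pre-filtering workaround shown earlier in the thread can be applied to dates too. This is a minimal sketch under assumptions: `filter_day` is a hypothetical helper, the log is in COMBINED format (field 4 looks like `[10/Oct/2023:13:55:36`), and the day is passed exactly as it is written in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the log lines for a single day.
# Assumes COMBINED layout, where field 4 is "[dd/Mon/yyyy:HH:MM:SS".
filter_day() {
  local day=$1 log=$2   # day as written in the log, e.g. 10/Oct/2023
  # index(...) == 1 means field 4 starts with "[<day>:".
  awk -v d="[$day:" 'index($4, d) == 1' "$log"
}

# Usage:
#   filter_day 10/Oct/2023 access.log | goaccess --log-format=COMBINED -
```

A literal prefix match is used rather than a regex so that the slashes and brackets in the timestamp need no escaping.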

BirkhoffLee commented 6 years ago

+1

hiaselhans commented 5 years ago

+1

Shagon94 commented 5 years ago

Not sure if this feature was deprioritized, but I would love to have it. Seeing as this was opened 5 years ago, I wanted to check whether it is still being worked on.

allinurl commented 5 years ago

@Shagon94 certainly still on top of the list. however, there's an outstanding issue with the on-disk storage that needs to be worked on before this. stay tuned though.

dmaziuk commented 5 years ago

FWIW I'd stick the data into sqlite3 table(s) and use sql to do the filtering. Better yet, add a mini-abstraction-layer so people with huge amount of logs could use a grown-up sql engine (postgres) for storage too.
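
To make the SQL idea concrete, here is a rough sketch of what filtering through SQLite could look like from the outside. It says nothing about how GoAccess stores data: the `hits` table, its columns, and the `load_log_sqlite` helper are all hypothetical, and the sketch assumes a COMBINED-format log plus the `sqlite3` command-line shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load a COMBINED-format access log into a SQLite table.
# Table and column names are illustrative only; GoAccess does not use them.
load_log_sqlite() {
  local log=$1 db=$2 tsv
  tsv=$(mktemp)
  # Field 1 = client IP, 4 = "[dd/Mon/yyyy:HH:MM:SS", 7 = path, 9 = status.
  awk 'BEGIN { OFS = "\t" } { print $1, substr($4, 2), $7, $9 }' "$log" > "$tsv"
  sqlite3 "$db" <<SQL
CREATE TABLE IF NOT EXISTS hits (ip TEXT, ts TEXT, path TEXT, status INTEGER);
.mode tabs
.import $tsv hits
SQL
  rm -f "$tsv"
}

# Usage:
#   load_log_sqlite access.log access.db
#   sqlite3 access.db "SELECT path, COUNT(*) FROM hits WHERE ip = '192.168.3.1' GROUP BY path;"
```

Once the rows are in a table, each filter discussed in this thread (by host, by path, by date) becomes a `WHERE` clause instead of a separate pass over the log.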

allinurl commented 5 years ago

@dmaziuk agree on that. Stay tuned for the upcoming storage change.

nkvname commented 5 years ago

Any updates regarding this? It would already be good if there were a way to toggle between daily, weekly and monthly stats. Going back to AWStats hurts.

dmaziuk commented 5 years ago

awstats? Luxury! I'm thinking of wrapping our analog plus report magic setup in a docker container...

allinurl commented 5 years ago

@nkvname need to finish the on-disk storage replacement, this is second on the list though.

domo84 commented 4 years ago

This would be a fantastic enhancement! Is there an issue for the on-disk storage replacement, if so, can you please link it?

allinurl commented 4 years ago

@domolicious I agree, this would be of great value! Issue #1274 Stay tuned!

insidesmart commented 4 years ago

@allinurl awesome work so far. excited about the new release.

Will the new release help us to do the following ?

Right now, the information is there but the contextual information is missing.

It would be useful to have all the fields in a drop-down to filter all tables in different contexts.

allinurl commented 4 years ago

@insidesmart This issue will add that feature, it won't be part of the upcoming release (v1.4). I've addressed the storage issue mentioned before, so after deploying 1.4 I can focus on this issue.

manix commented 4 years ago

+1

pstranghoener commented 4 years ago

@allinurl When will the time range filter function be released?

We could make a donation or sponsor that if it would help.

nagyalex commented 3 years ago

+1

I know the original intention was to create a real-time tool, but adding filter support (GUI) would totally expand the possibilities for data analysis.

mitchross commented 2 years ago

I would like to see URLs being requested and the IP associated with each request.

air3ijai commented 2 years ago

How might this affect existing datasets and charts used for incremental parsing?

If at some point we start to use this feature, will existing datasets be updated somehow, and what will existing charts show for the period before we enabled it?

We currently run GoAccess using Docker without pinning a specific version - docker run allinurl/goaccess.

allinurl commented 2 years ago

@air3ijai This is being implemented so that it works against currently parsed logs. This feature won't be enabled if a dataset was persisted and the log doesn't exist anymore.

air3ijai commented 2 years ago

@allinurl, do you mean that if we plan to enable such an option once it is implemented, we will need to re-parse all the logs from scratch? If so, can we then use it with the next incremental parsing?

allinurl commented 2 years ago

Folks, I just wanted to give a positive and early update on this. I'm working on it as we speak. Early tests are already working as they should, though I'm still working on the details. I'm very excited about what it can achieve so far. Stay tuned!

infrastation commented 2 years ago

From all the comments it looks as if the design is still being decided. Here is a practical use case.

www.tcpdump.org has a download directory (/release/), which generates a significant fraction of the "tx amount" and hits/requests. There is also a man pages directory (/manpages/). It would be useful to be able to say: "let's look at the report without the downloads" or "let's look at the report for the downloads only" or "let's look at the report for the man pages only". Or, as mentioned earlier, to look at the IPv4/IPv6 subset only.

With the current implementation this could be done by filtering the access log with grep and generating a separate new report from stdin. As an idea, goaccess could allow applying different filters post-generation in the viewer. From my point of view this would be useful enough and there would be no need to display multiple reports in the same viewer.

Feel free to use this input in your design if you wish.
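
The "downloads only" and "without the downloads" reports described above can be approximated today with the stdin workaround. The sketch below is an illustration under assumptions: `only_prefix` and `without_prefix` are hypothetical helper names, and the log is assumed to be in COMBINED format so that field 7 is the request path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: split a log by request-path prefix.
# Assumes COMBINED layout, where field 7 is the request path.

# Keep only requests whose path starts with the given prefix.
only_prefix() {
  awk -v p="$1" 'index($7, p) == 1' "$2"
}

# Drop requests whose path starts with the given prefix.
without_prefix() {
  awk -v p="$1" 'index($7, p) != 1' "$2"
}

# Usage:
#   only_prefix /release/ access.log \
#     | goaccess --log-format=COMBINED -o downloads.html -
#   without_prefix /release/ access.log \
#     | goaccess --log-format=COMBINED -o no-downloads.html -
```

Each call produces a separate one-shot report, which matches the "separate new report from stdin" approach rather than filtering inside the viewer.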

CodeAlDente commented 2 years ago

> With the current implementation this could be done by filtering the access log with grep and generating a separate new report from stdin. As an idea, goaccess could allow applying different filters post-generation in the viewer. From my point of view this would be useful enough and there would be no need to display multiple reports in the same viewer.

I wouldn't want to have any kind of post-generation. Anything that manipulates the logfile should be avoided. That way we're able to apply upcoming filters / interests as-we-go on raw logfiles - and even combine them to get more insights. That's what I love about GoAccess, it's very flexible.

rwjack commented 1 year ago

Can't wait for this!

In my mind I see it as having a search bar in the html report, having the results filtered based on the search query.

e.g. filtering by date - query: "date=12/12/22 && ip=1.2.3.4" or "daterange=today-15d && iprange=1.2.3.4/24"

Janevski commented 1 year ago

I would love for there to be like date selection and last week, last month, last year, all time - HTML selection filter. Because, after like a month or so, the data becomes too much data. Other than that, this goaccess tool is very, very, very nice. :+1:

t0mtaylor commented 1 year ago

Hope this feature adds the "hostname" to the html view also, when running multiple virtual hosts it would be good to see that in the table below the graph 👍

breck7 commented 1 year ago

This will be so great when it happens!!

vezaynk commented 1 year ago

Is there an active PR?

allinurl commented 1 year ago

@vezaynk There is not. Development is coming along. Stay tuned!

kyberorg commented 1 year ago

I am also waiting for this. It would be great to have it.

imclean557 commented 1 year ago

@kyberorg you can help if you're that keen.

vezaynk commented 1 year ago

@imclean557 I'm sure there's plenty of people willing to help if there was an active branch. Your comment is most unhelpful.

nietzscheanic commented 1 year ago

+1

kuon commented 11 months ago

Just to add some usage context and workaround on this.

I have a cluster of web servers, and I run goaccess like this on my central syslog-ng machine:

goaccess /var/log/hosts/*/nginx/*.log \
  --log-format='%^:%^:%^:%^: %v %h %^[%d:%t %^] "%r" %s %b %L "%R" "%u"' \
  --date-format=%d/%b/%Y --time-format=%T --persist --restore \
  --db-path /var/goaccess/db -o /var/goaccess/www/index.html \
  -o /var/goaccess/www/report.json

This aggregates all requests of the cluster and produces one global report. The current issue would need to be implemented to be able to select which vhost to see in the main report.

As a workaround, I create "per-vhost" logs like this:


VHOSTS="vods.kuon.ch www.kuon.ch"

for f in /var/log/hosts/*/nginx/*.log
do
  for vhost in $VHOSTS
  do
    # Create destination directory
    host=$(basename "$(dirname "$(dirname "$f")")")
    outdir=/var/goaccess/vhosts/$vhost/$host
    mkdir -p "$outdir"
    out=$outdir/$(basename "$f")

    # Filter logs
    # NOTE: $vhost will be matched as regex, you may need escaping
    rg "^\w+ \d+ \d+:\d+:\d+ \S+ \S+ \S+ access: $vhost " "$f" > "$out"

  done
done

# Remove empty logfiles
find /var/goaccess/vhosts -size 0 -delete

# Remove empty dirs
find /var/goaccess/vhosts -type d -empty -delete

for vhost in $VHOSTS
do
  db=/var/goaccess/db_vhosts/$vhost
  mkdir -p "$db"
  out=/var/goaccess/www/$vhost/
  mkdir -p "$out"
  goaccess "/var/goaccess/vhosts/$vhost"/*/*.log \
   --log-format='%^:%^:%^:%^: %v %h %^[%d:%t %^] "%r" %s %b %L "%R" "%u"' \
   --date-format=%d/%b/%Y --time-format=%T --persist --restore \
   --db-path "$db" -o "$out/index.html" \
   -o "$out/report.json"
done

It is a bit "quick & dirty" but it works for the time being.
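
The NOTE in the filter step points out that $vhost is matched as a regex, so the dots in a name like www.kuon.ch match any character. One way to handle the escaping is sketched below; `escape_regex` is a hypothetical helper, not something from the script above, and it simply backslash-escapes the usual regex metacharacters with sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: backslash-escape regex metacharacters so a hostname
# can be embedded in a larger pattern and matched literally.
escape_regex() {
  printf '%s\n' "$1" | sed 's/[][(){}.*+?^$|\\]/\\&/g'
}

# Usage inside the filtering loop:
#   rg "^\w+ \d+ \d+:\d+:\d+ \S+ \S+ \S+ access: $(escape_regex "$vhost") " "$f" > "$out"
```

With this in place, `www.kuon.ch` no longer matches a line logged for a host such as `wwwxkuon.ch`.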

rwjack commented 11 months ago

To anyone still following this thread, I found it way easier to just use promtail and grafana, instead of reinventing the wheel with goaccess log parsing, storage, etc.

kuon commented 11 months ago

> To anyone still following this thread, I found it way easier to just use promtail and grafana, instead of reinventing the wheel with goaccess log parsing, storage, etc.

I beg to differ. I had a setup with grafana and loki but it was very hard to get some particular insight.

Sure, you can have one very nice panel with the stats, that you can look at, but it doesn't really tell you anything. With goaccess, when there is an issue I can just grep (rg) the logs and get the info I want.

Also, I switched to syslog-ng and it is so much better and easier than all the new fancy solutions like promtail. Don't get me wrong, I get why all those solutions exist (having to route logs through the internet, better scalability...), but for our use at our size, plain log files are just easier.

I don't think those tools are exclusive. You can use goaccess to generate a .json and inject those metrics in prometheus or other to browse them in grafana.

Finally, this kind of setup depends on many things: the number of servers, the criticality of the mission, the size of the team, the skills of the team... I can only advise trying what fits your situation best.

nodiscc commented 9 months ago

My use case is described in https://github.com/allinurl/goaccess/issues/2599 (I persist the database on-disk, so there is currently no way to remove a visitor from the Visitor Hostnames and IPs table, as exclude-ip will only prevent new visitors from being inserted in the persistent database, but the ones already inserted will be kept).

I understand that this issue is trying to be "generic" (i.e. being able to filter based on any field), and real-time (i.e. ability to set a filter from TUI, command-line, or HTML report) - but I feel the scope is too wide to actually be actionable/possible to implement (need to write different filter mechanisms for the TUI/CLI/HTML interfaces...)

@allinurl I think it would be good to establish a list of what users actually expect to achieve with this feature.

For me, a simple --exclude-ip-from-report $IP1,$IP2,$IP3,... command-line flag during one-shot HTML report generation would be sufficient.

Hufschmidt commented 9 months ago

It's been a while since I started monitoring this but I think my main requirement was also to exclude certain fixed IPs from my monitoring host from appearing on the list.

allinurl commented 9 months ago

@nodiscc, good observation. As I previously explained, it's currently not practical to implement a direct "exclude-ip-from-report" functionality when retrieving data from the persisted store because, at that stage, the data has already been processed. To introduce this feature, we need to restructure how data is stored, making it a bit more complex than a straightforward filtering process. Although there are challenges, progress is being made and will be out sooner than later.

@Hufschmidt, you can achieve exclusion using -e, for example: -e 127.0.0.1. Are you aiming to exclude in real-time?

bear0330 commented 9 months ago

I've been waiting for a date filter for a long time.

allinurl commented 9 months ago

@bear0330 hard at work on this feature! wait won't be in vain ;)

seiz commented 3 months ago

I am also eagerly awaiting this enhancement. I wish I could filter the HTML report at least by date (range), to be able to show stats for a specific date (I use persistence and my reports include several days/months).