allinurl opened this issue 10 years ago
Out of curiosity - is this functionality, and the referenced / related ones, intended for the TUI only?
Good question. The original thought was to make these filters for the terminal; I hadn't thought much about having them available in the HTML output.
Not sure yet how this would work - perhaps by allowing the user to set initial filters in the config file, or, since there are plans to make the HTML output real-time, by having some sort of subset filtering on the client side. Any thoughts?
A small related note - I think most of the rapid requests coming in for additional functionality, aggregation and related UI would be better grouped under a separate argument. For example:
--rui 'regex,average_files,average_hits,host_servers...'
for Rich-User-Interface. Later on, these could be folded into the standard views if they match common user expectations, or perhaps enabled adaptively based on the log file and whether its scheme matches the RUI options.
Regarding HTML - if you don't mind using jQuery & DataTables, then for the specific purposes of sort / filter I'd recommend: http://datatables.net/examples/api/regex.html
It's a 160 KB addition in JavaScript, but worth it for what it does. This would also give us a footing into other light / efficient jQuery-based libraries for additional UI and eye-candy as required.
If however you do not wish to have such dependencies - then we have our work cut out :-D
Is there a way to see today which IP visited which pages? I know this ticket might help achieve that in the future, but until this feature is added, is there a workaround to do that today? BTW, thanks heaps for this tool.
@gitanupam while this gets implemented, grep or any other filtering tool would be your friend here, e.g.:
For real-time filtering:
# tail -f -n +0 access.log | grep --line-buffered '192.168.3.1' | goaccess --log-format=COMBINED -
or for static filtering, simply
# grep '192.168.3.1' access.log | goaccess --log-format=COMBINED -
It’ll be great to have this! It would allow one to dig into a single day (to check how traffic varies during that day).
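Until then, a single day can be carved out with the same grep approach shown above; the date string below is just an example (COMBINED logs carry the date as [12/Dec/2022:HH:MM:SS ...]):

# static report restricted to one day
grep '12/Dec/2022' access.log | goaccess --log-format=COMBINED -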
+1 for date/time range filtering
+1
+1
Not sure if this feature was deprioritized, but I would love to have it. Seeing as this was opened 5 years ago, I wanted to check if it is still being worked on?
@Shagon94 certainly still on top of the list. However, there's an outstanding issue with the on-disk storage that needs to be worked on before this. Stay tuned though.
FWIW I'd stick the data into sqlite3 table(s) and use SQL to do the filtering. Better yet, add a mini abstraction layer so people with huge amounts of logs could use a grown-up SQL engine (Postgres) for storage too.
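A minimal sketch of that idea, assuming the common/combined log layout (the awk field positions and file names are assumptions; adjust them for other formats):

# Extract host, timestamp, request path and status into a CSV
awk '{ gsub(/[\[\]"]/, ""); print $1 "," $4 "," $7 "," $9 }' access.log > requests.csv

# Load it into sqlite3 and filter with plain SQL
sqlite3 requests.db <<'SQL'
CREATE TABLE IF NOT EXISTS requests (host TEXT, ts TEXT, path TEXT, status INTEGER);
.mode csv
.import requests.csv requests
-- example filter: error responses from a given subnet
SELECT host, ts, path, status FROM requests
WHERE status >= 400 AND host LIKE '192.168.3.%';
SQL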
@dmaziuk agree on that. Stay tuned for the upcoming storage change.
Any updates regarding this? It would already be good if there were a possibility to toggle between daily, weekly and monthly stats. Going back to AWStats hurts.
awstats? Luxury! I'm thinking of wrapping our analog plus report magic setup in a docker container...
@nkvname I need to finish the on-disk storage replacement; this is second on the list though.
This would be a fantastic enhancement! Is there an issue for the on-disk storage replacement? If so, can you please link it?
@domolicious I agree, this would be of great value! Issue #1274. Stay tuned!
@allinurl awesome work so far. excited about the new release.
Will the new release help us to do the following?
Right now, the information is there but the contextual information is missing.
It would be useful to have all the fields in a drop-down to filter all tables in different contexts.
@insidesmart This issue will add that feature; it won't be part of the upcoming release (v1.4), though. I've addressed the storage issue mentioned before, so after deploying 1.4 I can focus on this issue.
+1
@allinurl When will the time range filter function be released?
We could make a donation or sponsor that if it would help.
+1
I know the original intention was to create a real-time tool, but adding filter support (GUI) would totally expand the possibilities for data analysis.
I would like to see URLs being requested and the IP associated with each request.
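Until such a panel exists, a quick way to eyeball IP-to-URL pairs straight from a common/combined log (the awk field positions are an assumption for that format):

# count requests per (IP, URL) pair and show the top 20
awk '{ print $1, $7 }' access.log | sort | uniq -c | sort -rn | head -n 20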
How might this affect existing datasets and charts used for incremental parsing?
If at some point we start to use this feature, will existing datasets be updated somehow, and what will be shown on existing charts from before the feature was enabled?
Right now we run GoAccess using Docker and without a specific version - docker run allinurl/goaccess.
@air3ijai This is being implemented so that it works against currently parsed logs. This feature won't be enabled if a dataset was persisted and the log doesn't exist anymore.
@allinurl, do you mean that if we plan to enable this option once it is implemented, it will require us to re-parse all the logs from scratch? And if so, can we then use it with the next incremental parsing?
Folks, I just wanted to give a positive and early update on this. I'm working on it as we speak. Early tests are already working as they should, though I'm still working on the details. I'm very excited about what it can achieve so far. Stay tuned!
From all the comments it looks as if the design is still being decided. Here is a practical use case.
www.tcpdump.org has a download directory (/release/), which generates a significant fraction of the "tx amount" and hits/requests. There is also a man pages directory (/manpages/). It would be useful to be able to say: "let's look at the report without the downloads" or "let's look at the report for the downloads only" or "let's look at the report for the man pages only". Or, as mentioned earlier, to look at the IPv4/IPv6 subset only.
With the current implementation this could be done by filtering the access log with grep and generating a separate new report from stdin (sketched below). As an idea, goaccess could allow applying different filters post-generation in the viewer. From my point of view this would be useful enough and there would be no need to display multiple reports in the same viewer.
Feel free to use this input in your design if you wish.
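With today's goaccess that roughly looks like the following (the paths are the ones mentioned above; the COMBINED log format is an assumption, and a plain substring match may also hit the referrer field):

# report for the downloads only
grep '/release/' access.log | goaccess --log-format=COMBINED -
# report without the downloads
grep -v '/release/' access.log | goaccess --log-format=COMBINED -
# report for the man pages only
grep '/manpages/' access.log | goaccess --log-format=COMBINED -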
> As an idea, goaccess could allow applying different filters post-generation in the viewer.
I wouldn't want to have any kind of post-generation. Anything that manipulates the logfile should be avoided. That way we're able to apply upcoming filters / interests as-we-go on raw logfiles - and even combine them to get more insights. That's what I love about GoAccess, it's very flexible.
Can't wait for this!
In my mind I see it as a search bar in the HTML report, with the results filtered based on the search query.
eg. filtering by date - query: "date=12/12/22 && ip=1.2.3.4" or "daterange=today-15d && iprange=1.2.3.4/24"
I would love to have a date selection and a last week / last month / last year / all time HTML selection filter, because after a month or so there is simply too much data. Other than that, this goaccess tool is very, very, very nice. :+1:
Hope this feature also adds the "hostname" to the HTML view; when running multiple virtual hosts it would be good to see that in the table below the graph 👍
This will be so great when it happens!!
Is there an active PR?
I am also waiting for this. It would be great to have it.
@kyberorg you can help if you're that keen.
@imclean557 I'm sure there's plenty of people willing to help if there was an active branch. Your comment is most unhelpful.
+1
Just to add some usage context and a workaround for this.
I have a cluster of web servers, and I run goaccess like this on my central syslog-ng machine:
goaccess /var/log/hosts/*/nginx/*.log \
--log-format='%^:%^:%^:%^: %v %h %^[%d:%t %^] "%r" %s %b %L "%R" "%u"' \
--date-format=%d/%b/%Y --time-format=%T --persist --restore \
--db-path /var/goaccess/db -o /var/goaccess/www/index.html \
-o /var/goaccess/www/report.json
This aggregates all requests from the cluster and produces one global report. The current issue would need to be implemented to be able to select which vhost to see in the main report.
As a workaround, I create "per-vhost" logs like this:
VHOSTS="vods.kuon.ch www.kuon.ch"
for f in /var/log/hosts/*/nginx/*.log
do
for vhost in $VHOSTS
do
# Create destination directory
host=$(basename $(dirname $(dirname $f)))
outdir=/var/goaccess/vhosts/$vhost/$host
mkdir -p $outdir
out=$outdir/$(basename $f)
# Filter logs
# NOTE: $vhost will be matched as regex, you may need escaping
rg "^\w+ \d+ \d+:\d+:\d+ \S+ \S+ \S+ access: $vhost " $f > $out
done
done
# Remove empty logfiles
find /var/goaccess/vhosts -size 0 -delete
# Remove empty dirs
find /var/goaccess/vhosts -type d -empty -delete
for vhost in $VHOSTS
do
db=/var/goaccess/db_vhosts/$vhost
mkdir -p $db
out=/var/goaccess/www/$vhost/
mkdir -p $out
goaccess /var/goaccess/vhosts/$vhost/*/*.log \
--log-format='%^:%^:%^:%^: %v %h %^[%d:%t %^] "%r" %s %b %L "%R" "%u"' \
--date-format=%d/%b/%Y --time-format=%T --persist --restore \
--db-path $db -o $out/index.html \
-o $out/report.json
done
It is a bit "quick & dirty" but it works for the time being.
To anyone still following this thread, I found it way easier to just use promtail and grafana, instead of reinventing the wheel with goaccess log parsing, storage, etc.
> I found it way easier to just use promtail and grafana, instead of reinventing the wheel with goaccess log parsing, storage, etc.
I beg to differ. I had a setup with grafana and loki but it was very hard to get some particular insight.
Sure, you can have one very nice panel with the stats that you can look at, but it doesn't really tell you anything. With goaccess, when there is an issue I can just grep (rg) the logs and get the info I want.
Also, I switched to syslog-ng and it is so much better and easier than all the new fancy solutions like promtail. Don't get me wrong, I get why all those solutions exist (having to route logs through the internet, better scalability...), but for our use at our size, plain log files are just easier.
I don't think those tools are exclusive. You can use goaccess to generate a .json and inject those metrics into prometheus or another backend to browse them in grafana.
Finally, this kind of setup depends on many things: the number of servers, the criticality of the mission, the size of the team, the skills of the team... I can only advise trying what fits your situation best.
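As a rough sketch of combining them (the jq paths reflect recent goaccess JSON output and may differ between versions):

# generate the JSON report, then pull a couple of headline numbers with jq,
# which can then be pushed to whatever metrics backend you prefer
goaccess access.log --log-format=COMBINED -o report.json
jq '.general.total_requests, .general.unique_visitors' report.json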
My use case is described in https://github.com/allinurl/goaccess/issues/2599 (I persist the database on disk, so there is currently no way to remove a visitor from the Visitor Hostnames and IPs table, as exclude-ip will only prevent new visitors from being inserted into the persistent database; the ones already inserted will be kept).
I understand that this issue is trying to be "generic" (i.e. able to filter based on any field) and real-time (i.e. the ability to set a filter from the TUI, command line, or HTML report) - but I feel the scope is too wide to actually be actionable/possible to implement (different filter mechanisms would need to be written for the TUI/CLI/HTML interfaces...).
@allinurl I think it would be good to establish a list of what users actually expect to achieve with this feature.
For me, a simple --exclude-ip-from-report $IP1,$IP2,$IP3,... command-line flag during one-shot HTML report generation would be sufficient.
It's been a while since I started monitoring this, but I think my main requirement was also to exclude certain fixed IPs from my monitoring host from appearing in the list.
@nodiscc, good observation. As I previously explained, it's currently not practical to implement a direct "exclude-ip-from-report" functionality when retrieving data from the persisted store because, at that stage, the data has already been processed. To introduce this feature, we need to restructure how data is stored, making it a bit more complex than a straightforward filtering process. Although there are challenges, progress is being made and it will be out sooner rather than later.
@Hufschmidt, you can achieve exclusion using -e, for example: -e 127.0.0.1. Are you aiming to exclude in real-time?
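Until filtering over persisted data lands, a blunt one-shot workaround in the spirit of the earlier grep examples is to rebuild the report from the raw logs minus the unwanted addresses (the IPs below are placeholders):

grep -v -e '^203.0.113.10 ' -e '^203.0.113.11 ' access.log \
  | goaccess --log-format=COMBINED -o report.html -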
I've been waiting for a date filter for a long time.
@bear0330 hard at work on this feature! The wait won't be in vain ;)
I am also eagerly awaiting this enhancement. I wish I could filter the HTML report at least by date (range), to be able to show stats for a specific date (I use persistence and my reports include several days/months).
Add the ability to filter the results within the UI (Terminal & HTML) - e.g. filter by fields such as host, request, etc., then display only data matching that filter criteria, or enter a regex to match against the request and restrict the display to only those matching entries.
Ideally this would spin up a new thread so multiple datasets can be analyzed at the same time. Each dataset should live on its own dashboard.