Open fvanderbiest opened 1 year ago
I agree on removing it, but we should consider a replacement solution. Globally, I have the feeling we should ease, or at least document, the integration of analytics tools, since it is quite a common need.
Maybe this could be an interesting workshop during the geOcom / the following codesprint.
Globally, I have the feeling we should ease, or at least document, the integration of analytics tools, since it is quite a common need.
agreed
storing ogc access logs in a postgresql database was not a good idea, since it quickly became "big data"
Concerning this point, what about using https://github.com/timescale/timescaledb ? It allows compressing data and managing retention if well configured.
From what I could see ES is a sinkhole for resources and not easy to use.
Using a timescaledb database with well-configured data retention, combined with a solution like Grafana and its dashboards, can be an alternative to the current analytics tools.
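For the record, a minimal sketch of what that could look like, assuming a hypothetical ogc_requests table fed from the proxy access logs (the database name, table and column names are made up for illustration):
psql georchestra <<'SQL'
-- hypothetical table holding one row per proxied OGC request
CREATE TABLE IF NOT EXISTS ogc_requests (
  request_time timestamptz NOT NULL,
  service      text,
  layer        text,
  user_name    text
);
-- turn it into a TimescaleDB hypertable, partitioned on time
SELECT create_hypertable('ogc_requests', 'request_time', if_not_exists => TRUE);
-- compress chunks older than a week so the table does not balloon
ALTER TABLE ogc_requests SET (timescaledb.compress);
SELECT add_compression_policy('ogc_requests', INTERVAL '7 days');
SQL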
Concerning this point, what about using https://github.com/timescale/timescaledb ?
Love it ! Thanks for the hint.
Concerning this point, what about using https://github.com/timescale/timescaledb ?
@jusabatier Can you remind us how you use it / feed the logs into it ?
fwiw i use influxdb for similar needs, but it's on the same level as timescaledb. For log "ingestion", promtail & https://github.com/grafana/loki are used to send metrics to influxdb, but you can also use telegraf or fluentd for the logs.
https://linuxfr.org/news/loki-centralisation-de-logs-a-la-sauce-prometheus
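For reference, a minimal promtail scrape config along those lines; the listen port, file paths and Loki push URL below are assumptions to adapt to your deployment:
cat > /opt/promtail/config.yml <<'YAML'
server:
  http_listen_port: 9080
positions:
  # promtail remembers how far it read each file here
  filename: /var/lib/promtail/positions.yaml
clients:
  # push the scraped log lines to the local Loki instance
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          __path__: /var/log/nginx/*access*.log
YAML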
Can you remind us how you use it / feed the logs into it ?
Here is some example config to feed the database via log4j2 using JDBC appenders (commented out in the file): https://github.com/georchestra/cadastrapp/blob/master/cadastrapp/src/main/resources/log4j2.properties
It's fed the same way as PostgreSQL, since TimescaleDB is an extension.
And you can find how to configure retention in the timescaledb docs : https://docs.timescale.com/timescaledb/latest/how-to-guides/data-retention/
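For instance, with the hypothetical ogc_requests table sketched above, keeping one year of raw data boils down to a single policy (the interval is an arbitrary example):
psql georchestra <<'SQL'
-- automatically drop chunks older than one year
SELECT add_retention_policy('ogc_requests', INTERVAL '1 year');
SQL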
Any potential pitfalls and ways to circumvent them ?
There's no plan yet to provide an equivalent feature by <<insert here any fancy tech like ELK or ... >>
Maybe we should ?
Like @jusabatier, @landryb and @jeanpommier, I think ogcstatistics should not be fully removed until some replacement is found. For example, we could remove it from the console, but keep the possibility to install the ogcstatistics apps when using the security-proxy.
Another idea, without changing the architecture or adding more frameworks: since elasticsearch and kibana are already installed, we could probably test some simple log insertion via logstash and create a route to make Kibana accessible to the admin user, with a specific dashboard.
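A rough sketch of what such a logstash pipeline could look like, assuming the proxy/gateway access logs end up in a file in combined format (the file path, grok pattern and index name are hypothetical):
cat > /etc/logstash/conf.d/ogc-access.conf <<'CONF'
input {
  # tail the access log produced by the reverse proxy
  file { path => "/var/log/nginx/ogc-access.log" }
}
filter {
  # parse standard combined access log lines into fields
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
}
output {
  # one daily index, easy to purge for retention
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "ogc-access-%{+YYYY.MM.dd}"
  }
}
CONF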
Why ?
We're preparing the replacement of the older security-proxy by the georchestra-gateway.
This point will probably need more explanation. I know we already spoke about it, but a specific discussion should be held on this important topic when it happens. Do you have an idea of when you'd like to replace the security-proxy with the georchestra-gateway ?
From what I could see ES is a sinkhole for resources
True, but already installed for Geonetwork4 so could be shared
From what I could see ES is a sinkhole for resources
True, but already installed for Geonetwork4 so could be shared
that's more or less discouraged, as kibana is configured for GN indexes only, and there's some hairy url rewriting being done too...
And the default usage made by GN4 (index metadata only) is quite moderate, which allows a "relatively light" ES setup.
Logs are known to quickly become massive data, especially if we want some retention period, which we will need for analytics.
I'd rather have some experiments first with lighter tools like loki. How far did you go with Loki, @landryb ?
How far did you go with Loki, @landryb ?
i have a promtail/loki/grafana dashboard with nginx metrics for mapserver/mapproxy logs. This was a PoC done by students in 2021, and it's been running in production for 2 years.
i've never got around to fully digging into it to expand it for other needs and fine-tune it further, but the logic is sound. And it's lightweight.
$ ps aux | egrep '(loki|grafana|promtail)'
loki 91 0.1 0.8 965396 74720 ? Ssl 2022 492:09 /opt/loki/loki-linux-amd64 -config.file /opt/loki/config.yml
promtail 94 0.2 0.3 1508404 27440 ? Ssl 2022 880:21 /opt/promtail/promtail-linux-amd64 -config.file /opt/promtail/config.yml
grafana 385891 0.1 1.0 1937616 88932 ? Ssl Mar23 8:15 /usr/share/grafana/bin/grafana server --config=/etc/grafana/grafana.ini --pidfile=/run/grafana/grafana-server.pid --packaging=deb cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning
the loki datadir takes 14Gb with only those nginx metrics from 2 years.
i've other truenas/proxmox dashboards in grafana but those are not related to logs parsing.
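For illustration, the kind of LogQL expression such a Grafana panel can be built on; the job label comes from the promtail config, and the exact query obviously depends on the dashboard:
# same expression works as a Grafana panel query against the Loki datasource
logcli query 'sum(count_over_time({job="nginx"} [1h]))'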
OK, so, to sum up, we have two technical solutions here:
* TimescaleDB + Grafana, reusing the existing PostgreSQL database
* promtail + Loki + Grafana
Do we do POCs, or is there one emerging from a technical / strategic point of view ?
Naturally, I would favor timescaledb since it requires fewer additional components on top of the existing stack, and offers the potential to rewrite the analytics backend if we want to provide key metrics in the console or anywhere else.
I apologize in advance for inserting noise in this discussion:
* any update from the discussions on this topic that occurred yesterday during geOcom 2023?
there were some tests during the community sprint with loki and ES as alternatives, @jeanpommier can give more details
* any thought about InfluxDb for this specific use case?
iirc influxdb is more used for metrics coming from telegraf, loki stores its data in its own database
Florent Berault from MEL says that his needs for analytics go beyond just OGC WMS/WFS/etc. Ideally the Data API should also be part of it. Since it is "OGC API Features"-based, it makes a lot of sense to me.
Hi @fvanderbiest, could you be more specific about what it would imply ? What would be expected ?
Who ?
Camptocamp, with funding from MEL
Target Module
The ogc-server-statistics logger will be removed, which also implies a removal of the analytics webapp and a rework of the front console application.
What ?
As said above, we plan to remove the OGC logging feature from geOrchestra core.
Why ?
We're preparing the replacement of the older security-proxy, which hosts ogc-server-statistics, by the georchestra-gateway.
How ?
Essentially git rm -rf analytics and git rm -rf ogc-server-statistics.
Any potential pitfalls and ways to circumvent them ?
There's no plan yet to provide an equivalent feature by <<insert here any fancy tech like ELK or ... >> Maybe we should ?
When ?
One should expect geOrchestra 24.0 to be free from ogc-server-statistics, which means funding will be required to get the equivalent feature by then.
State of the vote: