bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0

Add the `prometheus-longterm-metrics` and `thanos` optional components #461

Open mishaschwartz opened 5 months ago

mishaschwartz commented 5 months ago

Overview

The prometheus-longterm-metrics component collects longterm monitoring metrics from the original prometheus instance (the one created by the components/monitoring component).

Longterm metrics are any prometheus rules that have the label group: longterm-metrics; in other words, they are selectable using prometheus's '{group="longterm-metrics"}' query filter. To see which longterm metric rules are added by default, see the optional-components/prometheus-longterm-metrics/config/monitoring/prometheus.rules.template file.
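As a rough illustration of the label-based selection described above, here is a minimal Python sketch. It assumes the rules have already been parsed into the usual Prometheus rule-file shape (groups containing rules with a `labels` mapping); the group name, rule names, and expressions are hypothetical and are not taken from prometheus.rules.template.

```python
# Hypothetical rule groups, shaped like a parsed Prometheus rules file.
# None of these names come from the actual prometheus.rules.template.
RULE_GROUPS = [
    {
        "name": "example-group",
        "rules": [
            {
                "record": "example:up:avg_1h",
                "expr": "avg_over_time(up[1h])",
                "labels": {"group": "longterm-metrics"},
            },
            {
                "record": "example:up:avg_5m",
                "expr": "avg_over_time(up[5m])",
                "labels": {"group": "short-term"},
            },
            {
                # A rule without labels is never a longterm metric.
                "record": "example:up:max_5m",
                "expr": "max_over_time(up[5m])",
            },
        ],
    }
]


def longterm_rules(groups, label="group", value="longterm-metrics"):
    """Return the names of rules whose labels include label=value."""
    selected = []
    for group in groups:
        for rule in group.get("rules", []):
            if rule.get("labels", {}).get(label) == value:
                # Recording rules use "record"; alerting rules use "alert".
                selected.append(rule.get("record") or rule.get("alert"))
    return selected


print(longterm_rules(RULE_GROUPS))
```

Only rules carrying the `group: longterm-metrics` label are returned, which mirrors what the `{group="longterm-metrics"}` filter would match.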

To configure this component:

Enabling the prometheus-longterm-metrics component creates the additional endpoint /prometheus-longterm-metrics.

The thanos component enables better storage of longterm metrics collected by the optional-components/prometheus-longterm-metrics component. Data will be collected from the prometheus-longterm-metrics component and stored in an S3 object store indefinitely.

When enabling this component, please change the default values for the MINIO_ROOT_USER and MINIO_ROOT_PASSWORD by updating the env.local file. These set the login credentials for the root user that runs the minio object store.
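A quick check like the following could catch a forgotten override before deployment. This is only a sketch: it assumes env.local uses shell-style `export NAME=value` (or `NAME=value`) assignments, and it only verifies that both variables are set to a non-empty value, since the default values are not listed here. The helper name and sample contents are hypothetical.

```python
import re

REQUIRED = ("MINIO_ROOT_USER", "MINIO_ROOT_PASSWORD")

# Matches shell-style assignments such as: export NAME="value"  or  NAME=value
_ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def missing_overrides(env_local_text, required=REQUIRED):
    """Return the required variables that are never given a non-empty value."""
    assigned = set()
    for line in env_local_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out assignments do not count
        match = _ASSIGN.match(line)
        if match and match.group(2).strip().strip("'\""):
            assigned.add(match.group(1))
    return [name for name in required if name not in assigned]


# Hypothetical env.local contents, for illustration only.
SAMPLE = '''
# export MINIO_ROOT_PASSWORD="commented-out"
export MINIO_ROOT_USER="monitoring-admin"
'''

print(missing_overrides(SAMPLE))
```

In the sample, the password line is commented out, so `MINIO_ROOT_PASSWORD` is reported as missing.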

Enabling the thanos component creates the additional endpoints:

This also includes an update to the prometheus version from v2.19.0 to the current latest, v2.52.0. This is required to support the interaction between prometheus and thanos.

Changes

Non-breaking changes

Related Issue / Discussion

Additional Information

CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

mishaschwartz commented 5 months ago

this assumes a MinIO server exists.

If you enable the thanos optional component, a minio server will be created in a docker container. It does not depend on an externally deployed/created minio server. But if you already have one, you could customize this to use that one instead.

I'm wondering if a MinIO config already officially exists on birdhouse-deploy?

Not at the moment, you likely have a custom component that sets one up for PAVICS. Do you know if this is true @tlvu?

huard commented 5 months ago

Huh, that's convenient! OK, thanks!

tlvu commented 5 months ago

I'm wondering if a MinIO config already officially exists on birdhouse-deploy?

Not at the moment, you likely have a custom component that sets one up for PAVICS. Do you know if this is true @tlvu?

@mishaschwartz Yes I deployed one, just for prototyping so did not hook it up "officially" to the PAVICS stack because it's not behind the proxy and Magpie.

@huard like Misha, I'd rather keep the MinIO for monitoring and the MinIO for other usage separate. It greatly simplifies management (custom config, version requirements, upgrades, ...). I think Weaver or Magpie was hooking into the default Postgres at some point, then it also rolled its own Postgres container because it required a different version of Postgres.

@mishaschwartz Can this new component be enabled standalone? Meaning on another physical host with only itself and the proxy, as described in this comment https://github.com/bird-house/birdhouse-deploy/issues/277#issuecomment-2043273300?

If it is not standalone yet, that's okay; we can do it in another PR.

huard commented 5 months ago

Good point, makes sense.

mishaschwartz commented 5 months ago

@tlvu

Can this new component be enabled standalone? Meaning on another physical host with only itself and the proxy, as described in this comment https://github.com/bird-house/birdhouse-deploy/issues/277#issuecomment-2043273300?

Not with the way we currently deploy, at least not without some serious additional coordination to support the networking and permissions. If we used something like kubernetes or docker swarm to deploy, this would be easier to run on multiple hosts, but we're not there yet.

tlvu commented 5 months ago

additional coordination to support the networking and permissions.

Right. Let's say we expose the 1st Prometheus port on the PAVICS docker host and configure the firewall on the PAVICS docker host to allow connections from the 2nd (longterm) Prometheus only. Would that work?

mishaschwartz commented 5 months ago

Right. Let's say we expose the 1st Prometheus port on the PAVICS docker host and configure the firewall on the PAVICS docker host to allow connections from the 2nd (longterm) Prometheus only. Would that work?

Yup, that would work. I think that code can go in a different repo though.

tlvu commented 5 months ago

Right. Let's say we expose the 1st Prometheus port on the PAVICS docker host and configure the firewall on the PAVICS docker host to allow connections from the 2nd (longterm) Prometheus only. Would that work?

Yup, that would work. I think that code can go in a different repo though.

Yup, I never envisioned this code living in this repo. Ouranos has its own override repo, so we can do this in our own override, because only our repo will actually know all the "other" hostnames!

The only change needed for this repo is to turn the targets section in birdhouse/optional-components/prometheus-longterm-metrics/prometheus.yml.template into a template variable, so our override repo can override it with a real list of "other" hosts.
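To make the idea concrete, here is a minimal Python sketch of a targets list driven by a template variable. The contents of prometheus.yml.template are not shown in this thread, so the template text, the job name, the variable name `EXAMPLE_LONGTERM_TARGETS`, and the host names are all hypothetical; the sketch only assumes `${VAR}`-style placeholders and Prometheus's standard `static_configs` / `targets` layout.

```python
from string import Template

# Hypothetical stand-in for the targets section of prometheus.yml.template.
# EXAMPLE_LONGTERM_TARGETS is an illustrative name, not a real variable in the repo.
SCRAPE_TEMPLATE = Template(
    "scrape_configs:\n"
    "  - job_name: longterm-metrics\n"
    "    static_configs:\n"
    "      - targets: [${EXAMPLE_LONGTERM_TARGETS}]\n"
)


def render_targets(hosts):
    """Render the scrape config with a quoted, comma-separated host list."""
    quoted = ", ".join(f"'{host}'" for host in hosts)
    return SCRAPE_TEMPLATE.substitute(EXAMPLE_LONGTERM_TARGETS=quoted)


# The default could stay as a single local target, while an override repo
# supplies the real list of "other" hosts.
print(render_targets(["prometheus:9090", "other-host.example.com:9090"]))
```

With this shape, the repo would keep a sensible default while an override only has to redefine one variable.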

mishaschwartz commented 4 months ago

@tlvu I've made a few small changes to make it easier to deploy this on a separate server. Check out https://github.com/bird-house/birdhouse-deploy/pull/461/commits/2eab8b7aa95e94e631813b2e85d419c91a075613 for the relevant changes.

mishaschwartz commented 1 week ago

@fmigneault please check out the changes since your last review when you have a minute (https://github.com/bird-house/birdhouse-deploy/pull/461/files/3f75d496b932ab4d90857206003d0225cd20c435..59f6c6819ff6529b5ce2c43b8b04a723d5bfd437)