fluxcd / flux2-monitoring-example

Prometheus monitoring for the Flux control plane
https://fluxcd.io/flux/monitoring/
Apache License 2.0
52 stars 145 forks source link

Replace loki-stack helm chart with loki + promtail #23

Open mjtrangoni opened 9 months ago

mjtrangoni commented 9 months ago

Hi @darkowlzz,

related to #19, this replaces the deprecated loki-stack helm chart with the more powerful loki + promtail helm charts.

By this change Loki will be upgraded from v2.6.1 to v2.9.3, and to get it running in my test environment I had to do some changes to the the loki dashboard. Let me know if you agree with those changes.

Also found out that the prometheus-mixin projects use / as separator in the dashboards names, so I changed the titles to make it more uniform as well.

This adds Dashboards Links as Dropdown menu via the tag flux, similar to what the Loki dashboards have.

Thank you!

kingdonb commented 8 months ago

@mjtrangoni It looks like there are some extra unnecessary changes in this PR that can be removed, so it only contains the changes that you intended - are you still there?

If we don't hear back from you soon, we will probably try to rebase the extra changes out

mjtrangoni commented 8 months ago

@kingdonb sorry, you are right. I will remove all my changes around the dashboard, and let this be merged. BTW, I found that the table_manager removal was not working and switched to the compactor, which in my case is working well

kingdonb commented 6 months ago

I don't think we should configure storage, if the user attempts to install this example on a kind cluster then it will break because no persistent volume claims are allowed. I just tested it and on my cluster I had this issue, because there was no storage driver for CSI configured:

  Warning  FailedScheduling  3m14s  default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

Other than that issue I think this is a great config update thank you for contributing it @mjtrangoni

I'm just going by precedent that the old version with loki-stack did not configure the filesystem storage, I think that means it's using impermanent / non-persistent or ephemeral (tmp) storage, and that means that data is lost when loki restarts. I think that is OK. This is just an example config, users are not meant to literally take this repo and implement it without any changes. But maybe some docs about how to configure the storage sensibly would be a helpful addition?

I just don't want the guide to break for kind users, so I guess we shouldn't be configuring storage as that will break it.

stefanprodan commented 6 months ago

Both Prometheus and Grafana use an emptyDir for storage, we should do the same for Loki as it was before.

kingdonb commented 6 months ago

https://github.com/grafana/loki/issues/10621

IDK if the new Loki chart actually supports emptyDir, from the open issue on their git repository I suspect it doesn't.

Maybe there is some way that we can hack it, but I tried messing with the config for half an hour and wound up at this issue where it seems like this wouldn't be supported as no production environment is complete without storage.

I'm not sure what to do, it really sucks that loki doesn't seemingly have a way to install on kind clusters where there is no persistent volume (or maybe the local-path-storage is included now, I don't actually use kind myself, this is a talos cluster)

stefanprodan commented 6 months ago

The local-path-storage provider is included in Kind.

mjtrangoni commented 5 months ago

I will give it a try with the new loki v3.0 helm chart next week, and also rebase against the upstream changes.

Reintjuu commented 4 months ago

@mjtrangoni, did you get the chance to update it to v3.0?