Closed bwplotka closed 10 months ago
Done, also tried to extend the e2e test to validate this. Added https://github.com/GoogleCloudPlatform/prometheus-engine/issues/765 idea too.
Testing using SKIP_GCM=false locally...
I didn't test it yet successfully against GCM, so technically this change is untested. Don't have time to manually test it.
Tagged v0.9.1-rc.3 without this change so far, up to us how we can continue here @pintohutch @TheSpiritXIII
I'm ok merging as well @bwplotka.
Note this should become much easier once #755 is merged and the e2e refactor is readily available. There you can just add a test function or step to the ruler_test.go
file and run GOOGLE_APPLICATION_CREDENTIALS=<path-to-file> TEST_RUN=TestRuleEvaluator make e2e
and it should run in a few minutes against GCM.
Ok, I am going to continue my e2e tests, but starting the tag + Louhi process to not waste time. Thanks for reviews!
... wait, I see a total disaster, blocking this and investigating.
Ok, we would just break & block another release of 0.9 accidently with this PR...
After hardcode debugging (I love kubectl debug now, but it's cryptic):
kubectl debug -it -n gmp-test-testruleevaluation-gzip-20240112-112556 rule-evaluator-6f9c9d85f7-qc9gt --image=busybox:1.28 --target evaluator
:
Yamls are still compressed, because config-reloader does the magic for config files only. Other files are only "watched". We need to add a feature to config-reloader first and make sure we update images once this operator mode is enabled.
Closing for now, let's fix it in 0.10
Ah good catch - E2E tests would've caught that as well
It looks we're past 0.10 as https://github.com/GoogleCloudPlatform/prometheus-engine/releases/tag/v0.11.0-rc.2 is created? I don't know how you're versioning your repo. Is 0.10 skipped? I see bunch of 0.10-rc then jump to 0.11-rc.. Is there any chance this getting prioritized? Our GMP has been broken for months.
Hi @shinebayar-g we're still in the process of validating the 0.10 release. Our release process involves letting the release candidate soak in the GKE Rapid release channel first before stabilization. We're trying to release 0.10 soon and then put 0.11-rc into the GKE Rapid release channel next.
This project has been deprioritized now that Cloud Alerting supports PromQL alerts. We're recommending users move rules into Cloud Alerting where possible. You may still want some application rules and recording rules (which Cloud Alerting does not support) on the rule-evaluator.
I investigated the effort and it is sizable. Our biggest blocker right now is Thanos' reloader does not support decompressing directories. This would involve tracking files being added, renamed and removed. To get this merged upstream, this would also have to support symlinks. Since Thanos and we are open source, we're happy to accept contributions and review code in either repository.
Hope that helps!
Thanks for the reply @TheSpiritXIII
This project has been deprioritized now that Cloud Alerting supports PromQL alerts.
By this project you mean Google Managed Prometheus as a whole? Or specific component of it?
Only the rule-evaluator
component has been deprioritized for the time being. Google Managed Service for Prometheus is very much alive and we're actively working on it!
We're recommending users move rules into Cloud Alerting where possible.
By this are you referring to this? https://cloud.google.com/monitoring/promql/promql-in-alerting & https://cloud.google.com/monitoring/promql/promql-migrate?
We already made a huge migration when we onboarded to GMP from Prometheus Operator. PrometheusRule -> Rules etc..
and it looks like this new one is configurable through Terraform which is completely different deployment model and api than what we currently have.. 😞
Okay, we'll follow https://cloud.google.com/stackdriver/docs/managed-prometheus/rules-unmanaged#eval-rules-unmanaged document to deploy self managed rule-evaluator.
Our biggest blocker right now is Thanos' reloader does not support decompressing directories.
This is assuming rules-generated
config map is already compressed. But this isn't possible today. It looks like this PR already contains the necessary code to compress the generated configmap. If this PR is merged I can work on decompressing code on sidecar container side.
@shinebayar-g whipped up https://github.com/thanos-io/thanos/pull/7199 real quick. Haven't done much testing on it but hypothetically it should work with this change. If we get that change reviewed and merged upstream, we can try and revive this PR!
In the meanwhile you could try and build this PR yourself. Sadly, we can't currently merge this PR right now because it would be broken.
Thanks for your interest in GMP!
Sadly, we can't currently merge this PR right now because it would be broken.
Yeah I agree, thanks for the PR. Hope we'll get it soon! 🚀
@shinebayar-g fyi I've reopened this PR as https://github.com/GoogleCloudPlatform/prometheus-engine/pull/897 -- there's some dependencies we need released first so please track progress there! Thanks!
Thanks for the update @TheSpiritXIII
Fixes https://github.com/GoogleCloudPlatform/prometheus-engine/issues/549