For whoever looks at this issue, here is the same configuration but with log.level: 'debug'
https://pastebin.com/NukE3AaQ
Update on this: we also tried this out with s3 as a backing store instead of local disk and got the same error. Specifically, we replaced

-ruler.storage.local.directory=/etc/cortex/ruler
-ruler.storage.type=local

with

-ruler.storage.s3.url=<s3_url>
-ruler.storage.type=s3
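(If it helps anyone reproducing this: the s3.url value typically follows the Cortex s3 client URL convention sketched below. The placeholders are illustrative, and the exact format is worth double-checking against the docs for your version.)

-ruler.storage.type=s3
-ruler.storage.s3.url=s3://<access_key>:<url-escaped_secret_key>@<region>/<bucket_name>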
Thank you for your reports. You're right that these logs don't contain the error from "blocks storage queryable" subservice, so it's difficult to diagnose what's wrong. I've extended error reporting on this code path in #3125, if you have a chance to give it a try, that would be helpful.
That's great! Any chance there's going to be a v1.4 release candidate cut with #3125 any time soon? If we need to do a custom build of cortex to test this, we can, but right now we're running the official Dockerhub v1.3.0 release, so we'd have to build our own container.
Next release cycle will start in two weeks (week starting 14th of September), with final release likely the week after.
Yep, this new logging is very helpful. This is obviously our problem to solve, but here's the new error we're seeing:
level=info ts=2020-09-03T20:20:07.547735931Z caller=module_service.go:58 msg=initialising module=ring
level=info ts=2020-09-03T20:20:07.547968583Z caller=module_service.go:58 msg=initialising module=distributor-service
level=debug ts=2020-09-03T20:20:07.548262851Z caller=module_service.go:48 msg="module waiting for initialization" module=ruler waiting_for=store-queryable
level=info ts=2020-09-03T20:20:07.647066619Z caller=fetcher.go:453 org_id=fake component=block.BaseFetcher msg="successfully synchronized block metadata" duration=98.064503ms cached=163 returned=81 partial=0
level=info ts=2020-09-03T20:20:08.661072094Z caller=module_service.go:58 msg=initialising module=ruler
level=info ts=2020-09-03T20:20:08.666964583Z caller=basic_lifecycler.go:242 msg="instance not found in the ring" instance=ruler-6567dd65f7-j8zsh ring=ruler
level=info ts=2020-09-03T20:20:08.668821489Z caller=ruler.go:345 msg="ruler up and running"
level=info ts=2020-09-03T20:20:08.668890952Z caller=cortex.go:315 msg="Cortex started"
level=debug ts=2020-09-03T20:20:08.669503307Z caller=ruler.go:316 msg="rule group not owned, address does not match" owner_addr=10.85.127.207:9095 addr=10.85.107.91:9095
level=debug ts=2020-09-03T20:20:08.669529559Z caller=ruler.go:316 msg="rule group not owned, address does not match" owner_addr=10.85.39.130:9095 addr=10.85.107.91:9095
level=error ts=2020-09-03T20:20:08.669560437Z caller=manager.go:126 msg="unable to map rule files" user=..2020_09_03_20_20_02.632031398 err="mkdir /etc/cortex/ruler/rules: read-only file system"
level=debug ts=2020-09-03T20:20:18.284269479Z caller=logging.go:66 traceID=68de0a0664a661cb msg="GET /ready (200) 61.21µs"
level=debug ts=2020-09-03T20:20:28.284203914Z caller=logging.go:66 traceID=4a584fa842a1e1c0 msg="GET /ready (200) 45.996µs"
level=debug ts=2020-09-03T20:20:38.284184123Z caller=logging.go:66 traceID=536be7035ca8de51 msg="GET /ready (200) 36.714µs"
level=debug ts=2020-09-03T20:20:48.284152674Z caller=logging.go:66 traceID=70740d96899a43c1 msg="GET /ready (200) 38.596µs"
level=debug ts=2020-09-03T20:20:58.284263333Z caller=logging.go:66 traceID=7963f60e2a1a2619 msg="GET /ready (200) 38.025µs"
level=debug ts=2020-09-03T20:21:08.284177299Z caller=logging.go:66 traceID=574cf0c624415978 msg="GET /ready (200) 43.811µs"
level=debug ts=2020-09-03T20:21:08.669663882Z caller=ruler.go:316 msg="rule group not owned, address does not match" owner_addr=10.85.60.132:9095 addr=10.85.107.91:9095
level=debug ts=2020-09-03T20:21:08.669698516Z caller=ruler.go:316 msg="rule group not owned, address does not match" owner_addr=10.85.60.132:9095 addr=10.85.107.91:9095
level=error ts=2020-09-03T20:21:08.669724514Z caller=manager.go:126 msg="unable to map rule files" user=..2020_09_03_20_20_02.632031398 err="mkdir /etc/cortex/ruler/rules: read-only file system"
This should be safe to close -- we fixed the error above by removing the -ruler.rule-path=/etc/cortex/ruler/rules argument.
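For anyone else who hits the same read-only error: the mkdir failure suggests -ruler.rule-path was pointing at a read-only mount (e.g. a ConfigMap volume). If you'd rather keep the flag than drop it, one sketch (purely illustrative names, assuming a Kubernetes deployment) is to point it at a writable emptyDir instead:

# Illustrative sketch only: keep -ruler.rule-path but point it at a writable
# emptyDir volume instead of the read-only ConfigMap mount that holds the rules.
containers:
  - name: ruler
    args:
      - -target=ruler
      - -ruler.storage.type=local
      - -ruler.storage.local.directory=/etc/cortex/ruler   # rule files (read-only is fine here)
      - -ruler.rule-path=/data/cortex/rules                # scratch dir; must be writable
    volumeMounts:
      - name: ruler-rules
        mountPath: /etc/cortex/ruler
        readOnly: true
      - name: ruler-scratch
        mountPath: /data/cortex
volumes:
  - name: ruler-rules
    configMap:
      name: cortex-ruler-rules    # hypothetical ConfigMap name
  - name: ruler-scratch
    emptyDir: {}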
For posterity, the unable to start blocks storage queryable subservices: not healthy error turned out to be a permissions issue on our end: we hadn't provided an IAM role with the s3 read/write permissions the ruler needs. Though, this does raise the question of why we need s3 access in the first place if we're using local as storage.
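In case it saves someone else the digging, here is a minimal sketch of an IAM policy covering that kind of s3 read/write access (the bucket name is a placeholder, and the exact set of actions your Cortex version needs may be narrower):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<cortex-blocks-bucket>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<cortex-blocks-bucket>/*"
    }
  ]
}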
> For posterity, the unable to start blocks storage queryable subservices: not healthy error turned out to be a permissions issue on our end: we hadn't provided an IAM role with the s3 read/write permissions the ruler needs.

Thanks for the update!

> Though, this does raise the question of why we need s3 access in the first place if we're using local as storage.
“blocks storage queryable” is a component responsible for querying data from long-term storage. Ruler uses it (because Ruler executes queries locally, not via querier), and this component needs to be able to read blocks from long-term storage.
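To make that distinction concrete, here is a hedged config sketch (section and field names follow recent Cortex releases and may differ slightly in v1.3; values are placeholders). The ruler's rule storage and the blocks storage read by the store-queryable are configured independently, so using local for rules doesn't remove the need for object-store access:

ruler:
  storage:
    type: local                 # where rule group definitions are read from
    local:
      directory: /etc/cortex/ruler

blocks_storage:                 # where the ruler's store-queryable reads series data
  backend: s3
  s3:
    bucket_name: <cortex-blocks-bucket>   # the ruler needs read access here to evaluate rules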
This is very similar to https://github.com/cortexproject/cortex/issues/2991. In this case, it's the ruler that's failing to start:
We're running Cortex v1.3.0. Here's the ruler's config:
One of the diagnoses from the previous issue was an unhealthy consul cluster, which is not a problem we appear to be having since all the other components are healthy.