Closed: jakubgs closed this issue 3 years ago.
And I'm pretty sure it's not a lack of resources, because the hosts are under-utilized if we look at CPU/RAM for the Cortex nodes:
Under 15% CPU utilization and ~11 GB of memory free most of the time, so I'm pretty sure it's not the resources. The same can be said for Cassandra, whose utilization is even lower:
So it's clearly something about my configuration that is under-utilizing the available hardware and causing these longer queries to fail.
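For context, long-range queries in this kind of deployment are usually split into smaller partial queries at the query-frontend. A minimal sketch of the relevant Cortex config block (the interval and cache settings here are illustrative assumptions, not the actual config from this issue):

```yaml
# Illustrative Cortex query-frontend settings (values are assumptions).
# Splitting a long query into day-long partial queries spreads the load
# across queriers instead of sending one huge query to a single querier.
query_range:
  split_queries_by_interval: 24h   # a 22-day query becomes ~22 partial queries
  align_queries_with_step: true    # align boundaries so partials are cacheable
  cache_results: true              # cache partial results in the frontend
```

If splitting is disabled, a single querier has to fetch the entire 22-day range in one shot, which can time out even while the hosts look idle overall.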
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Has anyone solved this problem?
Yes. I solved it by taking our Apache Cassandra cluster behind the shed and putting it out of its misery.
The S3 backend works much better.
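For anyone landing here, a rough sketch of what pointing Cortex's blocks storage at an S3 backend looks like (the endpoint, bucket name, and credentials are placeholders, not a tested configuration):

```yaml
# Illustrative blocks-storage-on-S3 config (all names are placeholders).
storage:
  engine: blocks          # use the blocks engine instead of chunks
blocks_storage:
  backend: s3             # store TSDB blocks in an S3-compatible object store
  s3:
    endpoint: s3.us-east-1.amazonaws.com
    bucket_name: example-cortex-blocks
    access_key_id: EXAMPLE_KEY
    secret_access_key: EXAMPLE_SECRET
```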
Issue

When I make a query for 21 days for a metric, I get back a result without issues.
But when I increase the query time range to 22 days, it fails horribly with a 500 error.

Logs

The `query-frontend` shows this in its logs:

Debug log level doesn't show anything more than that. The Cortex instances running with the `all` target do not print any errors or warnings at this time.

Setup
Cortex: 1.5.0, binary
Cassandra: 3.11.9, binary
Storage: Chunks

Configuration
Here are example configurations of my nodes:

- cortex running `all` - config
- cortex running `query-frontend` - config

Questions
What does `expanding series: not found` mean?