Open sortie opened 6 years ago
The right solution is to gather the performance data, plot that data, and monitor that instead and see if the probability distribution of latency is acceptable.
Is there a ready-made solution we can just plug our data into?
I'm sure there are, though I don't know them that well. Perhaps we could use something like Google Data Studio for this?
This is still a problem (has it even gotten worse?)
We need profiling data. Where is all that time spent?
From initial investigations it seems that most of the time is spent in the search service.
Still pending...
We should try profiling the search service; maybe we can find ways to optimize it. I think we should be able to handle much higher request rates than we currently do.
Still an issue
I see a lot of these warnings in the search service logs. These messages obscure other warning level problems. We should either address the performance problem or demote the warning to a note.
However, dumping performance metrics to the logs is not the right solution. The right solution is to gather the performance data, plot that data, and monitor that instead and see if the probability distribution of latency is acceptable.
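To make "see if the probability distribution of latency is acceptable" concrete, here is a minimal sketch in Python. It assumes the latency samples have already been collected somewhere as a list of milliseconds. The function names `latency_summary` and `exceeds_budget` and the idea of a p99 budget are hypothetical, made up for illustration, and not anything the search service provides today.

```python
import statistics


def latency_summary(samples_ms):
    """Summarize latency samples (milliseconds) as a few percentiles.

    Hypothetical helper: how the samples are gathered is out of scope here.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }


def exceeds_budget(samples_ms, p99_budget_ms):
    """Return True if the 99th percentile latency is above the given budget.

    The budget is an assumed threshold chosen by whoever monitors the service.
    """
    return latency_summary(samples_ms)["p99"] > p99_budget_ms
```

A monitoring job could call `exceeds_budget` on the samples from each time window and alert only when the tail latency is out of bounds, rather than logging a warning for every slow request.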