Closed andreimatei closed 5 years ago
I was able to get status.go in the results once, after multiple retries. I have shared the log: goroutine.log
Thanks, will look. I wanted to ask - does it still time out after restarting servers?
> I wanted to ask - does it still timeout after restarting servers?
I just tried that out. I saw it working once in 5-6 attempts. When it worked, I noticed that "Connections (via Node 1)" was always the case. I'm not sure whether that is relevant to the problem here.
Sorry for letting this sit; I've been gone skiing for a few days.
The goroutine dump you've sent doesn't tell us anything beyond the fact that the code trying to generate the report is busy iterating through the ranges (we need to iterate through all the ranges).
Can you please clarify something about the screenshot you've sent: if that's what it looks like when you consider it to be "working", what does it look like when it's not working? Because that screenshot looks like the effects of the timeout to me.
@matbhuvi's cluster has trouble generating that report. We seem to have a 3s timeout for generating that report, which is silly. I'm gonna increase it. Regardless, it's unclear why it times out. I've tried on a cluster we have with some reasonable amount of data, and the report was instantaneous. This is hard to debug... I was hoping I could trace the generation of the report, but I'm having lots of trouble. Opened #34310.
The only way to debug that I can think of is to try to get some goroutine stack dumps at the moment the report generation is running. @matbhuvi, would you mind trying to generate that report a few times and, immediately after you refresh the report page, switching to the debug page of a couple of other nodes and clicking on the "All Goroutines" link? If the results contain anything from status.go, they might be useful for us. If not, I'm not sure how to debug further, although I'll work on generally improving that code and its debuggability for the future. Thanks!