IzakMarais / reporter

Service that generates a PDF report from a Grafana dashboard
Apache License 2.0

API Request retries for failed loading panels #16

Closed boberty88 closed 6 years ago

boberty88 commented 7 years ago

It would be useful to be able to retry the panel requests during the "Downloading image..." part of the application. I have had the problem where some of the panels I'm loading time out or fail to render properly.

From the logs:

2017/08/07 11:14:32 Downloading image  8 http://example.com/render/dashboard-solo/db/reporter-demo?from=1500894404780&height=490&panelId=8&theme=light&to=1502104004780&width=160
2017/08/07 11:14:37 Downloading image  7 http://example.com/render/dashboard-solo/db/reporter-demo?from=1500894404780&height=490&panelId=7&theme=light&to=1502104004780&width=490
2017/08/07 11:14:37 <html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.10.3</center>
</body>
</html>

These can happen even on simple Text panels if the dashboard contains more than 60 widgets/panels.

Currently we have implemented a delay before rendering each of the panels in the report.go code. This lets me go through the images one after another and not overwhelm InfluxDB by hitting it with too many queries at the same time. I'm not concerned with how long it takes, as I just want it to email me the PDF report when it's done.

func (rep *Report) renderPNGsParallel(dash grafana.Dashboard) (err error) {
    var wg sync.WaitGroup
    wg.Add(len(dash.Panels))
    delaySecond(30)
    for _, p := range dash.Panels {
        delaySecond(10)
        go func(p grafana.Panel) {
            defer wg.Done()
            err = rep.renderPNG(p)
            if err != nil {
                log.Printf("Error creating image for panel: %v", err)
                return
            }
        }(p)
    }

    wg.Wait()
    return
}

My question is where is the appropriate place to wrap a retry of the Panel API request so that if it fails to load the first time I can try it again (maybe a max of 5 attempts)?

I thought it might be inside the func GetPanelPng in api.go? Or in the func renderPNGsParallel or func renderPNG in report.go?

I'm a complete newbie to Go code so your help is much appreciated.

IzakMarais commented 7 years ago

Yes, I think inside func GetPanelPng would be a good place. The Grafana API should contain the logic for connection handling to the Grafana back-end so this seems like the correct place. Retry 5 times and then fail and return an error.

Instead of adding the delays in the func renderPNGsParallel, you could have a delay between your retry attempts. That way users don't have to wait unnecessarily when all PNGs could be retrieved successfully. When there is a problem with back-end load, the delay between failed attempts should ease the load.

If you get something working well, I would appreciate a pull request.

clarelewis73 commented 7 years ago

We have this working well. Just move the delay to after the

go func(p grafana.Panel)

line; that way each API call is made separately.