go-graphite / carbonapi

Implementation of graphite API (graphite-web) in golang

how to design for high availability? #250

Open blysik opened 6 years ago

blysik commented 6 years ago

I have two go-carbon clusters that are mirrors of each other; carbon-c-relay sends data to both.

I have 4 carbonapi hosts in a VIP, which have the two clusters configured.

I had 1 of the go-carbon hosts go down, and that seemed to impact overall reading.

I would have thought with this setup, I would still be able to read data points from the other cluster.

Is there something I'm doing wrong?

azhiltsov commented 6 years ago

We are still using carbonzipper between carbonapi and go-carbon in our setup. As far as I know, master is still a 'work in progress', right @Civil? Anything before carbonapi commit 3596e9647611e1f833a911d663747271623ec003 should work fine with separate zipper instances.

blysik commented 6 years ago

Sorry, yes. I'm doing carbonapi -> carbonzipper -> go-carbon.

It just seemed like a go-carbon instance going down caused way more noise than it should have.

deniszh commented 6 years ago

Hello @blysik, could you please elaborate? What do you mean by "noise", and how does a dead zipper backend affect reads from carbonapi in your case?

Civil commented 6 years ago

@blysik can you please provide more details on the issue?

As for a box that affects rendering: in recent versions of carbonzipper/carbonapi you can tune connection timeouts, which will most probably help in your case.

blysik commented 6 years ago

Sorry for the long delay. So I have two go-carbon backend clusters. When a single go-carbon server goes down, grafana -> carbonapi -> carbonzipper -> [clusters] suddenly starts timing out.

I would expect carbonzipper to just get the data points from the go-carbon instance that is up, and ignore the downed one. Essentially carbonzipper hangs trying to query the downed backend, rather than just switching over to one that's available.

I'm guessing I just need to tune connection timeouts. Are connection timeout settings available in both carbonapi and carbonzipper, in the 0.9.0 and 0.73.2 releases? (I'm assuming that's what I should be running in production.)

Thanks.

Civil commented 6 years ago

The latest 0.9 release has carbonzipper functionality built in. You can look at https://github.com/go-graphite/carbonapi/blob/master/carbonapi.example.yaml#L61 for examples (please note that the 'zipper' option has priority over backends).

But yes, if your host is completely down, tuning connect timeouts will help. If the host becomes too slow instead, you can fine-tune timeouts, but at the moment it will still affect overall query times a lot. I'll think about adding some sort of statuses to go-carbon in the future.

Civil commented 5 years ago

There are also some examples in the current documentation: https://github.com/go-graphite/carbonapi/blob/master/doc/configuration.md

However, the documentation covers the current release (0.12.0), and some features might not be available in other versions.