go-graphite / go-carbon

Golang implementation of Graphite/Carbon server with classic architecture: Agent -> Cache -> Persister
MIT License

Carbonlink cache issues #326

Open ajm-dev opened 4 years ago

ajm-dev commented 4 years ago

Hi There

I get the following messages when using graphite-web with go-carbon. It seems that graphite-web sometimes fails to read the cached metrics from memory, but when we refresh, the metrics return.

See graphite-web cache.log below.

CarbonLink sending request for go-carbon.persister.committedPoints to ('127.0.0.1', 'a')
2019-12-02,12:32:51.666 :: Exception getting data from cache ('127.0.0.1', 'a'): [Errno 104] Connection reset by peer
2019-12-02,12:32:51.666 :: CarbonLink cache-query request for go-carbon.persister.committedPoints returned 0 datapoints

Do I need to make any changes to graphite-web, or is this issue related to the go-carbon settings?

Below are my go-carbon.conf settings:

[common]
user = "carbon"
graph-prefix = "go-carbon"
metric-endpoint = "tcp://127.0.0.1:2003"
metric-interval = "0m10s"
max-cpu = 12

[whisper]
data-dir = "/data01/whisper/"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
workers = 4
max-updates-per-second = 2500
max-creates-per-second = 100
hard-max-creates-per-second = false
sparse-create = false
flock = true
enabled = true
hash-filenames = true

[cache]
max-size = 900000000
write-strategy = "noop"

[udp]
listen = ":2003"
enabled = true
buffer-size = 0

[tcp]
listen = ":2003"
enabled = true
buffer-size = 0

[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = true
buffer-size = 0

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"

[grpc]
listen = ":7003"
enabled = true

[tags]
enabled = false
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/var/lib/graphite/tagging/"
tagdb-timeout = "1s"

[carbonserver]
listen = ":8080"
enabled = false
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = true
query-cache-size-mb = 0
find-cache-enabled = true
trigram-index = true
scan-frequency = "5m0s"
max-globs = 100
fail-on-max-globs = false
graphite-web-10-strict-mode = true
internal-stats-dir = ""
stats-percentiles = [99, 98, 95, 75, 50]

[dump]
enabled = false
path = "/var/lib/graphite/dump/"
restore-per-second = 0

[pprof]
listen = "localhost:7007"
enabled = false

[[logging]]
logger = ""
file = "/var/log/go-carbon/go-carbon.log"
level = "error"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"

Below are Grafana graphs of the metrics that go missing; when we refresh, they return.

image

image
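
For anyone who wants to reproduce the cache query outside graphite-web, here is a minimal Python 3 sketch. It assumes the carbonlink wire format is what graphite-web's CarbonLink client appears to send: a 4-byte big-endian length prefix followed by a pickled dict with `type` and `metric` keys, with the reply framed the same way. The helper names (`build_cache_query`, `recv_exactly`, `query_cache`) and the choice of pickle protocol 2 are illustrative assumptions, not taken from go-carbon or graphite-web.

```python
import pickle
import socket
import struct


def build_cache_query(metric, protocol=2):
    """Return a length-prefixed, pickled cache-query request (assumed format)."""
    payload = pickle.dumps({"type": "cache-query", "metric": metric}, protocol=protocol)
    # 4-byte big-endian length prefix, then the pickled body.
    return struct.pack("!L", len(payload)) + payload


def recv_exactly(conn, size):
    """Read exactly `size` bytes, or raise if the peer closes the connection early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def query_cache(metric, host="127.0.0.1", port=7002, timeout=5.0):
    """Send one cache-query over a fresh connection and return the decoded reply.

    The reply is unpickled, so only point this at a server you trust.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_cache_query(metric))
        (body_size,) = struct.unpack("!L", recv_exactly(conn, 4))
        return pickle.loads(recv_exactly(conn, body_size))
```

Calling `query_cache("go-carbon.persister.committedPoints")` on the go-carbon host opens a fresh connection for each request, so if it succeeds consistently while graphite-web still logs resets, that would point towards graphite-web's reused connections rather than the cache itself.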

ajm-dev commented 4 years ago

Hi Alikhan

We are currently running Python 2.7.5.

Should we try running it on Python 3.0+?

Regards

On Tue, Apr 21, 2020 at 12:03 AM Alikhan notifications@github.com wrote:

> I hope my reply is not too late. Is graphite-web running on Python 3.0+? I am seeing a similar issue with the cache not being read over carbonlink.


alikhtag commented 4 years ago

Ah, there is an issue with Carbonlink on Python 3.0+ (EDIT: PR #340), so using Python 2.7 should be fine.

azhiltsov commented 4 years ago

Can you confirm that go-carbon 0fdd9e5 fixed it?

alikhtag commented 4 years ago

> Can you confirm that go-carbon 0fdd9e5 fixed it?

PR #340 should only have fixed the Python 3 issue; if py2.7 was used, I doubt the issue was fixed :/

ajm-dev commented 4 years ago

Hi There

We are using py2.7, as Alik mentioned, and are still waiting for a fix.

Regards

On Sun, Jun 7, 2020 at 5:57 PM Alik notifications@github.com wrote:

> > Can you confirm that go-carbon 0fdd9e5 https://github.com/lomik/go-carbon/commit/0fdd9e52f94b135e43806e226047e054fca1ebaf fixed it?
>
> The PR should have fixed python 3 issue only, if py2.7 was used I doubt the issue was fixed :/


deniszh commented 4 years ago

@ajm-dev: I can't reproduce that on Python 3, by the way (with PR #340), though I haven't used carbonlink heavily. You can try disabling carbonlink and enabling carbonserver, then setting CLUSTER_SERVERS=["127.0.0.1:8080"] in graphite-web.
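
As a rough sketch of that suggestion, the graphite-web side would be a one-line change in `local_settings.py`, with the matching go-carbon.conf toggles shown as comments. The section and key names are the ones from the config posted earlier in this thread; whether further graphite-web settings need adjusting for a given deployment is not covered here.

```python
# graphite-web local_settings.py (fragment)

# Fetch data over HTTP from go-carbon's carbonserver listener.
# The address must match `listen` in the [carbonserver] section of go-carbon.conf.
CLUSTER_SERVERS = ["127.0.0.1:8080"]

# Matching go-carbon.conf changes (TOML, shown here as comments):
#
#   [carbonlink]
#   enabled = false
#
#   [carbonserver]
#   listen = ":8080"
#   enabled = true
```

Both go-carbon and graphite-web would need a restart for the change to take effect.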

piotr1212 commented 4 years ago

I'm not sure this is related to the pickle version at all.

I ran into a similar issue a long time ago, caused by go-carbon closing inactive connections. I fixed it by enabling tcp_keepalive in the OS. I never bothered to create a PR to enable keepalive in go-carbon. That was a long time ago, so it may have improved by now.
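
The comment above does not say which OS setting was changed. As an illustration of the general idea only, a client can request keepalive probes on an individual socket, as in the Python sketch below. The `enable_keepalive` helper and its default timings are hypothetical, the `TCP_KEEP*` options are platform-specific (hence the `getattr` guards), and whether this prevents the resets reported in this issue is untested.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive for `sock` and tune it where the platform allows."""
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning knobs; not every platform exposes all of them.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Keepalive probes are handled by the kernel and carry no application data, so they mainly help against idle connections being dropped at the TCP level or by intermediate devices; an application-level read timeout on the server side may not be affected.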