go-graphite / carbonapi

Implementation of graphite API (graphite-web) in golang

carbonapi {"error":"unexpected EOF"} #455

Closed cjagus closed 4 years ago

cjagus commented 4 years ago

We upgraded from 0.11 to 0.12.6 and are now seeing the errors below in the logs.

"type":"fetch","request":"&MultiFetchRequest{Metrics:[{host.snowflake-web.*.timers.snowflake.events.XXXX.load_lag.upper 1585244646 1585245546 false host.snowflake-web.*.timers.snowflake.events.XXXX.load_lag.upper []}],}","errors":[{"error":"unexpected EOF"}]}

We are using basic upstreams > backends > go-carbon. In the previous version we used to see 404 errors in the logs. We also noticed that sendGlobsAsIs and alwaysSendGlobsAsIs have been removed from the example configs.

cc @Civil

Civil commented 4 years ago

Since 0.11 there have been a lot of changes in how the carbonapi config is structured and how to configure it. I would expect the issue to be in some configuration parameters that I mistakenly did not map properly from the 0.11 to the 0.12 format (carbonapi 0.12 internally converts an old-style config to a new-style one, or at least tries to).

In 0.12, sendGlobsAsIs was mostly replaced with https://github.com/go-graphite/carbonapi/blob/master/doc/configuration.md#maxbatchsize and later with a per-backend variant of the same option. For that option, 0 means unlimited and should behave the same way as alwaysSendGlobsAsIs: true in 0.11, a value of 1 should behave the same as sendGlobsAsIs: false, and any other value should behave as sendGlobsAsIs: true + alwaysSendGlobsAsIs: false.
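The mapping described above can be sketched in Go as follows. This is only an illustration of the stated semantics, not carbonapi's actual conversion code: the function name `effectiveMaxBatchSize` is hypothetical, and giving alwaysSendGlobsAsIs precedence over sendGlobsAsIs is an assumption.

```go
package main

import "fmt"

// effectiveMaxBatchSize is a hypothetical helper (not part of carbonapi) that
// maps the 0.11 options onto the 0.12 maxBatchSize values described above.
// Assumption: alwaysSendGlobsAsIs takes precedence over sendGlobsAsIs.
func effectiveMaxBatchSize(sendGlobsAsIs, alwaysSendGlobsAsIs bool, maxBatchSize int) int {
	switch {
	case alwaysSendGlobsAsIs:
		return 0 // 0 means unlimited
	case !sendGlobsAsIs:
		return 1 // equivalent of sendGlobsAsIs: false
	default:
		return maxBatchSize // sendGlobsAsIs: true + alwaysSendGlobsAsIs: false
	}
}

func main() {
	// The values from the config in this thread.
	fmt.Println(effectiveMaxBatchSize(true, false, 1000))
	fmt.Println(effectiveMaxBatchSize(false, false, 1000))
	fmt.Println(effectiveMaxBatchSize(true, true, 1000))
}
```

With the settings posted in this issue (sendGlobsAsIs: true, alwaysSendGlobsAsIs: false, maxBatchSize: 1000), the intended result under this reading is 1000.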

Anyway, I have some follow-up questions:

  1. Could you please share the full config or at least specify what protocol you are using?
  2. Which previous version did you use?
  3. In this case, do those metrics exist on the server?
  4. Do you have the relevant response from the go-carbon side? Potentially you can have a look at the result by using tcpdump.
  5. What version of go-carbon are you using? With 0.12 I would recommend using at least the latest stable, as it could be that I've missed some corner case with older versions of go-carbon. However, if I'm able to replicate your issue I'll fix it, as 0.12 should be able to work correctly with any version of go-carbon.

cjagus commented 4 years ago

Thanks for the response.

Could you please share the full config or at least specify what protocol you are using?


listen: "10.0.0.25:80"
concurency: 1000
cache:
  type: "mem"
  size_mb: 4096
  defaultTimeoutSec: 60
cpus: 0
tz: ""
sendGlobsAsIs: true
alwaysSendGlobsAsIs: false
functionsConfig:
  graphiteWeb: /etc/carbonapi/graphiteWeb.yaml
maxBatchSize: 1000
graphite:
  host: "10.0.0.25:2003"
  interval: "60s"
  prefix: "carbon.api"
  pattern: "{prefix}.{fqdn}"
idleConnections: 20
pidFile: ""
upstreams:
  buckets: 10
  timeouts:
    global: "60s"
    afterStarted: "60s"
    connect: "200ms"
  concurrencyLimit: 0
  keepAliveInterval: "30s"
  maxIdleConnsPerHost: 100
  backends:

Which previous version did you use?

0.11.0

In this case, do those metrics exist on the server?

No, the metrics don't exist on the server.

Do you have relevant response from the go-carbon side? Potentially you can have a look at the result by using tcpdump.

I see 404 "no metrics found","http_code":404}

What version of go-carbon are you using?

0.14.0

Civil commented 4 years ago

Ok, I think I have an idea what's going on and likely it's the same issue as described here with graphite-clickhouse: https://github.com/go-graphite/carbonapi/issues/454

But I'll try to reproduce it later today.

I think the proper way to fix that would be to check the status code first and, in case of a 404, not try to parse the response.

cjagus commented 4 years ago

@Civil I also noticed carbonapi 0.12.6 using a lot of memory (it tries to use all the memory available on the instance) when fetching metrics with lots of globs.

Civil commented 4 years ago

Can you share the heap profiler output?

To do that you need to enable pprof in the config (by default it's disabled)

# Specify if metrics are exported over HTTP and if they are available on the same address or not
# pprofEnabled controls if extra HTTP Handlers to profile and debug application will be available
expvar:
  enabled: true
  pprofEnabled: false
  listen: ""

The listen directive there allows you, for example, to bind only on localhost and expose it on a different port. Currently all expvar stuff (including metrics) is controlled by that listen directive, so if you change it, that will also change the listener for the metrics and for pprof.

And after some of the heavy queries, please execute something like (assuming it's exposed on port 8081):

go tool pprof http://localhost:8081/debug/pprof/heap

And provide output of:

top10

top10 -cum

This one should produce an SVG image, but it requires graphviz to be installed:

web



Please also share the carbonapi logs (I'm interested in what's in them) and, if you can, the current counters at carbonapi's `/debug/vars` URL (feel free to remove anything you don't want to share).

cjagus commented 4 years ago

Pasting output here

top10

Fetching profile over HTTP from http://localhost:8081/debug/pprof/heap
Saved profile in /root/pprof/pprof.carbonapi.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
File: carbonapi
Build ID: 0204d7ff7e856c78432128c1bca82042adc36610
Type: inuse_space
Time: Mar 29, 2020 at 9:08am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 7237.02MB, 99.94% of 7241.23MB total
Dropped 19 nodes (cum <= 36.21MB)
Showing top 10 nodes out of 26
      flat  flat%   sum%        cum   cum%
 6630.57MB 91.57% 91.57%  6630.57MB 91.57%  github.com/go-graphite/protocol/carbonapi_v2_pb.(*FetchResponse).Unmarshal
     384MB  5.30% 96.87%      384MB  5.30%  bytes.makeSlice
   82.52MB  1.14% 98.01%    82.52MB  1.14%  github.com/go-graphite/carbonapi/expr/tags.ExtractTags
   77.52MB  1.07% 99.08%   160.04MB  2.21%  main.zipper.Render
   43.22MB   0.6% 99.68%    43.22MB   0.6%  github.com/go-graphite/carbonapi/zipper/types.(*ServerFetchResponse).Merge
   13.15MB  0.18% 99.86%  7032.60MB 97.12%  github.com/go-graphite/carbonapi/zipper/protocols/v2.(*ClientProtoV2Group).Fetch
    4.39MB 0.061% 99.92%  6634.96MB 91.63%  github.com/go-graphite/protocol/carbonapi_v2_pb.(*MultiFetchResponse).Unmarshal
    1.65MB 0.023% 99.94%   164.40MB  2.27%  github.com/go-graphite/carbonapi/cmd/carbonapi/http.renderHandler
         0     0% 99.94%      384MB  5.30%  bytes.(*Buffer).ReadFrom
         0     0% 99.94%      384MB  5.30%  bytes.(*Buffer).grow

top10 -cum

(pprof) top10 -cum
Showing nodes accounting for 7032.10MB, 97.11% of 7241.23MB total
Dropped 19 nodes (cum <= 36.21MB)
Showing top 10 nodes out of 26
      flat  flat%   sum%        cum   cum%
         0     0%     0%  7075.83MB 97.72%  github.com/go-graphite/carbonapi/zipper/broadcast.(*BroadcastGroup).doSingleFetch
   13.15MB  0.18%  0.18%  7032.60MB 97.12%  github.com/go-graphite/carbonapi/zipper/protocols/v2.(*ClientProtoV2Group).Fetch
    4.39MB 0.061%  0.24%  6634.96MB 91.63%  github.com/go-graphite/protocol/carbonapi_v2_pb.(*MultiFetchResponse).Unmarshal
 6630.57MB 91.57% 91.81%  6630.57MB 91.57%  github.com/go-graphite/protocol/carbonapi_v2_pb.(*FetchResponse).Unmarshal
         0     0% 91.81%   384.50MB  5.31%  github.com/go-graphite/carbonapi/zipper/helper.(*HttpQuery).DoQuery
         0     0% 91.81%   384.50MB  5.31%  github.com/go-graphite/carbonapi/zipper/helper.(*HttpQuery).doRequest
         0     0% 91.81%      384MB  5.30%  bytes.(*Buffer).ReadFrom
         0     0% 91.81%      384MB  5.30%  bytes.(*Buffer).grow
     384MB  5.30% 97.11%      384MB  5.30%  bytes.makeSlice
         0     0% 97.11%      384MB  5.30%  io/ioutil.ReadAll
(pprof) exit

/debug/vars

{
"BuildVersion": "0.12.6",
"GoVersion": "go1.12.6",
"cache_items": 57,
"cache_size": 174998,
"cmdline": ["/usr/bin/carbonapi","-config","/etc/carbonapi/carbonapi.yaml"],
"config": {"ExtrapolateExperiment":false,"Logger":[{"logger":"","file":"/var/log/carbonapi/carbonapi.log","level":"error","encoding":"json","encoding-time":"iso8601","encoding-duration":"seconds","sample-tick":"","sample-initial":0,"sample-thereafter":0}],"Listen":"10.0.0.8:80","Buckets":10,"Concurency":1000,"Cache":{"Type":"mem","Size":4096,"MemcachedServers":["127.0.0.1:1234","127.0.0.2:1235"],"DefaultTimeoutSec":60},"Cpus":0,"TimezoneString":"","UnicodeRangeTables":null,"Graphite":{"Pattern":"{prefix}.{fqdn}","Host":"10.0.0.8:2003","Interval":60000000000,"Prefix":"carbon.api"},"IdleConnections":20,"PidFile":"","SendGlobsAsIs":true,"AlwaysSendGlobsAsIs":false,"MaxBatchSize":1000,"Zipper":"","Upstreams":{"ConcurrencyLimitPerServer":0,"MaxIdleConnsPerHost":100,"Backends":["http://graphite-1:8080","http://graphite-2:8080","http://graphite-3:8080","http://graphite-4:8080","http://graphite-5:8080","http://graphite-6:8080","http://graphite-7:8080","http://graphite-8:8080","http://graphite-9:8080","http://graphite-10:8080"],"BackendsV2":{"Backends":null,"MaxIdleConnsPerHost":0,"ConcurrencyLimitPerServer":0,"Timeouts":{"Find":0,"Render":0,"Connect":0},"KeepAliveInterval":0,"MaxTries":0,"MaxBatchSize":0},"MaxBatchSize":0,"MaxTries":0,"CarbonSearch":{"Backend":"","Prefix":"virt.v1.*"},"CarbonSearchV2":{"Backends":null,"MaxIdleConnsPerHost":0,"ConcurrencyLimitPerServer":0,"Timeouts":{"Find":0,"Render":0,"Connect":0},"KeepAliveInterval":0,"MaxTries":0,"MaxBatchSize":0,"Prefix":""},"ExpireDelaySec":0,"InternalRoutingCache":600000000000,"Timeouts":{"Find":2000000000,"Render":10000000000000,"Connect":200000000},"KeepAliveInterval":30000000000},"ExpireDelaySec":10,"GraphiteWeb09Compatibility":false,"IgnoreClientTimeout":false,"DefaultColors":null,"GraphTemplates":"/etc/carbonapi/graphTemplates.yaml","FunctionsConfigs":{"graphiteweb":"/etc/carbonapi/graphiteWeb.yaml"},"HeadersToPass":null,"HeadersToLog":null,"Define":null,"Prefix":"","Expvar":{"Listen":"localhost:8081","Enabled":true,"PProfEnabled":true}},
"find_cache_hits": 0,
"find_cache_misses": 0,
"find_cache_overhead_ns": 0,
"find_requests": 0,
"memstats": {"Alloc":2372664080,"TotalAlloc":211056428776,"Sys":21529567320,"Lookups":0,"Mallocs":73962190,"Frees":73765180,"HeapAlloc":2372664080,"HeapSys":20735295488,"HeapIdle":18346196992,"HeapInuse":2389098496,"HeapReleased":0,"HeapObjects":197010,"StackInuse":1343488,"StackSys":1343488,"MSpanInuse":23957280,"MSpanSys":77398016,"MCacheInuse":6944,"MCacheSys":16384,"BuckHashSys":1616429,"GCSys":699103232,"OtherSys":14794283,"NextGC":4745273856,"LastGC":1585473191468786725,"PauseTotalNs":3924597881,"PauseNs":[49097,11949,12093,8008,11864,11004,7058,12928,85440,87458,93684,67585,66674,8456,31112,37035,11341,31292,10180,13517,8328,12947,13595,13678,20363,54739,14736,1278608,41227,3793113,21209,7594,57495,25409,69691,88195,6106,32125,19096,23668,14618,11230,10526,8476,28356,10543,42359,12724,11795,92195,92204,9287,8490,8842,138568,18967,7971,11492,14230,33805,85754,21939,15476,12852,12265,12275,176282,11942,8863728,16703,24305,11830705,12904,6489088,30167,63043523,365261116,360245447,244177991,1044757619,1058092201,93922,3881747,256477455,4074873,150930,77565,37470,1911264,10827766,39883,3035316,56241,169822,9615042,37079,3619145,133182,61720,10613285,52558,8265965,11124650,56355,79931,25778,74213,68040,82919,734103,2966591,1346329,146324990,65029,54482,25209,9539,6956,120416,11090,32839,10447,9366,16987,74620,11770,35144,10346,12882,13376,7597,37412,12181,11063,18362,34475,7401,6501,8196,11450,7539,28118,44523,9304,9641,12513,11261,12895,7848,12672,13715,9806,7363,19747,20984,11360,8836,16430,14387,8226,36338,42634,9320,13016,11222,96251,30652,627619,11256,36799,6461,12152,40983,13945,36974,24780,15100,16028,13265,1804056,14210,46740,7479,12836,13575,11947,29523,13774,10874,11726,14256,9169,10382,11518,8426,44949,7576,14049,11499,25859,41553,7756,11105,20069,23282,12362,46109,42858,8355,9032,43546,32252,17979,5852,76560,61321,8345,33203,13615,28558,70686,11226,9305,22300,25847,92602,9406,24294,7674,9984,11023,6660,6737,13824,8340,9635,90844,13405,86610,60925,6572,3
6635,7123,13258,7930,68853,52815,203058,67808,12729,39605,16755,52055,9404,26422,11408],"PauseEnd":[1585472525703117320,1585472525713174228,1585472525716173478,1585472525721173727,1585472525725736843,1585472525727915329,1585472525732711256,1585472526606179371,1585472526608513176,1585472526610506879,1585472526612778813,1585472526614855527,1585472526616943016,1585472526618823258,1585472526621003089,1585472526626954996,1585472528607952096,1585472530606190669,1585472530838527206,1585472530920117974,1585472530953945483,1585472531017436389,1585472590606400734,1585472592837986914,1585472642610107841,1585472642634534878,1585472643605057794,1585472643615480946,1585472643620785488,1585472643626823096,1585472643629291859,1585472643631743173,1585472643646641691,1585472643653052434,1585472644605299954,1585472644657580858,1585472644662108240,1585472644666538262,1585472644670304553,1585472644673362103,1585472680607601668,1585472680685064155,1585472681592007212,1585472681599574772,1585472681628065236,1585472681634776865,1585472687607859838,1585472701626645867,1585472701630335943,1585472702606906793,1585472702615812237,1585472702622312218,1585472702624381523,1585472702629081091,1585472703607334505,1585472703668960669,1585472703673293617,1585472703682694752,1585472703690699563,1585472703708458263,1585472703714511572,1585472704947483165,1585472705613424338,1585472705624157374,1585472705649008540,1585472705682395005,1585472705740567119,1585472705798862412,1585472705963587038,1585472706033805150,1585472706156504610,1585472706235564569,1585472706499206002,1585472706656334874,1585472706995711217,1585472707399922772,1585472708332978916,1585472709278288848,1585472710764726159,1585472713994828337,1585472718121314657,1585472720559909791,1585472729320827951,1585472731877874452,1585472734228961022,1585472740044401019,1585472741737019009,1585472745050226962,1585472747952896777,1585472749085031930,1585472753174347588,1585472756511141877,1585472757997457947,1585472762317308083,1585472765689286918,
1585472768281680347,1585472771697487071,1585472776238103779,1585472779976374785,1585472782569105817,1585472786067321884,1585472789575581980,1585472794966231326,1585472799353461475,1585472802949246326,1585472807677291181,1585472811712290481,1585472817148240341,1585472822035607752,1585472829986302914,1585472950105228591,1585473070824431081,1585473191468786725,1585472364881271445,1585472364889017801,1585472366624486002,1585472366628780725,1585472366641941069,1585472369610362565,1585472369613582325,1585472370607299915,1585472372610161870,1585472392633450831,1585472398666126478,1585472398739530769,1585472398940268070,1585472401625325500,1585472402605243153,1585472402630248688,1585472402637640727,1585472402641618832,1585472403609128291,1585472404605005105,1585472405619994400,1585472405705650869,1585472405708268411,1585472405715578935,1585472405718884282,1585472405720560194,1585472405751873365,1585472405758662585,1585472406610660359,1585472406616285838,1585472408608447672,1585472409608104058,1585472411616973697,1585472411830124622,1585472411919675517,1585472411948076517,1585472412008431618,1585472412745322548,1585472412759069305,1585472412761392626,1585472412766449492,1585472412768675688,1585472412779032463,1585472412804400652,1585472412841080226,1585472412843757272,1585472412853233501,1585472415613946124,1585472416646691627,1585472418607050167,1585472420608471791,1585472420611979473,1585472421626295263,1585472421636309571,1585472427608671808,1585472427638817579,1585472427642011918,1585472427643784642,1585472428621834791,1585472430605448914,1585472432610527944,1585472436616433475,1585472436620597766,1585472436625397010,1585472440629819024,1585472440641152846,1585472440686392664,1585472440724453770,1585472440728424430,1585472440731236778,1585472442612941730,1585472442654912456,1585472445611021086,1585472448605144134,1585472450610190431,1585472454629412228,1585472454642314288,1585472454671938253,1585472454676075910,1585472460605081059,1585472461614250259,1585472461617339495,
1585472463608950956,1585472463613434407,1585472464610130643,1585472465604720478,1585472466605282661,1585472466610500174,1585472466613440730,1585472469605088198,1585472470604752469,1585472471616732251,1585472472606233451,1585472472611913251,1585472472622492853,1585472472627293308,1585472472633388199,1585472472651738596,1585472472671361382,1585472472691209148,1585472472698328569,1585472472700770129,1585472472708959096,1585472472712392122,1585472472718316963,1585472473609289831,1585472475605575895,1585472476612965340,1585472476627833657,1585472476646062949,1585472476652654610,1585472476668315503,1585472476675516744,1585472478606794881,1585472480605033781,1585472480607731797,1585472481618767138,1585472481621102625,1585472481622898291,1585472481629257037,1585472482610426325,1585472482616467972,1585472485607527987,1585472490605553069,1585472496604800577,1585472499605506245,1585472499620957128,1585472499626671228,1585472500633670742,1585472500635156166,1585472500679458795,1585472500683359200,1585472504605295266,1585472504621076251,1585472504657056392,1585472514609430928,1585472521621046276,1585472523607860653,1585472524604765306,1585472525608506279,1585472525614910794,1585472525661900238,1585472525678788976],"NumGC":625,"NumForcedGC":0,"GCCPUFraction":0.0024027226843164376,"EnableGC":true,"DebugGC":false,"BySize":[{"Size":0,"Mallocs":0,"Frees":0},{"Size":8,"Mallocs":53039,"Frees":52885},{"Size":16,"Mallocs":15708440,"Frees":15706390},{"Size":32,"Mallocs":5284379,"Frees":5283666},{"Size":48,"Mallocs":858947,"Frees":858130},{"Size":64,"Mallocs":5185184,"Frees":5184829},{"Size":80,"Mallocs":97810,"Frees":97670},{"Size":96,"Mallocs":335831,"Frees":335017},{"Size":112,"Mallocs":712316,"Frees":712243},{"Size":128,"Mallocs":5102248,"Frees":5102129},{"Size":144,"Mallocs":3998632,"Frees":3995884},{"Size":160,"Mallocs":4698392,"Frees":4621187},{"Size":176,"Mallocs":182340,"Frees":167034},{"Size":192,"Mallocs":117018,"Frees":116829},{"Size":208,"Mallocs":14011,"Frees":13953},{"Size":
224,"Mallocs":4957,"Frees":4949},{"Size":240,"Mallocs":6119,"Frees":6115},{"Size":256,"Mallocs":4355718,"Frees":4355656},{"Size":288,"Mallocs":1305412,"Frees":1305282},{"Size":320,"Mallocs":9527,"Frees":9513},{"Size":352,"Mallocs":193224,"Frees":193182},{"Size":384,"Mallocs":2916,"Frees":2512},{"Size":416,"Mallocs":404,"Frees":392},{"Size":448,"Mallocs":938,"Frees":936},{"Size":480,"Mallocs":7825,"Frees":7821},{"Size":512,"Mallocs":4240157,"Frees":4240131},{"Size":576,"Mallocs":12551,"Frees":12489},{"Size":640,"Mallocs":4637,"Frees":4627},{"Size":704,"Mallocs":8204,"Frees":8185},{"Size":768,"Mallocs":297,"Frees":296},{"Size":896,"Mallocs":665,"Frees":636},{"Size":1024,"Mallocs":3857867,"Frees":3857842},{"Size":1152,"Mallocs":11818,"Frees":11805},{"Size":1280,"Mallocs":1816869,"Frees":1816860},{"Size":1408,"Mallocs":3687,"Frees":3685},{"Size":1536,"Mallocs":5505,"Frees":5504},{"Size":1792,"Mallocs":1813907,"Frees":1813898},{"Size":2048,"Mallocs":1991744,"Frees":1991737},{"Size":2304,"Mallocs":305843,"Frees":305834},{"Size":2688,"Mallocs":9352,"Frees":9348},{"Size":3072,"Mallocs":295082,"Frees":295079},{"Size":3200,"Mallocs":3,"Frees":1},{"Size":3456,"Mallocs":39,"Frees":37},{"Size":4096,"Mallocs":1897746,"Frees":1897674},{"Size":4864,"Mallocs":10117,"Frees":10110},{"Size":5376,"Mallocs":2282,"Frees":2278},{"Size":6144,"Mallocs":2398,"Frees":2395},{"Size":6528,"Mallocs":0,"Frees":0},{"Size":6784,"Mallocs":31,"Frees":31},{"Size":6912,"Mallocs":0,"Frees":0},{"Size":8192,"Mallocs":1869903,"Frees":1869899},{"Size":9472,"Mallocs":92,"Frees":84},{"Size":9728,"Mallocs":5344,"Frees":5344},{"Size":10240,"Mallocs":1866312,"Frees":1866312},{"Size":10880,"Mallocs":1800,"Frees":1800},{"Size":12288,"Mallocs":1919,"Frees":1918},{"Size":13568,"Mallocs":1866227,"Frees":1866227},{"Size":14336,"Mallocs":6,"Frees":6},{"Size":16384,"Mallocs":2296,"Frees":2296},{"Size":18432,"Mallocs":347408,"Frees":347407},{"Size":19072,"Mallocs":1,"Frees":0}]},
"render_cache_overhead_ns": 33363116,
"render_requests": 729,
"requestBuckets": [721,69,49,28,9,11,4,3,3,6,23],
"request_cache_hits": 253,
"request_cache_misses": 674,
"requests": 927,
"zipper_cache_hits": 0,
"zipper_cache_misses": 0,
"zipper_find_errors": 0,
"zipper_find_requests": 0,
"zipper_info_errors": 0,
"zipper_info_requests": 0,
"zipper_render_errors": 0,
"zipper_render_requests": 0,
"zipper_search_requests": 0,
"zipper_timeouts": 0
}
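As a reading aid for the memstats section above, the following Go sketch pulls a few heap counters out of a `/debug/vars` payload. The helper name `idleNotReleased` is hypothetical; the field names are those of Go's `runtime.MemStats`, whose documentation describes HeapIdle minus HeapReleased as an estimate of memory the runtime is retaining but could return to the OS. The input below copies four values from the dump and omits everything else.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// debugVars models only the /debug/vars fields used in this sketch.
type debugVars struct {
	MemStats struct {
		HeapAlloc    uint64
		HeapSys      uint64
		HeapIdle     uint64
		HeapReleased uint64
	} `json:"memstats"`
}

// idleNotReleased returns the heap bytes that are idle but still held by the
// process (HeapIdle minus HeapReleased).
func idleNotReleased(raw []byte) (uint64, error) {
	var v debugVars
	if err := json.Unmarshal(raw, &v); err != nil {
		return 0, err
	}
	return v.MemStats.HeapIdle - v.MemStats.HeapReleased, nil
}

func main() {
	// Values copied from the dump above; all other fields are left out.
	raw := []byte(`{"memstats":{"HeapAlloc":2372664080,"HeapSys":20735295488,"HeapIdle":18346196992,"HeapReleased":0}}`)
	n, err := idleNotReleased(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("idle, not released: %d bytes (%.1f GiB)\n", n, float64(n)/(1<<30))
}
```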

The logs show the same error as before:

"type":"fetch","request":"&MultiFetchRequest{Metrics:[{host.snowflake-web.*.timers.snowflake.events.XXXX.load_lag.upper 1585244646 1585245546 false host.snowflake-web.*.timers.snowflake.events.XXXX.load_lag.upper []}],}","errors":[{"error":"unexpected EOF"}]}

Civil commented 4 years ago

Can you check the latest master? There was a bug where SendGlobsAsIs was not taken into account, and in your case all backends got a "MaxBatchSize" of 0, which is equivalent to "AlwaysSendGlobsAsIs".

So the memory consumption is likely caused by that, as all the replies would contain all metrics and consume much more memory for each request.

Basically that's kind of confirmed by the top entry from the profiler: 6630.57MB 91.57% 91.57% 6630.57MB 91.57% github.com/go-graphite/protocol/carbonapi_v2_pb.(*FetchResponse).Unmarshal

That memory is allocated for the actual metrics.
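To illustrate why a batch limit bounds the size of a single reply, here is a small Go sketch. It assumes that a batch limit caps how many metric names go into one backend request, with 0 meaning no limit. `splitIntoBatches` is a hypothetical helper, not carbonapi's implementation, and the real code path (globs versus expanded metric names) may differ.

```go
package main

import "fmt"

// splitIntoBatches groups metric names into requests of at most maxBatchSize
// names each. A maxBatchSize of 0 (or less) means one unlimited batch, so a
// single reply has to carry every series at once.
func splitIntoBatches(metrics []string, maxBatchSize int) [][]string {
	if len(metrics) == 0 {
		return nil
	}
	if maxBatchSize <= 0 || maxBatchSize >= len(metrics) {
		return [][]string{metrics}
	}
	batches := make([][]string, 0, (len(metrics)+maxBatchSize-1)/maxBatchSize)
	for start := 0; start < len(metrics); start += maxBatchSize {
		end := start + maxBatchSize
		if end > len(metrics) {
			end = len(metrics)
		}
		batches = append(batches, metrics[start:end])
	}
	return batches
}

func main() {
	metrics := make([]string, 2500) // placeholder names; only the count matters here
	fmt.Println(len(splitIntoBatches(metrics, 1000))) // 3
	fmt.Println(len(splitIntoBatches(metrics, 0)))    // 1
}
```

With a limit of 0 the whole result arrives in one response that must be unmarshalled in full, which matches where the profile above shows the memory going.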

cjagus commented 4 years ago

@Civil Looks like it is because of AlwaysSendGlobsAsIs.

I have a few questions regarding configs and recommended settings.

We are currently running our Graphite cluster on AWS instance-store machines: carbon-c-relay -> go-carbon-replica1 [around 8 instances], go-carbon-replica2 [around 8 instances] -> carbonapi [queries from both groups].

Civil commented 4 years ago

Do you recommend enabling AlwaysSendGlobsAsIs if we have enough RAM or if we can reduce the number of metrics per server?

It depends. This option is especially important if you use graphite-clickhouse or prometheus. For go-carbon it used to be less important; it was better to set a large enough maxBatchSize. However, one use case that might benefit from it is caching the queries, but see below.

Also, which one is faster, the older upstreams backends or backendsv2? [Should we start using that?]

With backendsv2 you should be able to specify the newer protocol (carbonapi_v3_pb); it should be a bit faster and consume less memory on the carbonapi side. However, I'm not sure it's tested well enough. The reason is that the only backend that supports it is go-carbon, while for backends that do not support tags, people tend to use booking.com's fork of carbonapi, which doesn't support carbonapi_v3_pb.

backendsv2 also provides much more flexibility in configuring retries and timeouts, or in having a more interesting topology. The main reason for it was to allow not only broadcasting all requests to all backends, but also doing at least round-robin (again, that's useful for databases that take care of replication themselves, such as clickhouse).

For go-carbon, I'd migrate eventually, not for speed but because it's much better tested and the old-style configuration will eventually be deprecated.

Do you recommend memcache [aws elasticache] or system memory cache?

I'd suggest using the memory cache, at least for now. External memcache in its current implementation adds too much latency, especially if "maxBatchSize" is not 0. In that case it will receive many single requests and the latency to memcache itself will become a problem.

I had some plans to rework the memcache code and add support for other ways of caching, but never got around to doing that properly.

Any other tips to speed up the query loading time?

I'd start with updating carbonapi to current master (or you can wait a bit; I'll tag a new release soon, I have only one issue that I want to solve before that). It won't improve performance (at least not significantly), but it will give you more usable metrics to look at.

The second thing would be to identify how much time it takes to fetch data from the backend and how much it takes to process the data.

Basically, the idea here is to identify the bottlenecks and then see what you can do about them.

But overall, what would really make a difference is taking a look at the queries you send to carbonapi and trying to optimize them (maybe you can materialize some of them). I had plans to do fancier caching on the carbonapi side and a cleverer way of processing requests (e.g. not fetching data that is already in the cache, and implementing stampede prevention, which would basically run background queries for new data for popular queries and cache the results even before a user requests them), but that never materialized while I was working at my previous company. Currently I work on carbonapi in my spare time, so it receives a very random amount of love every now and then. Eventually I'll get to that, but I can't tell when.

Civil commented 4 years ago

As it seems that this is currently solved, closing.