flomesh-io / pipy

Pipy is a programmable proxy for the cloud, edge and IoT.
https://flomesh.io/pipy

metrics label is not well handled in multi-threading version #141

Closed: ethinx closed this issue 1 year ago

ethinx commented 1 year ago

What happened

Use samples/gateway for testing, with the metrics plugin enabled as follows:

{
  "listen": 8000,
  "listenTLS": 8443,
  "plugins": [
    "plugins/router.js",
    "plugins/metrics.js",
    "plugins/jwt.js",
    "plugins/cache.js",
    "plugins/hello.js",
    "plugins/balancer.js",
    "plugins/serve-files.js",
    "plugins/default.js"
  ],
  ...
}
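
For context, the metrics plugin in samples/gateway defines its custom metrics with explicit label names via Pipy's stats API. A minimal sketch (the actual plugin code may differ in details such as the bucket bounds):

pipy({
  _requestCounter: new stats.Counter('request_count', ['route']),
  _responseStatus: new stats.Counter('response_status', ['route', 'status']),
  _requestLatency: new stats.Histogram(
    'request_latency',
    new Array(26).fill().map((_, i) => Math.pow(1.5, i)), // histogram bucket bounds
    ['route'],
  ),
})

// and per request, e.g.:
_requestCounter.withLabels(_route).increase()
_responseStatus.withLabels(_route, _status).increase()

So the only label keys that should ever appear on these metrics are route and status.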

The labels of the custom metrics (request_*, response_*) are incorrect on the repo: the label key is the metric's own name (e.g. request_count="private") instead of the label name defined in PipyJS (route).

The custom metrics as exposed by the repo:

$ curl localhost:6060/metrics -s | grep -Ei 'request_|response_'
request_count{instance="pipy1-1"} 0
request_count{instance="pipy1-1",request_count="private"} 4
request_count{instance="pipy1-1",request_count="api"} 4
request_count{instance="pipy1-1",request_count="home"} 2
response_status{instance="pipy1-1"} 0
response_status{instance="pipy1-1",response_status="private"} 0
response_status{instance="pipy1-1",response_status="api"} 0
response_status{instance="pipy1-1",response_status="home"} 0
request_latency{instance="pipy1-1"} 0
request_latency{instance="pipy1-1",request_latency="private"} 0
request_latency{instance="pipy1-1",request_latency="api"} 0
request_latency{instance="pipy1-1",request_latency="home"} 0

Moreover, when I fetch the metrics from the proxy worker several times, the first label of a metric is sometimes lost, leaving a dangling {, in its place:

$ curl localhost:6060/metrics -s | grep -Ei 'request_|response_'
request_count 0
request_count{route="private"} 4
request_count{route="api"} 4
request_count{route="home"} 2
request_latency_bucket 0
request_latency_count 0
request_latency_sum 0
request_latency_bucket{1="private",le=""NaN""} 0
request_latency_count{1="private"} 4
request_latency_sum{1="private"} 0
request_latency_bucket{1="api",le=""NaN""} 0
request_latency_count{1="api"} 4
request_latency_sum{1="api"} 2
request_latency_bucket{1="home",le=""NaN""} 0
request_latency_count{1="home"} 2
request_latency_sum{1="home"} 0
response_status 0
response_status{route="private"} 0
response_status{route="private",status="200"} 4
response_status{route="api"} 0
response_status{route="api",status="200"} 4
response_status{route="home"} 0
response_status{route="home",status="200"} 2
$ curl localhost:6060/metrics -s | grep -Ei 'request_|response_'
request_count{} 0
request_count{,route="private"} 4
request_count{,route="api"} 4
request_count{,route="home"} 2
request_latency_bucket{,le=""NaN""} 0
request_latency_count{} 0
request_latency_sum{} 0
request_latency_bucket{,1="private",le=""NaN""} 0
request_latency_count{,1="private"} 4
request_latency_sum{,1="private"} 0
request_latency_bucket{,1="api",le=""NaN""} 0
request_latency_count{,1="api"} 4
request_latency_sum{,1="api"} 2
request_latency_bucket{,1="home",le=""NaN""} 0
request_latency_count{,1="home"} 2
request_latency_sum{,1="home"} 0
response_status{} 0
response_status{,route="private"} 0
response_status{,route="private",status="200"} 4
response_status{,route="api"} 0
response_status{,route="api",status="200"} 4
response_status{,route="home"} 0
response_status{,route="home",status="200"} 2

There is no such problem on the main branch.

How to reproduce

As described above.
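
A concrete sequence (a sketch; the repo/worker setup and the --threads option are assumptions based on the multi-threaded nightlies, check pipy --help for the exact flags):

# start a repo instance (listens on port 6060 by default) and
# upload samples/gateway to it as codebase "gateway"
pipy

# in another terminal, run a worker from the repo with multiple threads
pipy http://localhost:6060/repo/gateway/ --threads=4

# scrape the metrics a few times and compare the labels across runs
curl -s localhost:6060/metrics | grep -Ei 'request_|response_'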

Expected behavior

Labels should be consistent between what is defined in PipyJS and the final exposed result.
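
That is, both the repo and the worker should consistently expose lines like:

request_count{route="api"} 4
response_status{route="api",status="200"} 4
request_latency_count{route="api"} 4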

Version info

Version     : nightly-202301041229
Commit      : 55c9e5d546f1e415688decfee7d823983577dd4c
Commit Date : Wed, 4 Jan 2023 11:37:10 +0800
Host        : Linux-5.15.0-39-generic x86_64
OpenSSL     : OpenSSL 1.1.1q  5 Jul 2022
Builtin GUI : No
Samples     : No
pajama-coder commented 1 year ago

@ethinx I've fixed the problem for the repo, but couldn't reproduce the problem for the worker. Could you please test it against the latest commit and see if the worker problem still exists?

ethinx commented 1 year ago

repo and worker look good now.