flomesh-io / pipy

Pipy is a programmable proxy for the cloud, edge and IoT.
https://flomesh.io/pipy

duplicated builtin metric in pipy repo aggregated result #116

Closed ethinx closed 1 year ago

ethinx commented 1 year ago

What happened

The builtin metric `pipy_outbound_count` is duplicated in the pipy repo's aggregated result.

Reproduce the issue

1. Start a pipy repo and create a codebase with this simple script:

```js
((
  router = new algo.URLRouter({
    '/*': new algo.RoundRobinLoadBalancer(['192.168.66.64:80', '192.168.66.170:80']),
  }),
) => pipy({
  _target: undefined,
  _requestCounter: new stats.Counter('http_requests_count', ['method', 'status', 'host', 'path']),
  _reqHead: null,
  _resHead: null,
  _path: null,
  _method: null,
  _host: null,
  _reqTime: 0,
})

.listen(18000)
.demuxHTTP().to(
  $=>$.handleMessageStart(
    msg => (
      _target = router.find(
        msg.head.headers.host,
        msg.head.path,
      )?.next?.(),
      _reqHead = msg.head,
      _path = (new URL(msg.head.path)).pathname,
      _method = msg.head.method,
      _host = msg.head.headers.host,
      _reqTime = Date.now()
    )
  )
  .branch(
    () => Boolean(_target), (
      $=>$.muxHTTP(() => _target).to(
        $=>$.connect(() => _target.id)
      )
    ), (
      $=>$.replaceMessage(
        new Message({ status: 404 }, 'No route')
      )
    )
  )
  .handleMessageStart(
    msg => (
      _resHead = msg.head,
      _resHead && _requestCounter.withLabels(_method, _resHead.status, _host, _path).increase()
    )
  )
)

)()
```
2. Start multiple pipy instances and subscribe them to the codebase (a single instance may be enough to reproduce the issue as well):

```bash
#!/bin/bash

ulimit -SHn 655360

export REPO=http://192.168.66.1:6060/repo/test-metrics/

for i in `seq 1 4`
do
  export PIPY_NAME=01-$i
  if [ ! -f uuid.$i ]
  then
    uuidgen > uuid.$i
  fi
  nohup pipy --reuse-port --instance-uuid=$(cat uuid.$i) --instance-name=$PIPY_NAME $REPO 2>&1 > /dev/null &
done
```
3. Start a load generator and make some requests to the proxy; 3-5 seconds should be long enough for each pipy instance to create multiple connections to the upstreams. Stop the load, wait until the connections to the upstreams are closed (check with `ss -tn`), then start a new round of requests. I use k6 with the test script below (`./k6 run metric.js -u 10 -d 5s`), but the load tool probably doesn't matter:

```js
import http from 'k6/http';
import { check } from 'k6';
import { randomItem } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';

export const options = {
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(95)', 'p(99)', 'p(99.9)', 'p(99.99)', 'count'],
  noConnectionReuse: false,
  insecureSkipTLSVerify: true,
};

const first_name = [
  "admiring",
  "adoring",
];

const last_name = [
  "kalam",
  "kapitsa",
];

const ids = [
  "",
];

const targets = [
  {
    name: 'hello',
    url: 'http://192.168.66.209:18000/',
    headers: {},
    checker: {
      'hello check': (r) => r.body.includes('Hi, there'),
    },
  },
];

export default function() {
  let target = targets[0];
  let resp = http.get(target.url + randomItem(first_name) + '-' + randomItem(last_name) + randomItem(ids), {
    headers: target.headers,
  });
  check(resp, target.checker);
}
```
4. After each round of testing, duplicated `pipy_outbound_count` entries show up on the repo's `/metrics` page (a quick shell check for such duplicates is sketched right after this list):

```
$ curl http://192.168.66.1:6060/metrics -s | grep -E 'outbound_count{'
pipy_outbound_count{instance="01-1"} 6
pipy_outbound_count{instance="01-1",peer="[192.168.66.1]:6060"} 2
pipy_outbound_count{instance="01-1",peer="[192.168.66.64]:80"} 2
pipy_outbound_count{instance="01-1",peer="[192.168.66.170]:80"} 1
pipy_outbound_count{instance="01-1",peer="[192.168.66.170]:80"} 2
pipy_outbound_count{instance="01-1",peer="[192.168.66.64]:80"} 2
pipy_outbound_count{instance="01-1",peer="[192.168.66.170]:80"} 2
pipy_outbound_count{instance="01-1",peer="[192.168.66.64]:80"} 2
pipy_outbound_count{instance="01-3"} 2
pipy_outbound_count{instance="01-3",peer="[192.168.66.1]:6060"} 1
pipy_outbound_count{instance="01-3",peer="[192.168.66.64]:80"} 2
pipy_outbound_count{instance="01-3",peer="[192.168.66.170]:80"} 1
pipy_outbound_count{instance="01-3",peer="[192.168.66.170]:80"} 1
pipy_outbound_count{instance="01-3",peer="[192.168.66.170]:80"} 1
pipy_outbound_count{instance="01-2"} 4
pipy_outbound_count{instance="01-2",peer="[192.168.66.1]:6060"} 1
pipy_outbound_count{instance="01-2",peer="[192.168.66.64]:80"} 1
pipy_outbound_count{instance="01-2",peer="[192.168.66.170]:80"} 2
pipy_outbound_count{instance="01-2",peer="[192.168.66.64]:80"} 1
pipy_outbound_count{instance="01-2",peer="[192.168.66.170]:80"} 2
pipy_outbound_count{instance="01-2",peer="[192.168.66.64]:80"} 1
pipy_outbound_count{instance="01-4"} 3
pipy_outbound_count{instance="01-4",peer="[192.168.66.1]:6060"} 1
pipy_outbound_count{instance="01-4",peer="[192.168.66.64]:80"} 2
pipy_outbound_count{instance="01-4",peer="[192.168.66.170]:80"} 1
pipy_outbound_count{instance="01-4",peer="[192.168.66.170]:80"} 1
pipy_outbound_count{instance="01-4",peer="[192.168.66.64]:80"} 1
pipy_outbound_count{instance="01-4",peer="[192.168.66.170]:80"} 1
pipy_outbound_count{instance="01-4",peer="[192.168.66.64]:80"} 1
```
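For reference, the duplicates can be confirmed without eyeballing the scrape by stripping the sample values and looking for repeated series. This is only a sketch and assumes the repo's metrics endpoint from the steps above (http://192.168.66.1:6060/metrics):

```bash
# List every pipy_outbound_count series (metric name plus label set) that
# appears more than once in a single scrape of the repo's /metrics page.
curl -s http://192.168.66.1:6060/metrics \
  | grep '^pipy_outbound_count{' \
  | awk '{print $1}' \
  | sort \
  | uniq -d
```

With a correct aggregation this pipeline prints nothing; every line it does print is a label set that occurs in more than one entry.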

Impact

With a large set of pipy instances and upstream connections, this may cause a metrics explosion over time.
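To put a number on that, one can periodically count the `pipy_outbound_count` entries the repo exposes; a count that keeps growing across load rounds, even though the set of instances and peers is fixed, means the duplicates are accumulating rather than being merged. A minimal sketch, again assuming the repo endpoint from the reproduction steps:

```bash
# Print a timestamp and the number of pipy_outbound_count entries once per
# minute to watch the series count grow between rounds of load.
while true; do
  count=$(curl -s http://192.168.66.1:6060/metrics | grep -c '^pipy_outbound_count{')
  echo "$(date) pipy_outbound_count entries: $count"
  sleep 60
done
```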

Expected behavior

There should not be any duplicated metrics.

Version info

pipy repo: 0.50.0-88

pajama-coder commented 1 year ago

It seems the numbers are correct on the worker nodes; they are only duplicated on the repo side. This might be a bug in the compact format used when sending metrics to the repo, which is actually a stateful protocol. I'll give it a further check later on.
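For readers wondering how a stateful, compact encoding can produce duplicates at all: the toy bash sketch below is not Pipy's actual wire format, just an illustration of the general failure mode, where the sender registers each label set under a numeric index and the receiver keys its table by that index without checking whether the same label set is already known.

```bash
#!/bin/bash
# Toy model of an index-based, stateful metric stream (NOT Pipy's protocol).
# The receiver stores series keyed by the sender's numeric index; if the
# sender re-registers an existing label set under a new index (e.g. after
# its connections are torn down and re-created), the same series ends up
# stored twice. Keying by label set instead would deduplicate it.

declare -A series_by_index   # receiver state: index -> label set
declare -A value_by_index    # receiver state: index -> value

receive() {                  # receive <index> <labels> <value>
  local idx=$1 labels=$2 value=$3
  [ -n "$labels" ] && series_by_index[$idx]=$labels
  value_by_index[$idx]=$value
}

# Round 1: the sender registers peer 192.168.66.64 under index 0.
receive 0 'peer="[192.168.66.64]:80"' 2

# Round 2: after reconnecting, the sender registers the SAME peer under a
# new index; the receiver never notices the label set is already present.
receive 1 'peer="[192.168.66.64]:80"' 1

# The aggregated output now contains the label set twice.
for idx in "${!series_by_index[@]}"; do
  echo "pipy_outbound_count{instance=\"01-1\",${series_by_index[$idx]}} ${value_by_index[$idx]}"
done
```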

ethinx commented 1 year ago

The issue can't be reproduced now