Note: the numbers below don't add up exactly, because we had a big month for BwdServer due to an anomaly.
To address this:
[ ] use TraceRatio samplers on each service (20% for BwdServer, 20% for QueueWorker, 100% for others)
    [x] write code
    [x] merge to dark repo
    [x] backport to classic-dark repo
    [x] merge & deploy
    [x] add flags to LaunchDarkly
        [x] BwdServer
        [x] QueueWorker
    [ ] check it works
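The sampler step above boils down to trace-ID-ratio sampling. A minimal Python sketch of the decision logic (the services themselves aren't written in Python; this just mirrors how OpenTelemetry's TraceIdRatioBased sampler decides, and the ratio table is an assumption drawn from the plan above):

```python
# Illustrative sketch of trace-ratio sampling, not the actual service code.
# The decision is derived from the trace ID itself, so it is deterministic
# per trace: every span of a kept trace is kept, every span of a dropped
# trace is dropped.
TRACE_ID_MASK = (1 << 64) - 1  # compare only the low 64 bits of the trace ID

def should_sample(trace_id: int, ratio: float) -> bool:
    # Keep the trace when its low 64 bits fall below ratio * 2**64.
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound

# Hypothetical per-service ratios, matching the checklist above.
RATIOS = {"BwdServer": 0.20, "QueueWorker": 0.20, "ApiServer": 1.0}
```

At 20%, roughly one in five traces survives; a 100% ratio keeps everything, since no 64-bit value reaches 2**64.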
[x] Reduce plan
    [x] use Honeycomb sampling for garbagecollector (5% should be fine; I'd be surprised if we ever look at this again)
        [x] merge change
        [x] check it worked
    [x] disable k8s metrics (we get these from Google Cloud anyway)
        [x] merge change
        [x] check it worked
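Honeycomb-style sampling keeps 1 in N events and stamps survivors with the sample rate so Honeycomb can re-weight counts back up. A sketch at 5% (N = 20) for the garbagecollector; the `samplerate` field name here is an assumption for illustration, not Honeycomb's exact API:

```python
import random

SAMPLE_RATE = 20  # keep 1 in 20 events, i.e. ~5%

def maybe_send(event):
    """Head-sample an event: drop ~95%, tag the rest for re-weighting."""
    if random.randrange(SAMPLE_RATE) != 0:
        return None  # dropped before it is ever sent
    # Hypothetical field name; tells the backend each kept event stands
    # in for SAMPLE_RATE real ones.
    return dict(event, samplerate=SAMPLE_RATE)
```

Because the decision is per event rather than per trace, this suits a service like the garbagecollector whose events we rarely inspect individually.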
Overall, this should reduce us from 1.8B events in March to:
BwdServer: 121M
QueueWorker: 71M
ApiServer: 67M
CronChecker: 39M
kubernetes-bwd-ocaml other: 6M
garbagecollector: 18M
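As a sanity check on those projections, we can apply each keep-rate to last month's per-service counts (taken from the split later in this note; the garbagecollector base is approximated from its 38.02% / 376M figure):

```python
# Last month's event counts per service (garbagecollector approximated).
current = {
    "BwdServer": 608_015_209,
    "QueueWorker": 354_919_048,
    "ApiServer": 66_742_393,
    "CronChecker": 38_742_278,
    "other": 5_528_954,
    "garbagecollector": 376_000_000,  # approx
}
# Fraction of events kept after sampling; unlisted services keep 100%.
keep = {"BwdServer": 0.20, "QueueWorker": 0.20, "garbagecollector": 0.05}

projected = {k: round(v * keep.get(k, 1.0)) for k, v in current.items()}
total = sum(projected.values())  # lands near 322M for these services
```

That matches the per-service figures above (121M, 71M, 67M, 39M, 6M, 18M) and, with the remaining small datasets added, the roughly-350M overall estimate.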
Our OpenTelemetry provider is putting their prices up, so we should reduce how much we use.
Currently, we're using about 1.2B events and the next lowest threshold is 450M.
They are currently split:
cloudsql-proxy: 0.11%
kubernetes-bwd-nginx: 0.15%
kubernetes-bwd-ocaml: 57.03% (1.13B)
kubernetes-garbagecollector: 38.02% (376M)
kubernetes-metrics: 4.69% (45M)
Among kubernetes-bwd-ocaml, they are split:
BwdServer: 608,015,209
QueueWorker: 354,919,048
ApiServer: 66,742,393
CronChecker: 38,742,278
other: 5,528,954
Overall, that's around 350M, comfortably below the 450M threshold.