cosmos / cosmos-sdk

A Framework for Building High Value Public Blockchains
https://cosmos.network/
Apache License 2.0

[Bug]: lost telemetry after upgrade to v0.50.7 #20992

Closed scirner22 closed 1 month ago

scirner22 commented 3 months ago

What happened?

After upgrading from v0.46.13 to v0.50.7, we seem to have lost some cosmos layer telemetry.

Configuration

# config.toml
[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"
max_open_connections = 3
namespace = "cometbft"

# app.toml
[telemetry]
service-name = ""
enabled = true
enable-hostname = true
enable-hostname-label = true
enable-service-label = false
prometheus-retention-time = 60
global-labels = [
  [
    "chain_id",
    "pio-mainnet-1"
  ]
]

Metrics before upgrade

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75520    0 75520    0     0   363k      0 --:--:-- --:--:-- --:--:--  365k
# HELP store_iavl_commit store_iavl_commit
# TYPE store_iavl_commit summary
store_iavl_commit{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.5"} 0.37117999792099
store_iavl_commit{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.9"} 0.6181100010871887
store_iavl_commit{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.99"} 2.5882089138031006
store_iavl_commit_sum{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 258902.27967266366
store_iavl_commit_count{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 677424
# HELP store_iavl_delete store_iavl_delete
# TYPE store_iavl_delete summary
store_iavl_delete{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.5"} 0.02745000086724758
store_iavl_delete{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.9"} 0.029260000213980675
store_iavl_delete{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.99"} 0.029260000213980675
store_iavl_delete_sum{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 72579.197542749
store_iavl_delete_count{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 76053
# HELP store_iavl_get store_iavl_get
# TYPE store_iavl_get summary
store_iavl_get{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.5"} 0.002409999957308173
store_iavl_get{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.9"} 0.0037499999161809683
store_iavl_get{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1",quantile="0.99"} 0.012059999629855156
store_iavl_get_sum{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 2.6048887388131223e+06
store_iavl_get_count{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 3.09507892e+08

Metrics after upgrade

curl http://localhost:26660/metrics  | rg iavl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103k    0  103k    0     0   533k      0 --:--:-- --:--:-- --:--:--  534k

Cosmos SDK Version

v0.50.7

How to reproduce?

No response

julienrbrt commented 3 months ago

Do you have steps to reproduce, or some code we could look at?

scirner22 commented 3 months ago

I'm not totally plugged into the cosmos ecosystem. Is there a reference chain implementation of cosmos-sdk where I can try to verify this across cosmos-sdk versions?

I can provide steps for how to do this on the provenance chain, but it will require more manual steps.

julienrbrt commented 3 months ago

Yes, the reference is simapp in the release/v0.50.x branch: https://github.com/cosmos/cosmos-sdk/tree/release/v0.50.x/simapp

You can install it and configure it with: make install && make init-simapp

scirner22 commented 1 month ago

I was able to confirm these metrics were lost in simd as well. Here are the steps to reproduce.

v0.46.13

git checkout v0.46.13
make clean && make build
./build/simd testnet init-files --chain-id=testing --output-dir="./testnet" --keyring-backend=test --minimum-gas-prices=0.000001stake --v 1
# set ./testnet/node0/simd/config/config.toml
[instrumentation]
prometheus = true
# set ./testnet/node0/simd/config/app.toml
[telemetry]
enabled = true
prometheus-retention-time = 60
# end file
./build/simd start --log_level=info --home ./testnet/node0/simd
curl http://localhost:26660/metrics | rg iavl
# finds metrics!

v0.50.7

git checkout v0.50.7
make clean && make build
rm -rf ./testnet
./build/simd testnet init-files --chain-id=testing --output-dir="./testnet" --keyring-backend=test --minimum-gas-prices=0.000001stake --v 1
# set ./testnet/node0/simd/config/config.toml
[instrumentation]
prometheus = true
# set ./testnet/node0/simd/config/app.toml
[telemetry]
enabled = true
prometheus-retention-time = 60
# end file
./build/simd start --log_level=info --home ./testnet/node0/simd
curl http://localhost:26660/metrics | rg iavl
# missing metrics!

scirner22 commented 1 month ago

I confirmed these metrics are missing on the main branch as well.
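
For anyone who wants to script the `curl ... | rg iavl` check from the steps above, here is a minimal Go sketch. It assumes the scrape body is plain Prometheus text exposition output with no trailing timestamps. The function name `metricNamesMatching` and the `example_metric` sample are hypothetical, made up for illustration. Unlike `rg`, it only looks at sample names, not at `# HELP` / `# TYPE` comment lines.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// metricNamesMatching returns the sorted, de-duplicated sample names in a
// Prometheus text exposition body whose name contains substr.
// Hypothetical helper for illustration; not part of cosmos-sdk.
func metricNamesMatching(exposition, substr string) []string {
	seen := map[string]struct{}{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The sample name ends at the label block or at the first space.
		name := line
		if i := strings.IndexAny(name, "{ "); i >= 0 {
			name = name[:i]
		}
		if strings.Contains(name, substr) {
			seen[name] = struct{}{}
		}
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Sample lines copied from the "Metrics before upgrade" output.
	before := `# TYPE store_iavl_commit summary
store_iavl_commit_sum{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 258902.27967266366
store_iavl_commit_count{chain_id="pio-mainnet-1",host="pio-mainnet-indexed-archived-fullblock-1"} 677424`
	// Hypothetical stand-in for a scrape that contains no IAVL samples.
	after := "example_metric 1"

	fmt.Println(len(metricNamesMatching(before, "iavl")) > 0) // prints "true"
	fmt.Println(len(metricNamesMatching(after, "iavl")) > 0)  // prints "false"
}
```

Feeding it the body returned by `http://localhost:26660/metrics` on each version would give the same found / missing answer as the manual check.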

julienrbrt commented 1 month ago

Thank you, I'll investigate!

julienrbrt commented 1 month ago

Found the culprit: https://github.com/cosmos/cosmos-sdk/blob/main/baseapp/baseapp.go#L205 Store metrics need to be set using SetStoreMetrics on baseapp.

SpicyLemon commented 4 weeks ago

Did this get fixed? I can't find a PR for it.

julienrbrt commented 4 weeks ago

No fix is needed; this needs to be called: https://github.com/cosmos/cosmos-sdk/blob/main/baseapp/options.go#L395-L402
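
To illustrate why an unset gatherer would make the `store_iavl_*` metrics disappear, here is a self-contained Go sketch of the pattern. It assumes the app starts with a gatherer that discards measurements until `SetStoreMetrics` is called before the app is sealed. Only the name `SetStoreMetrics` comes from this thread. `App`, `NewApp`, `StoreMetrics`, `noOpMetrics`, and `recordingMetrics` are hypothetical stand-ins, not the cosmos-sdk types; the linked options.go has the real signature.

```go
package main

import (
	"fmt"
	"strings"
)

// StoreMetrics is a hypothetical stand-in for the gatherer the store reports to.
type StoreMetrics interface {
	MeasureSince(keys ...string)
}

// noOpMetrics discards every measurement (the assumed default).
type noOpMetrics struct{}

func (noOpMetrics) MeasureSince(...string) {}

// recordingMetrics counts measurements per joined key, standing in for a
// gatherer backed by a real telemetry sink.
type recordingMetrics struct{ counts map[string]int }

func newRecordingMetrics() *recordingMetrics {
	return &recordingMetrics{counts: map[string]int{}}
}

func (m *recordingMetrics) MeasureSince(keys ...string) {
	m.counts[strings.Join(keys, "_")]++
}

// App is a hypothetical stand-in for baseapp.
type App struct {
	metrics StoreMetrics
	sealed  bool
}

// NewApp applies the options and then seals the app. Without an option that
// calls SetStoreMetrics, the no-op gatherer stays in place.
func NewApp(opts ...func(*App)) *App {
	app := &App{metrics: noOpMetrics{}}
	for _, opt := range opts {
		opt(app)
	}
	app.sealed = true
	return app
}

// SetStoreMetrics replaces the gatherer; it must run before the app is sealed.
func (a *App) SetStoreMetrics(m StoreMetrics) {
	if a.sealed {
		panic("SetStoreMetrics() on sealed App")
	}
	a.metrics = m
}

// Commit reports one measurement under the key store_iavl_commit.
func (a *App) Commit() { a.metrics.MeasureSince("store", "iavl", "commit") }

func main() {
	NewApp().Commit() // default gatherer: the measurement is dropped

	rec := newRecordingMetrics()
	wired := NewApp(func(a *App) { a.SetStoreMetrics(rec) })
	wired.Commit()
	fmt.Println(rec.counts["store_iavl_commit"]) // prints "1"
}
```

In a real chain app, the equivalent step would be passing an option that calls `SetStoreMetrics` when the baseapp is constructed. Which gatherer constructor to pass depends on the SDK version, so check the linked source rather than this sketch.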