consensus-shipyard / ipc

🌳 Spawn multi-level trees of customized, scalable, EVM-compatible networks with IPC. L2++ powered by FVM, Wasm, libp2p, IPFS/IPLD, and CometBFT.
https://ipc.space
Apache License 2.0
44 stars 39 forks source link

fix: metrics validator label to use address instead of pk struct formatting #1201

Closed LePremierHomme closed 1 week ago

LePremierHomme commented 1 week ago

Closes https://github.com/consensus-shipyard/ipc/issues/1190

I used address instead of the public key hex encoding. Can change that if wanted.

raulk commented 1 week ago

@LePremierHomme nit: the "closing" keyword is not supported by GitHub to link the issue, but "closes", "close", "closed" are. https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

LePremierHomme commented 1 week ago
  1. We should make record_metrics() fallible by returning a Result<()>. Then we can avoid having packing code, and the higher level component would log errors in the error event type. @karlem could you drop a link to an example where we use that event type?

Not sure I understand the purpose of having it fallible. To propagate encoding errors?

  1. Would add a TODO so we try to propagate the FVM address in the validator context.

The public key is what's used as the validator key for comparisons. I guess it would be better to just derive Address as a view, or fix it as an additional field in the context. It seems to be currently needed just for the metrics, so I could also just change CheckpointSigned to contain the address instead of the public key.

karlem commented 1 week ago

We should make record_metrics() fallible by returning a Result<()>. This way, we can avoid having packing code, and the higher-level component would log errors using the error event type. @karlem, could you provide a link to an example where we use this event type?

In general, I agree—failing to record a metric should not cause a panic that shuts down the whole node. Here's an example of how to emit a tracing error: https://github.com/consensus-shipyard/ipc/blob/409c1387ddf978a76c999bd8a19c7291dd094fd1/fendermint/vm/interpreter/src/fvm/exec.rs#L200.

However, given point 2:

Would add a TODO so we try to propagate the FVM address in the validator context.

I would suggest we include this change in the current PR. It’s a simple modification—see here: https://github.com/consensus-shipyard/ipc/blob/409c1387ddf978a76c999bd8a19c7291dd094fd1/fendermint/app/src/cmd/run.rs#L128.

By including the address in the event directly, we can skip the conversion altogether and eliminate the need to make record_metrics() fallible. At least for now!

karlem commented 1 week ago

@LePremierHomme , the address is already being converted in run.rs. I suggest including it in the ValidatorContext and then using it directly in the event.

raulk commented 1 week ago

SGTM!

raulk commented 1 week ago

Thank you, @karlem! 🙏

LePremierHomme commented 1 week ago

Changed ValidatorContext to hold the to_address conversion made during app start.

As for making record_metrics() fallible: none of the current implementations contain fallible code, and I think it’s unlikely that they will. Any thoughts on that? @karlem @raulk