filecoin-project / sentinel-drone

An analytics capture agent for lotus daemon which is forked from Telegraf (https://github.com/influxdata/telegraf) to support custom input/output plugins.
MIT License

Remove Mpool Tracking; Adjust Metrics Names #34

Closed: placer14 closed this 4 years ago

placer14 commented 4 years ago

Mpool tracking is too expensive. Lotus input metric names start with observed_.

Concerns for discussion:

alanshaw commented 4 years ago

> Mpool tracking is too expensive.

In the project demo you mentioned sentinel could derive how long a message was sitting in the mempool. I thought that might be quite a useful node/network health indicator to know in aggregate. Can it still be done with this removal?

placer14 commented 4 years ago

No, this removal leaves that requirement hanging in the wind. Still, I think this is the right approach. Drone is intended to be deployed to multiple nodes, and I don't think we want to store the whole mempool of every node. Instead, lotus-shed has https://github.com/filecoin-project/lotus/blob/master/cmd/lotus-shed/mempool-stats.go, which would provide the aggregate for us. Unfortunately, it is not compatible with how we push metrics into the DB, so some thought needs to be put in here.
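To make the trade-off concrete: a per-node aggregate of mempool dwell time can be kept without storing the mempool contents in the DB. The following is only a minimal Go sketch, assuming the agent can observe when a message enters and leaves the pool. `DwellTracker`, `DwellStats`, `demoStats`, and the message IDs are hypothetical names for illustration, and this is not a description of how lotus-shed's mempool-stats works.

```go
package main

import (
	"fmt"
	"time"
)

// DwellStats is a hypothetical running aggregate of how long
// messages sat in the mempool before leaving it.
type DwellStats struct {
	Count int
	Total time.Duration
	Max   time.Duration
}

// Mean returns the average dwell time, or zero if nothing was observed.
func (s DwellStats) Mean() time.Duration {
	if s.Count == 0 {
		return 0
	}
	return s.Total / time.Duration(s.Count)
}

// DwellTracker remembers only the first-seen time of pending messages.
type DwellTracker struct {
	firstSeen map[string]time.Time
	stats     DwellStats
}

func NewDwellTracker() *DwellTracker {
	return &DwellTracker{firstSeen: make(map[string]time.Time)}
}

// Add records the first sighting of a message; repeats are ignored.
func (t *DwellTracker) Add(id string, at time.Time) {
	if _, ok := t.firstSeen[id]; !ok {
		t.firstSeen[id] = at
	}
}

// Remove folds the message's dwell time into the aggregate and forgets it.
func (t *DwellTracker) Remove(id string, at time.Time) {
	seen, ok := t.firstSeen[id]
	if !ok {
		return // never saw this message enter the pool
	}
	delete(t.firstSeen, id)
	d := at.Sub(seen)
	t.stats.Count++
	t.stats.Total += d
	if d > t.stats.Max {
		t.stats.Max = d
	}
}

// Stats returns the aggregate that would be pushed as a metric.
func (t *DwellTracker) Stats() DwellStats { return t.stats }

// demoStats replays two made-up messages through the tracker.
func demoStats() DwellStats {
	start := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	tr := NewDwellTracker()
	tr.Add("msg-a", start)
	tr.Add("msg-b", start.Add(10*time.Second))
	tr.Remove("msg-a", start.Add(30*time.Second)) // sat for 30s
	tr.Remove("msg-b", start.Add(60*time.Second)) // sat for 50s
	return tr.Stats()
}

func main() {
	s := demoStats()
	fmt.Println(s.Count, s.Mean(), s.Max)
}
```

Only the count, total, and maximum would need to leave the node, which keeps the stored data small no matter how many messages pass through the pool.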

placer14 commented 4 years ago

Making this PR a draft for discussion. Could also push this change into a next branch so we don't have to ship it right away.

placer14 commented 4 years ago

> In the project demo you mentioned sentinel could derive how long a message was sitting in the mempool. I thought that might be quite a useful node/network health indicator to know in aggregate. Can it still be done with this removal?

@alanshaw Re above: I think the way forward is to move the lotus-shed mpool aggregator into lotus and expose it as a Prometheus metric from there. This feature could be opt-in to appease those concerned about operational overhead.
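A rough idea of what the opt-in exposure could look like, written as a standalone Go sketch using only the standard library rather than any existing lotus code. The metric names, the `MPOOL_METRICS` environment variable, and the `MpoolSummary` type are all assumptions made up for illustration. The output follows the Prometheus text exposition format.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// MpoolSummary is a hypothetical aggregate the node would maintain.
type MpoolSummary struct {
	Pending          int
	MeanDwellSeconds float64
}

// renderMetrics writes the summary in the Prometheus text exposition
// format. The metric names are placeholders, not existing lotus metrics.
func renderMetrics(s MpoolSummary) string {
	var b strings.Builder
	b.WriteString("# TYPE mpool_pending_messages gauge\n")
	fmt.Fprintf(&b, "mpool_pending_messages %d\n", s.Pending)
	b.WriteString("# TYPE mpool_mean_dwell_seconds gauge\n")
	fmt.Fprintf(&b, "mpool_mean_dwell_seconds %g\n", s.MeanDwellSeconds)
	return b.String()
}

// metricsHandler serves the summary only when the feature is enabled,
// so operators who do not opt in see a 404.
func metricsHandler(enabled bool, current func() MpoolSummary) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(current()))
	})
}

func main() {
	// MPOOL_METRICS is a hypothetical opt-in switch.
	enabled := os.Getenv("MPOOL_METRICS") == "1"
	current := func() MpoolSummary {
		return MpoolSummary{Pending: 12, MeanDwellSeconds: 40}
	}

	mux := http.NewServeMux()
	mux.Handle("/metrics", metricsHandler(enabled, current))
	// http.ListenAndServe(":9090", mux) would serve it; omitted here so
	// the sketch exits immediately.

	fmt.Print(renderMetrics(current()))
}
```

Gating the handler rather than the aggregation itself is just one option. If the bookkeeping is the expensive part, the same switch would need to disable the tracking as well.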