Black-box monitoring: Price oracle

EmerisHQ / demeris-backend

Monorepo containing all the Demeris backend code and infrastructure definitions.

GNU Affero General Public License v3.0


Black-box monitoring: Price oracle #190

Open sgerogia opened 3 years ago

sgerogia commented 3 years ago

It is hard for the Price Oracle to self-verify the accuracy of its ticks. Create an external service which will act as a "canary" for stale ticks.

DoD

Sample a subset of ticks at regular intervals.
Call the Oracle and its data sources, apply the same averaging method, and compare the values.
Export a Prometheus metric and/or log the result.
Provide a dashboard that monitors the operation and raises alerts to the Emeris backend team.
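
As a rough illustration of the "export a Prometheus metric" item, the sketch below renders divergence values as a gauge in the Prometheus text exposition format using only the standard library. The metric name `price_canary_divergence_ratio`, the `token` label, and the function name are hypothetical, not taken from the codebase. It assumes token symbols contain no quotes or backslashes, so label values are not escaped.

```python
from typing import Mapping


def render_gauge(name: str, help_text: str, samples: Mapping[str, float],
                 label: str = "token") -> str:
    """Render one gauge in the Prometheus text exposition format.

    `samples` maps a label value (here, a token symbol) to its gauge value.
    Label values are NOT escaped; this is an assumption of the sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in sorted(samples.items()):
        lines.append(f'{name}{{{label}="{label_value}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric name and sample values, for illustration only.
    print(render_gauge(
        "price_canary_divergence_ratio",
        "Relative divergence between oracle and reference price",
        {"atom": 0.012, "osmo": 0.061},
    ), end="")
```

In a real deployment an existing client library would likely replace this hand-rolled renderer. The sketch only shows the shape of the data an alert rule could be built on.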

DeshErBojhaa commented 2 years ago

Using opencensus for collecting service metrics (https://github.com/census-instrumentation/opencensus-go)

ℹ️ Please add/suggest more metrics that you think are important for a good monitoring system.

Scope for MVP

[ ] Sanity check of price oracle data. Treat the price oracle as a black box and compare its data against external data provider(s) (needs more discussion)
[ ] Health check for services (will come from the heartbeat)
[ ] Average response time
- [ ] Application Metrics
  - [ ] Time per REST request (sampled at some %)
  - [ ] Number of requests per interval
    - Min count
    - Max count
    - Average count
- [ ] Platform Metrics
  - [ ] Average response time of data provider(s)
  - [ ] Average response time of database
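
To make the "number of requests per interval" metric concrete, here is a minimal standard-library sketch that buckets request timestamps into fixed intervals and reports the min, max, and average count. The function and type names are hypothetical, and the 60-second default interval is an assumption. The thread proposes opencensus for the real collection, so this only illustrates the arithmetic.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CountStats(NamedTuple):
    min_count: int
    max_count: int
    avg_count: float


def request_count_stats(timestamps: Iterable[float],
                        interval_s: float = 60.0) -> CountStats:
    """Min/max/average number of requests per interval.

    `timestamps` are request times in seconds. Intervals with no requests
    between the first and last observed interval count as zero.
    """
    buckets = Counter(int(ts // interval_s) for ts in timestamps)
    if not buckets:
        return CountStats(0, 0, 0.0)
    first, last = min(buckets), max(buckets)
    counts = [buckets[i] for i in range(first, last + 1)]
    return CountStats(min(counts), max(counts), sum(counts) / len(counts))
```

Counting empty intervals as zero is a design choice of this sketch. Skipping them would hide quiet periods and inflate the minimum.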

If we have time, and if it's low-hanging fruit, monitor the pod's metrics (memory, CPU percentage, network, etc.)

DeshErBojhaa commented 2 years ago

Blackbox Monitoring of Price Oracle

A Python script that will be deployed and managed separately from any infrastructure involved in the price oracle.
Algorithm:
1. Every 60 seconds, query Binance for the price of a random whitelisted token.
2. Query the price oracle for exactly the same token.
3. If the prices diverge by 5% or more:
   a. Query Binance and CoinGecko for that same token.
   b. Average the prices received from these two sources.
   c. If the oracle price still diverges from this average by more than 5%, signal an error.
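
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the price sources are passed in as plain callables (hypothetical stand-ins for the Binance, CoinGecko, and price oracle clients), no real API is called, and the oracle price from step 2 is reused in step 3, which is an assumption the description does not settle.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple

# A price source is any callable mapping a token symbol to a price.
# Real implementations (HTTP clients) are assumed and not shown here.
PriceSource = Callable[[str], float]

THRESHOLD = 0.05  # 5% divergence, as in the steps above


def divergence(oracle_price: float, reference_price: float) -> float:
    """Relative difference of the oracle price from a reference price."""
    if reference_price <= 0:
        raise ValueError("reference price must be positive")
    return abs(oracle_price - reference_price) / reference_price


def check_token(token: str, oracle: PriceSource, binance: PriceSource,
                coingecko: PriceSource, threshold: float = THRESHOLD) -> bool:
    """Return True if the oracle price looks healthy, False to signal an error."""
    oracle_price = oracle(token)
    # Steps 1-2: compare the oracle against Binance alone.
    if divergence(oracle_price, binance(token)) < threshold:
        return True
    # Step 3: re-query both sources and compare against their average.
    reference = mean([binance(token), coingecko(token)])
    return divergence(oracle_price, reference) <= threshold


def run_once(whitelist: Sequence[str], oracle: PriceSource,
             binance: PriceSource, coingecko: PriceSource) -> Tuple[str, bool]:
    """Pick a random whitelisted token and check it once."""
    token = random.choice(whitelist)
    return token, check_token(token, oracle, binance, coingecko)
```

A scheduler, or a simple loop calling `run_once` followed by `time.sleep(60)`, would supply the 60-second cadence. On AWS Lambda, a scheduled trigger could play that role instead.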

| Host | Pros | Cons |
| --- | --- | --- |
| AWS Lambda | 1. Low maintenance; 2. Free tier available; 3. SLA by cloud provider; 4. Triggering alerts is easy via CloudWatch | 1. One extra piece of infrastructure to manage |
| Our Kubernetes | 1. It's ours, so it has built-in monitoring; 2. One less piece of infrastructure to manage; 3. We can use the same Grafana dashboard as the price oracle to monitor this service | 1. It's our infrastructure, so the monitoring is not 100% black-box |

tbruyelle commented 2 years ago

If we deploy a new pod responsible for this task, are we sure that it will go through nginx (to detect cache problems) and not connect directly to the price-oracle pod?

DeshErBojhaa commented 2 years ago

If we deploy a new pod ... are we sure that it will go through nginx (to detect cache problems) ...

Not sure. Since we will query the endpoint exposed by the service, it should in theory go through nginx. But again, I'm not sure.