Consensys / linea-arithmetization

/linea-trace/health and /linea-trace/ready endpoints #768

Closed non-fungible-nelson closed 3 weeks ago

non-fungible-nelson commented 1 month ago

Description

The plug-in is intended to serve live requests against its filesystem and its functionality. It will be replicated and run in K8s across the architecture. To this end, a /health liveness endpoint is likely required, and it needs to be pollable over plain HTTP, i.e. not JSON-RPC. A /ready endpoint should also be created that reports when the node is ready to serve traffic, i.e. start-up complete, healthy, ready to serve traffic of:

The endpoints should comply with the K8s HTTP probe spec. See ref
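
For reference, a Kubernetes HTTP probe treats any response status from 200 up to (but not including) 400 as success and anything else as failure. A minimal Python sketch of that check, assuming a single plain HTTP GET is enough; the names `probe_succeeds` and `http_probe` are illustrative only and are not part of the plug-in or of Besu:

```python
import urllib.error
import urllib.request


def probe_succeeds(status_code: int) -> bool:
    """Kubernetes HTTP probe rule: 200 <= status < 400 counts as success."""
    return 200 <= status_code < 400


def http_probe(url: str, timeout_seconds: float = 1.0) -> bool:
    """Send one GET to `url` and apply the probe rule to the response status.

    Connection errors and timeouts count as a failed probe.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return probe_succeeds(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for error statuses such as 4xx/5xx;
        # the status code is still available on the exception.
        return probe_succeeds(error.code)
    except (urllib.error.URLError, OSError):
        # Unreachable host, refused connection, or timeout.
        return False
```

Under this rule a 503 response from a readiness endpoint is a failed probe, and a 200 is a successful one.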

Acceptance Criteria

Filter94 commented 1 month ago

We don't really need a /health endpoint at the moment, because we're using custom healthchecks based on block height AFAIK. But it's fine if you want to introduce it.

macfarla commented 1 month ago

Think we can use the existing /liveness and /readiness endpoints in Besu

Running a Besu node that has zero peers

➜  ~ curl -v 'http://localhost:8545/liveness'
* Host localhost:8545 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8545...
* Connected to localhost (::1) port 8545
> GET /liveness HTTP/1.1
> Host: localhost:8545
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< vary: origin
< content-length: 21
<
{
  "status" : "UP"
* Connection #0 to host localhost left intact
}%
➜  ~ curl -v 'http://localhost:8545/readiness'
* Host localhost:8545 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8545...
* Connected to localhost (::1) port 8545
> GET /readiness HTTP/1.1
> Host: localhost:8545
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< vary: origin
< content-length: 23
<
{
  "status" : "DOWN"
* Connection #0 to host localhost left intact
}%
➜  ~ curl -v 'http://localhost:8545/readiness?minPeers=0'
* Host localhost:8545 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8545...
* Connected to localhost (::1) port 8545
> GET /readiness?minPeers=0 HTTP/1.1
> Host: localhost:8545
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< vary: origin
< content-length: 21
<
{
  "status" : "UP"
* Connection #0 to host localhost left intact
}%

Node that is still syncing

➜  besu-local-nodes git:(multi-tenancy) ✗ curl -v 'http://localhost:8545/liveness'
* Host localhost:8545 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8545...
* Connected to localhost (::1) port 8545
> GET /liveness HTTP/1.1
> Host: localhost:8545
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< vary: origin
< content-length: 21
<
{
  "status" : "UP"
* Connection #0 to host localhost left intact
}%
➜  besu-local-nodes git:(multi-tenancy) ✗ curl -v 'http://localhost:8545/readiness?minPeers=0'
* Host localhost:8545 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8545...
* Connected to localhost (::1) port 8545
> GET /readiness?minPeers=0 HTTP/1.1
> Host: localhost:8545
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< vary: origin
< content-length: 23
<
{
  "status" : "DOWN"
* Connection #0 to host localhost left intact
}%

macfarla commented 1 month ago

Think we can use the /readiness and /liveness endpoints that already exist in Besu. /liveness has the same meaning: is the JSON-RPC server up? https://besu.hyperledger.org/public-networks/how-to/use-besu-api/json-rpc#readiness

/readiness is a bit more flexible. By default it requires one connected peer and the node to be within two blocks of the best known block. For that to be true, the node must have finished syncing (or be within 2 blocks of head), and for that to be true, it must have finished loading any plugins. You can also supply the minPeers and maxBlocksBehind parameters: https://besu.hyperledger.org/public-networks/how-to/use-besu-api/json-rpc#readiness
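
The default readiness rule described in this comment can be modelled in a few lines of Python. This is a sketch of the rule as stated here, not Besu's implementation: `is_ready` and `readiness_url` are hypothetical helpers, and only the `minPeers` and `maxBlocksBehind` query parameter names come from the comment above.

```python
from urllib.parse import urlencode


def is_ready(peer_count, blocks_behind, min_peers=1, max_blocks_behind=2):
    """Model of the rule: enough peers AND close enough to the best known block.

    The defaults (1 peer, 2 blocks) are the ones quoted in the comment.
    """
    return peer_count >= min_peers and blocks_behind <= max_blocks_behind


def readiness_url(base_url, min_peers=None, max_blocks_behind=None):
    """Build a /readiness URL, adding only the parameters that were supplied."""
    params = {}
    if min_peers is not None:
        params["minPeers"] = min_peers
    if max_blocks_behind is not None:
        params["maxBlocksBehind"] = max_blocks_behind
    query = urlencode(params)
    return f"{base_url}/readiness" + (f"?{query}" if query else "")
```

Read against the curl sessions earlier in the thread: a synced node with zero peers fails the default rule (`is_ready(0, 0)` is false) but passes with `minPeers=0`, while a node that is still far behind head fails even with `minPeers=0`.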

macfarla commented 3 weeks ago

Update from Tsvetan 20 June

We narrowed it down to a metric that will represent node load in a more robust way, so that the LB's round-robin strategy can be steered such that no node is overloaded while others sit idle. I will discuss with the rest of the Arithmetization team whether we can improve on the naive solution of just using a threshold for conflation or counting requests. One possible option would be to use the gas projection that we calculate and sum up the used gas for each request, but we will still discuss the options and I will propose a solution.
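
To make the gas-based option concrete, here is a rough Python sketch of how a balancer might sum projected gas for in-flight requests per node and route each new request to the least-loaded node. Everything in it is hypothetical: the class name, the node names, and the assumption that a gas projection is available when a request arrives. No solution had been chosen at the time of this update.

```python
class GasLoadTracker:
    """Tracks the summed projected gas of in-flight requests per node (hypothetical)."""

    def __init__(self, nodes):
        # Every node starts with no in-flight work.
        self._in_flight_gas = {node: 0 for node in nodes}

    def load(self, node):
        """Current summed projected gas for `node`."""
        return self._in_flight_gas[node]

    def pick_node(self):
        """Return the node with the lowest summed gas; ties go to the first listed."""
        return min(self._in_flight_gas, key=self._in_flight_gas.get)

    def start_request(self, projected_gas):
        """Assign a request to the least-loaded node and record its projected gas."""
        node = self.pick_node()
        self._in_flight_gas[node] += projected_gas
        return node

    def finish_request(self, node, projected_gas):
        """Release the projected gas once the request has completed."""
        self._in_flight_gas[node] -= projected_gas
```

Compared with counting requests, this weights one heavy request more than several light ones, which is the imbalance the update is concerned with.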

On this basis, closing this ticket since it does not describe the solution to the problem.