OctopusDeploy / Halibut

A secure communication stack for .NET using JSON-RPC over SSL.

Changing RPC Observer to include all of the RPC #581

Closed sburmanoctopus closed 9 months ago

sburmanoctopus commented 9 months ago

[sc-64938]

Background

On the Instance Dashboard we noticed that some instances had no recorded metrics for octopus_tentacle_halibut_RPC_active_calls, despite there being many RPC failures. [image]

Results

Before

Halibut considered "active RPC calls" to cover only the actual RPC portion of the communication, i.e. the point at which we are performing the call itself, excluding connecting etc.

This is why we never saw anything in octopus_tentacle_halibut_RPC_active_calls when the Tentacle did not exist: the call failed during the "connection" phase, and therefore never reached the section where it performs the call itself.

For example, here is an extract of the metrics after a failed health check, where the Tentacle was turned off. Note the lack of octopus_tentacle_halibut_RPC_active_calls. [image]

After

We now consider an RPC call to be "active" for the whole time Halibut takes to "perform the entire remote call process". This includes the time it takes to connect.

In the case of a polling tentacle, this includes the time it takes to queue the item, wait for it to be dequeued and processed, and for a result to be set (or a timeout to occur).
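As a rough illustration of the polling case, the sketch below models the steps listed above in Python. It is not Halibut's implementation: `polling_call`, `tentacle_poll_once`, and the dictionary-based work item are hypothetical names invented for this example. The point is only that the "active" window spans everything inside `polling_call`, from queueing through to a result or a timeout.

```python
import queue
import threading


def polling_call(pending, request, timeout_s):
    """Hypothetical server side: queue a request and wait for its result.

    Returns (result, timed_out). The whole body of this function is the
    window that would count as an "active" call under the new definition.
    """
    done = threading.Event()
    item = {"request": request, "result": None, "done": done}
    pending.put(item)                          # queue the item
    finished = done.wait(timeout=timeout_s)    # wait for dequeue, processing, result
    return item["result"], not finished


def tentacle_poll_once(pending):
    """Hypothetical polling tentacle: dequeue one item, process it, set the result."""
    item = pending.get()                        # item is dequeued
    item["result"] = item["request"].upper()    # stand-in for real processing
    item["done"].set()                          # result is set
```

If a tentacle thread runs `tentacle_poll_once` against the same queue, `polling_call` returns the processed result. If nothing ever polls the queue, it returns `(None, True)` once the timeout elapses, and that waiting time is still part of the call.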

Now, after a failed health check, we will see a metric recorded for octopus_tentacle_halibut_RPC_active_calls. [image]

During a failed attempt, note that the count will be 1, as this call is now "active". [image]
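The difference between the two behaviours can be sketched with a toy gauge. This is a minimal Python illustration under stated assumptions, not the actual RPC observer: `ActiveCallGauge`, `connect`, `perform_call`, `rpc_before`, and `rpc_after` are all hypothetical names, and the only thing that changes between the two variants is where the tracked scope begins.

```python
from contextlib import contextmanager


class ActiveCallGauge:
    """Hypothetical stand-in for the active-calls metric (not Halibut's API)."""

    def __init__(self):
        self.current = 0
        self.samples = []  # every value the gauge has been set to

    @contextmanager
    def track(self):
        self.current += 1
        self.samples.append(self.current)
        try:
            yield
        finally:
            # Always decrement, even when the call fails.
            self.current -= 1
            self.samples.append(self.current)


def connect(tentacle_online):
    if not tentacle_online:
        raise ConnectionError("Tentacle is unreachable")


def perform_call():
    return "ok"


def rpc_before(gauge, tentacle_online):
    connect(tentacle_online)       # a failure here never touches the gauge
    with gauge.track():            # only the call portion is tracked
        return perform_call()


def rpc_after(gauge, tentacle_online):
    with gauge.track():            # the entire remote call process is tracked
        connect(tentacle_online)
        return perform_call()


def run_failed_health_check(rpc):
    """Run one call against an offline Tentacle and return the gauge history."""
    gauge = ActiveCallGauge()
    try:
        rpc(gauge, tentacle_online=False)
    except ConnectionError:
        pass
    return gauge.samples
```

With the Tentacle offline, `run_failed_health_check(rpc_before)` records no samples at all, while `run_failed_health_check(rpc_after)` records the gauge rising to 1 during the attempt and dropping back to 0 afterwards.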

How to review this PR

Quality :heavy_check_mark:

Pre-requisites

shortcut-integration[bot] commented 9 months ago

This pull request has been linked to Shortcut Story #64938: Max concurrent RPC Calls metrics seems to exclude failed RPC calls.