Open BobClaerhout opened 1 year ago
I would argue that there might be different reasons for why a message cannot be processed. The message might not adhere to the supported/expected syntax or there might be a problem while processing the message due to e.g. a failing back end service. In both cases the messages are unprocessable
. So, what we are talking about here seems to be having a way to also indicate the reason why a message could not be processed.
Having a fourth status unknown does not really serve this purpose FMPOV as it seems to suggest that the message does not fall into the unprocessable
category (which it still does). Also, changing the semantics of the unprocessable
status in this way may lead to existing alarms failing to detect this situation, if the corresponding messages no longer are being tagged as unprocessable
.
Another less intrusive solution might be to add another tag for indicating the cause explicitly. This tag could then have values like bad-syntax
, unsupported
etc. while the status tag would still have value unprocessable
WDYT?
I agree with your reasoning.
I would suggest reason
to be the additional tag.
Furthermore, I would suggest following values:
bad-syntax
: json could not be convertedunknown-type
: type of the message could not be deductedunsupported-type
: type of the message is not supported for uplinkunknown
: reason could not be deducted from the errorAdded the enum to the PR, see complete list for reasons in the enum: https://github.com/eclipse-hono/hono/pull/3519/files#diff-5ffa4da77dce1cb7ee4432311e678c5ac69f347d29b5d12d9afb029fbb03f161R252
We should also consider whether we only want to introduce the reason tag to the metrics reported by the Lora adapter or if this is relevant to all adapters. IMHO this is mostly relevant to the Lora adapter at the moment because it is the only adapter that actually inspects/parses the message payload. All other adapters simply forward the payload as-is.
WDYT? @calohmn any opinion on this?
The implementation in #3519 already adds the reason
tag for reportTelemetry
invocations of other adapters as well, and I think that makes sense. I think it would also be useful to extend the reportCommand
method accordingly (in a separate PR).
@BobClaerhout I still do not like the idea of putting concrete reasons for (adapter specific) failures into the general metrics tags. All of the information you want to put into the reason tag is available from the traces. Many of the reasons seem to have meaning only in the context of the Lora adapter (bad-syntax, unknown-type, unsupported-type) because Hono itself is payload agnostic while for the others I have a hard time seeing why I would need to count their occurrences. Is this because you want to detect during runtime that some devices have not properly been implemented?
The metric
hono_telemetry_payload_bytes_count
contains astatus
field. This field is eitherforwarded
,unprocessable
orundeliverable
. We added prometheus alerts to follow up unprocessable messages in the lora adapter. However, in the lora adapter we are getting statusunprocessable
when the LNS is providing a message that is not an uplink message. This is completely different from an actual uplink message that is unprocessable because of any other unexpected exception. For this we would like to add an additional status:unknown
, indicating that the message we received is an unknown message and thus not implemented which is fundamentally different from unrpocessable messages.