LoRaWAN unprocessable messages also contain unknown messages

BobClaerhout commented 1 year ago

The metric hono_telemetry_payload_bytes_count contains a status field. This field is either forwarded, unprocessable or undeliverable. We added prometheus alerts to follow up unprocessable messages in the lora adapter. However, in the lora adapter we are getting status unprocessable when the LNS is providing a message that is not an uplink message. This is completely different from an actual uplink message that is unprocessable because of any other unexpected exception. For this we would like to add an additional status: unknown, indicating that the message we received is an unknown message and thus not implemented which is fundamentally different from unrpocessable messages.

sophokles73 commented 1 year ago

I would argue that there might be different reasons for why a message cannot be processed. The message might not adhere to the supported/expected syntax or there might be a problem while processing the message due to e.g. a failing back end service. In both cases the messages are unprocessable. So, what we are talking about here seems to be having a way to also indicate the reason why a message could not be processed.

Having a fourth status unknown does not really serve this purpose FMPOV as it seems to suggest that the message does not fall into the unprocessable category (which it still does). Also, changing the semantics of the unprocessable status in this way may lead to existing alarms failing to detect this situation, if the corresponding messages no longer are being tagged as unprocessable.

Another less intrusive solution might be to add another tag for indicating the cause explicitly. This tag could then have values like bad-syntax, unsupported etc. while the status tag would still have value unprocessable

WDYT?

BobClaerhout commented 1 year ago

I agree with your reasoning.

I would suggest reason to be the additional tag. Furthermore, I would suggest following values:

bad-syntax: json could not be converted
unknown-type: type of the message could not be deducted
unsupported-type: type of the message is not supported for uplink
unknown: reason could not be deducted from the error

BobClaerhout commented 1 year ago

Added the enum to the PR, see complete list for reasons in the enum: https://github.com/eclipse-hono/hono/pull/3519/files#diff-5ffa4da77dce1cb7ee4432311e678c5ac69f347d29b5d12d9afb029fbb03f161R252

sophokles73 commented 1 year ago

We should also consider whether we only want to introduce the reason tag to the metrics reported by the Lora adapter or if this is relevant to all adapters. IMHO this is mostly relevant to the Lora adapter at the moment because it is the only adapter that actually inspects/parses the message payload. All other adapters simply forward the payload as-is.

WDYT? @calohmn any opinion on this?

calohmn commented 1 year ago

The implementation in #3519 already adds the reason tag for reportTelemetry invocations of other adapters as well, and I think that makes sense. I think it would also be useful to extend the reportCommand method accordingly (in a separate PR).

sophokles73 commented 1 year ago

@BobClaerhout I still do not like the idea of putting concrete reasons for (adapter specific) failures into the general metrics tags. All of the information you want to put into the reason tag is available from the traces. Many of the reasons seem to have meaning only in the context of the Lora adapter (bad-syntax, unknown-type, unsupported-type) because Hono itself is payload agnostic while for the others I have a hard time seeing why I would need to count their occurrences. Is this because you want to detect during runtime that some devices have not properly been implemented?

eclipse-hono / hono

LoRaWAN unprocessable messages also contain unknown messages #3520