TheThingsNetwork / lorawan-stack

The Things Stack, an Open Source LoRaWAN Network Server
https://www.thethingsindustries.com/stack/
Apache License 2.0

Support for `normalized_payload` field #5429

Closed pablojimpas closed 2 years ago

pablojimpas commented 2 years ago

Summary

The new field `normalized_payload` will allow integrations to rely on a standardized data model for all devices in the device repository that support this new normalization step, regardless of the manufacturer, the units they use, their wording, etc.

Why do we need this?

This feature request has already been largely discussed in https://github.com/TheThingsNetwork/lorawan-devices/issues/395

However, defining the data model right is a different issue and does not block the implementation of this one (https://github.com/TheThingsNetwork/lorawan-devices/issues/395#issuecomment-1115023964), so this can be done before settling on “the perfect standardized format”, which is a difficult task to get right, especially for less common quantities.

The implementation in lorawan-stack also does not depend on each specific device supporting the new normalizeUplink(data) function. This step should be completely optional (though encouraged for manufacturers).
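
To illustrate (just a sketch; the data model itself is still being discussed in the referenced issue, so the return shape, field names and units below are assumptions, not a final design), a device repository codec could ship a normalizer next to its decoder along these lines:

function decodeUplink(input) {
  // Vendor-specific decoding: two bytes, big endian, hundredths of a degree.
  return {
    data: {
      temp: ((input.bytes[0] << 8) | input.bytes[1]) / 100,
    },
  };
}

function normalizeUplink(data) {
  // Map the vendor-specific field onto the normalized model (shape assumed, not final).
  return {
    air: {
      temperature: data.temp, // assumed to be degrees Celsius in the normalized model
    },
  };
}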

What is already there? What do you see now?

There is no support for a standard data format.

What is missing? What do you want to see?

Support for a new field in uplinks with a normalized format.

How do you propose to implement this?

It has already been described in the previously referenced issue. I can't speak to the specifics of lorawan-stack itself, since I'm not familiar with the codebase.

How do you propose to test this?

I suppose unit testing is sufficient, but I'm not certain.

Can you do this yourself and submit a Pull Request?

Maybe. I'm willing to contribute to get this done as swiftly as I can, but I'll need guidance on how to proceed: since I don't know the internals of the system in detail, I may miss important things and waste a lot of time.

Let me bring @johanstokking into the discussion to see if he can assign this to someone appropriate or give me assistance to achieve this.

johanstokking commented 2 years ago

Thanks for filing the issue.

@adriansmares do you want to pick this up or should we assign @nicholaspcr?

For implementation, high level, there are two approaches:

  1. Add a flow next to DecodeUplink in PayloadEncoderDecoder (PayloadEncoderDecoderNormalizer?), related services, API (AppAs), implementations of PayloadEncoderDecoder (whether or not they support it) and Console
  2. Only support normalization in the Device Repository. So normalization becomes part of DecodeUplink. This means though that Device Repository should explicitly support an rpc GetUplinkNormalizer() because there's a fourth codec. On decode, both scripts are run

I'm a bit undecided here. For CayenneLPP we can simply normalize, almost exactly like we do now but with different field names. For the Device Repository option (2) suffices. Users using custom JavaScript codecs wouldn't be able to use this, for that we'd need (1).


Should we verify the output from the normalize uplink codec script? Like with a JSON schema? We should do it anyway as part of CI in the Device Repository. If we do JSON Schema, ideally we have a code generated validator, otherwise we'd have a parallel validator written in pure Go.
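
To sketch what such a check could look like on the Device Repository side (assuming an ajv-based validator and a hypothetical normalized-payload.schema.json; both are illustrations, not decisions — the schema itself is what still needs to be defined):

const Ajv = require("ajv");
const ajv = new Ajv({ allErrors: true });

// Hypothetical schema file describing the normalized payload.
const validate = ajv.compile(require("./normalized-payload.schema.json"));

// `normalized` would be the output of the normalize uplink codec script for a test vector.
function checkNormalizedOutput(normalized) {
  if (!validate(normalized)) {
    throw new Error("invalid normalized payload: " + ajv.errorsText(validate.errors));
  }
}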


We should also be able to normalize the output of CayenneLPP, which is already normalized but in a different structure, to this new structure.

adriansmares commented 2 years ago

Add a flow next to DecodeUplink in PayloadEncoderDecoder (PayloadEncoderDecoderNormalizer?), related services, API (AppAs), implementations of PayloadEncoderDecoder (whether or not they support it) and Console

Another question that we should raise here is 'do we enforce that the formatter exists only at application level, or should it exist also at end device level?'. The use case here being of course heterogeneous applications (i.e. with different end devices in them).

I'm a bit undecided here. For CayenneLPP we can simply normalize, almost exactly like we do now but with different field names. For the Device Repository option (2) suffices. Users using custom JavaScript codecs wouldn't be able to use this, for that we'd need (1).

I think that it makes sense to speak of normalization only for 'uncontrollable' payload formatters such as the device repository or CayenneLPP. If you control the decoder script, I see no value in running the scripts separately, as you can always do one more normalization step before returning.

Should we verify the output from the normalize uplink codec script? Like with a JSON schema? We should do it anyway as part of CI in the Device Repository. If we do JSON Schema, ideally we have a code generated validator, otherwise we'd have a parallel validator written in pure Go.

I think we should definitely have a set of type definitions, otherwise everyone will just have their own funky output: an untyped object really means 'anything goes', and we cannot statically validate that all of the outputs are correct if the return type is just object.

The output of the normalization function should be a predefined type:

return new NormalizedOutput({
    payload: new SinglePayload({
        temperatures: {
            "main_sensor": new Temperature(29.0, Celsius)
        },
    })
})

This looks rather verbose but downstream it just means:

{
    "@version": "1.0-rc1",
    "single": {
        "temperatures": {
            "main_sensor": {
                "value": 29.0,
                "unit": "celsius"
            }
        }
    }
}

And all of these building blocks (NormalizedOutput/SinglePayload/Temperature etc.) should already be predefined types. I'm a strong proponent here for static typing because otherwise it would be very easy to misinterpret the schema. TTS can then take this intermediate representation (the normalized object) and turn it into something developer friendly that is stable.

Implementation-wise, we already can (and do) cache scripts, so having this long type definition preamble before the normalizer is not as expensive as it may seem.
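
For illustration only, the preamble could define the building blocks along these lines (the names mirror the example above; the exact set of types and the '@version' handling are assumptions, not a final design):

// Sketch of the predefined building blocks a normalizer preamble could provide.
class Temperature {
  constructor(value, unit) {
    this.value = value;
    this.unit = unit;
  }
}

const Celsius = "celsius";

class SinglePayload {
  constructor({ temperatures = {} } = {}) {
    this.temperatures = temperatures;
  }
}

class NormalizedOutput {
  constructor({ payload }) {
    this["@version"] = "1.0-rc1";
    this.single = payload;
  }
}

With these definitions, the verbose example above serializes (via JSON.stringify) to the downstream JSON shown.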


I think that the new field should be strongly typed (i.e. have a protobuf definition) instead of being an opaque Struct.

pablojimpas commented 2 years ago

Another question that we should raise here is 'do we enforce that the formatter exists only at application level, or should it exist also at end device level?'. The use case here being of course heterogeneous applications (i.e. with different end devices in them).

I think there is a lot of value in supporting this at the device level. I don't know how much complexity it adds to the implementation, but it's definitely desirable, since most real-world applications will have more than one type of device; having a different application for each device type seems weird.

I think that it makes sense to speak of normalization only for 'uncontrollable' payload formatters such as the device repository or CayenneLPP. If you control the decoder script, I see no value in running the scripts separately, as you can always do one more normalization step before returning.

I agree that the extra normalization step is not needed in a custom decoder: if you (as an integrations developer) need to bother creating a script manually, you might as well do it normalized directly, since that's what your system will expect. This normalization step is more aimed towards automated onboarding (e.g. the user scans the QR code on the device and the system automatically detects the manufacturer, uses the right decoder, and provides the data normalized).

Having strict static typing will be useful for sure.

johanstokking commented 2 years ago

Another question that we should raise here is 'do we enforce that the formatter exists only at application level, or should it exist also at end device level?'. The use case here being of course heterogeneous applications (i.e. with different end devices in them).

You mean, it wouldn't make much sense to have an application-layer normalizer if there are various device-level decoders that produce different results? That wouldn't work indeed; they come in pairs (decoder + normalizer, where device overrides application level).

Or do you mean that application integrations either expect normalized payload or they don't?

@adriansmares: If you control the decoder script, I see no value in running the scripts separately, as you can always do one more normalization step before returning.

@pablojimpas: I agree that the extra normalization step is not needed in a custom decoder: if you (as an integrations developer) need to bother creating a script manually, you might as well do it normalized directly, since that's what your own system will expect.

True, but I would add normalized_payload next to decoded_payload. We can't easily change decoded_payload as that would require adaptations upstream. It also allows us to gradually add normalized payload.

For custom codecs, we can check if it complies with the normalized JSON schema, and if so, also set normalized_payload (to the same value as decoded_payload).

And all of these building blocks (NormalizedOutput/SinglePayload/Temperature etc.) should already be predefined types

Yes that is exactly the idea. See the referenced issue https://github.com/TheThingsNetwork/lorawan-devices/issues/395.

So what I meant is that we have a JSON schema that can validate the normalized output. If we only support Device Repository and CayenneLPP, we don't need validation in The Things Stack. However, if we detect compliance of custom codec output with the normalized payload, we should run the output through the JSON schema validator. The thing is that JSON schema validation at runtime is pretty expensive. So we'd need a code generated validator or a custom validator.

I think that the new field should be strongly typed (i.e. have a protobuf definition) instead of being an opaque Struct.

This is an interesting idea, but that requires bijectivity; that we can map JSON output from the normalizer codec to Protobuf, and that should marshal to the same JSON output again when sending as JSON upstream.

adriansmares commented 2 years ago

You mean, it wouldn't make much sense to have an application-layer normalizer if there are various device-level decoders that produce different results? That wouldn't work indeed; they come in pairs (decoder + normalizer, where device overrides application level).

Or do you mean that application integrations either expect normalized payload or they don't?

I'm thinking from an implementation perspective that we have a lot of overrides in our hierarchy (between applications and end devices). Having end-device level overrides brings flexibility but also some (small) complexity in the implementation.

Keep in mind the whole skip_payload_crypto (or even payload formatters) hierarchy that took a very long time to explain to users (what does a null value at end device level imply, what is the priority etc.).

With that being said, I think that keeping it consistent (i.e. allowing end device level overrides) is probably better right now than enforcing that these formatters exist only globally (at application level).

True, but I would add normalized_payload next to decoded_payload. We can't easily change decoded_payload as that would require adaptations upstream. It also allows us to gradually add normalized payload.

For custom codecs, we can check if it complies with the normalized JSON schema, and if so, also set normalized_payload (to the same value as decoded_payload).

Given that below we mention that the schema check is expensive, I think we're better off requiring an explicit normalization script. Explicitly requiring the normalization script basically allows us to avoid adding the type definitions to the normal payload codecs.

Yes that is exactly the idea. See the referenced issue TheThingsNetwork/lorawan-devices#395.

So what I meant is that we have a JSON schema that can validate the normalized output. If we only support Device Repository and CayenneLPP, we don't need validation in The Things Stack. However, if we detect compliance of custom codec output with the normalized payload, we should run the output through the JSON schema validator. The thing is that JSON schema validation at runtime is pretty expensive. So we'd need a code generated validator or a custom validator.

I don't follow how the fact that the normalizer comes from the Device Repository ensures that its output is valid. The normalizer is a JS function; it can do and return whatever it wants. If I accidentally return 29.4 instead of new Temperature(29.4, Celsius), you cannot catch this statically: our VM is not parsing TypeScript.

You would always need runtime validation for the output. In this context, if we enforce that the normalizer is always a separate script, we can put the type definitions (and validation checks) only in that script. At least that way we avoid having extra overhead in the normal payload codecs.

Or should the Device Repository normalizers be written in TypeScript, which we transpile to JavaScript when we bundle the device repository?

This is an interesting idea, but that requires bijectivity; that we can map JSON output from the normalizer codec to Protobuf, and that should marshal to the same JSON output again when sending as JSON upstream.

~The oneof types would map in an ugly manner (specifically, they are wrapped in an object bearing the name of the oneof), so I think we'll have to drop this idea for now unfortunately.~

I think it would be useful to consider this, as it would allow (Go) native validation post marshalling.


The payload codecs can return objects. Should there be an option in the return object that says 'just pipe this decoded payload to the normalizer'? Then we avoid the marshalling and network costs for the intermediate representation.

johanstokking commented 2 years ago

With that being said, I think that keeping it consistent (i.e. allowing end device level overrides) is probably better right now than enforcing that these formatters exist only globally (at application level).

Yeah. But let's first consider automatically checking the output of a custom per-device (and per-application) codec against the normalized schema. Because if we do that, we magically set normalized_payload and there's no per-device or per-application normalizer codec necessary.

Regarding validation cost: it's not something we can fully eliminate, but on the other hand it's not much more than defining a Go struct with all the fields, trying to unmarshal the JSON (which will catch wrong types), and then checking field by field whether the values fall within the allowed ranges (min/max/patterns/enums, etc.).

I don't follow how the fact that the normalizer comes from the Device Repository ensures that its output is valid.

Indeed only to the extent that input and output is tested in CI (see https://github.com/TheThingsNetwork/lorawan-devices/#payload-codecs). But indeed, there are no promises on test coverage nor can we test all binary input exhaustively. So indeed, we can't rely on this (at all).

I think it would be useful to consider this, as it would allow (Go) native validation post marshalling.

Okay, so how do we do this? Unmarshal the JSON to a proto-generated struct, then validate the fields, and that generated proto struct would marshal back to the same JSON? That should work. Then we indeed have the representation in code (and proto); we'd only have two places to maintain the validation (the JSON schema in the Device Repository and the TTS code, but OK).

The payload codecs can return objects. Should there be an option in the return object that says 'just pipe this decoded payload to the normalizer'? Then we avoid the marshalling and network costs for the intermediate representation.

We could indeed add a new normalized: true return field in the decoder that signals TTS to expect the output to be already normalized, skipping additional normalizers.
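
For example (a sketch of this proposal only; the normalized shape of the data below is illustrative):

function decodeUplink(input) {
  return {
    data: {
      // Decoded output that already follows the normalized structure.
      air: {
        temperature: ((input.bytes[0] << 8) | input.bytes[1]) / 100,
      },
    },
    // Proposed flag: tells TTS that the output is already normalized, so it can be
    // copied into normalized_payload without running a separate normalizer.
    normalized: true,
  };
}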

adriansmares commented 2 years ago

Okay, so how do we do this? Unmarshal the JSON to a proto-generated struct, then validate the fields, and that generated proto struct would marshal back to the same JSON? That should work. Then we indeed have the representation in code (and proto); we'd only have two places to maintain the validation (the JSON schema in the Device Repository and the TTS code, but OK).

TLDR: Yes.

Keep in mind that internally in the Application Server you have an in-memory representation 'in the pipes' anyway, as we carry around ApplicationUp everywhere, not the serialized binary. So you need an in-memory representation anyway, and I argue that a well defined proto definition has the advantage of having serialization and validation embedded in it, and we can reference it directly in ApplicationUp. It's also more API-stable and strict than a generic *pbtypes.Struct.

Is it more work than passing around *pbtypes.Struct / map[string]interface{} everywhere? Yes, probably. But the validation code is going to be generated, and for better or worse we're already experienced with protobuf maintenance.

We could indeed add a new normalized: true return field in the decoder, that signals TTS to expect the output to be already normalized, skipping additional normalizers.

:+1:

pablojimpas commented 2 years ago

What exactly is blocking the implementation? I thought we agreed in https://github.com/TheThingsNetwork/lorawan-devices/issues/395 that the schema definition of the normalized format is decoupled from implementing this feature in The Things Stack.

johanstokking commented 2 years ago

Yes it is, but we need at least one field with one definition so we can start doing things back-to-back, including validation and testing.

pablojimpas commented 2 years ago

Any updates on this? If no one has planned to work on this in the short term, I could spend a bunch of time during August implementing this. However, I'll have to be pointed in the right direction or get some mentoring.

nicholaspcr commented 2 years ago

Hi mate, I'm the one assigned to implement this. My apologies for the delay; there was a bit more focus on other internal issues on my end. While I don't think this should take a long time to do, I can't promise it will be in v3.21.0, which will be released on the third of August. I'm pretty confident that it should be done by the v3.21.1 release; hopefully this being added a bit later on doesn't cause you too much trouble.

pablojimpas commented 2 years ago

hopefully this being added a bit later on doesn't cause you too much trouble.

Absolutely not, don't worry. I was just letting it be known that I'd be available to do the job if there was no one else. I'm happy that you've picked this up; feel free to discuss anything related to this with me as you make progress.

johanstokking commented 2 years ago

@adriansmares summarizing:

  • nullable types (omitted field in DR vs nullable proto type with value subfield in proto JSON)
  • CayenneLPP: we can do a best effort mapping to the normalized payload. There are some limitations though, for instance it has "temperature" but it doesn't say if it's air, water, soil, surface, etc. If we assume it's "air", we're probably right 90% of the time, but that 10% can cause issues upstream
  • Custom script: run the decoded output through the validator and if it's valid, set to normalized payload field
  • This is to avoid we need to set normalized: true on decoded payload. People will forget to set that and it will take them hours to figure out what went wrong and then weeks to wait until the next release

ACK?

adriansmares commented 2 years ago
  • nullable types (omitted field in DR vs nullable proto type with value subfield in proto JSON)

That happens only if you use encoding/json. Using jsonpb will render normally (null or actual value, no wrapping).

  • CayenneLPP: we can do a best effort mapping to the normalized payload. There are some limitations though, for instance it has "temperature" but it doesn't say if it's air, water, soil, surface, etc. If we assume it's "air", we're probably right 90% of the time, but that 10% can cause issues upstream

I would rather not normalize this one at all then. If the semantics are not consistent, we reduce the usefulness of having semantics in the first place. Imagine the documentation: 'The location of the sensor can be trusted, except if the payload was originally encoded using CayenneLPP. Then probably it is right.'

  • Custom script: run the decoded output through the validator and if it's valid, set to normalized payload field

Run it only if normalized == true, otherwise we're wasting CPU for 99.99999% of the scripts which are not normalization-ready.

  • This is to avoid we need to set normalized: true on decoded payload. People will forget to set that and it will take them hours to figure out what went wrong and then weeks to wait until the next release

I disagree. We can have checks in the device repository that say 'hey, your output would be valid normalized payload, are you sure you didn't forget to set normalized: true?'

johanstokking commented 2 years ago
  • Custom script: run the decoded output through the validator and if it's valid, set to normalized payload field

Run it only if normalized == true, otherwise we're wasting CPU for 99.99999% of the scripts which are not normalization-ready.

I'll benchmark this.

  • This is to avoid we need to set normalized: true on decoded payload. People will forget to set that and it will take them hours to figure out what went wrong and then weeks to wait until the next release

I disagree. We can have checks in the device repository that say 'hey, your output would be valid normalized payload, are you sure you didn't forget to set normalized: true?'

But who will see that message?

adriansmares commented 2 years ago

I'll benchmark this.

Good. It may indeed be the case that I'm exaggerating.

But who will see that message?

I meant to add it to the GitHub action that runs the validation. I see now that things aren't as well developed as I wished. I knew we had codec quickstart files, so I hoped these get validated as well as part of the validation, but it doesn't seem to be the case. But if we did validate the codec quickstart, we could also check whether its output would be a valid normalized payload.

johanstokking commented 2 years ago

so I hoped these get validated as well as part of the validation, but it doesn't seem to be the case.

It is the case, see https://github.com/TheThingsNetwork/lorawan-devices/blob/master/bin/validate.js#L99. Or do you mean something else? We use the Go script so we have the same runtime as TTS.

Another reason to be reluctant with changing the function signature or argument or return objects is that I managed to standardize what we have today in the Payload Codec spec that will be published by the LoRa Alliance. If we add fields that must be set in the payload codec implementation we could run into issues in the future.

adriansmares commented 2 years ago

It is the case, see https://github.com/TheThingsNetwork/lorawan-devices/blob/master/bin/validate.js#L99. Or do you mean something else? We use the Go script so we have the same runtime as TTS.

I didn't see that part, my bad.

Another reason to be reluctant with changing the function signature or argument or return objects is that I managed to standardize what we have today in the Payload Codec spec that will be published by the LoRa Alliance. If we add fields that must be set in the payload codec implementation we could run into issues in the future.

Then I hope that benchmarks show that I'm paranoid.

ACK for implementation as described in your proposal comment.