Enforce a standard format for decoded sensor data

Summary

The motivation for this issue has already been described in a previous one: https://github.com/TheThingsNetwork/lorawan-devices/issues/237

This one, however, is about writing a detailed data model specification for the sensors in devices.

Why do we need this?

There's value in going a step further than just giving best practices: actually enforcing a strict format has benefits.

For example, suppose that someone wants to create an app that scans the QR code of a LoRaWAN device, automatically registers it in The Things Stack and starts displaying data from the device's various sensors. Currently, it's impossible to build such an app in a manufacturer-agnostic way, completely decoupled from the device specifics.

Having a known data model/format will allow building integrations/apps that use data coming from The Things Stack without caring about which device is providing that data or which units it uses, only about the capabilities/sensors it has.

What is already there? What do you see now?

What is missing? What do you want to see?

A strict data model specification, its implementation in every current decoder, and its enforcement in new devices.

How do you propose to implement this?

In the spirit of simplicity, I would enforce all units to be SI; that's one less problem to care about when defining the data model.
Of course, this enforced data model is only for the sensor data that devices send; manufacturers would still be able to send all the additional data they want in whatever format they wish.

Can you do this yourself and submit a Pull Request?

I can help to provide requirements for the spec and maybe migrate current decoders to comply with the format.

Work in progress normalized format

| quantity | sensor type | unit | data type | validity |
|---|---|---|---|---|
| temperature | temperature, surface temperature | degrees Celsius | number | |
| humidity | humidity, moisture | percentage | number | [0.0, 100.0] |
| pressure | barometer, pressure, vapour pressure | hectopascal | number | |
| conductivity | conductivity | microsiemens per centimetre | number | |
| pH | soil pH, water pH | pH scale | number | [0.0, 14.0] |
| time | time | second | number | |
| direction | wind direction | degree | number | [0.0, 360.0] |
| velocity | velocity, wind speed | metre per second | number | |
| acceleration | accelerometer | metre per second squared | number | |
| length | altitude, distance | metre | number | |
| mass | | kilogram | number | |
| density | co, co2 | kilogram per cubic metre | number | |
| current | | ampere | number | |
| voltage | | volt | number | |
| power | | watt | number | |
| solar radiation | solar radiation | watt per square metre | number | |
| luminous flux | | lumen | number | |
| luminous intensity | | candela | number | |
| luminance | | candela per square metre | number | |
| illuminance | | lux | number | |

@pablojimpas thanks for your suggestion.

I think this can be very useful indeed. In my view, there would be a common JSON schema for the decoded payload: the normalized payload. Then, there would be another function that maps the output from the uplink decoder to this new schema.

I would like to keep that as a separate, optional step after the uplink decoder. The decoded uplink is preserved in upstream messages in The Things Stack, and there will be a new field with the normalized fields. The reason for this is that this standard JSON schema will never fully cover all device-specific fields.

We already track sensors, so what really needs to be done is drafting a JSON schema with properties for each of these sensor types that clearly define their data type, unit, description and validity (max/min, scale, patterns, enum values, etc.).

When we have that, we can define a new function signature, maybe normalizeUplink(data), that simply takes the output from decodeUplink() and returns the normalized fields.

The Things Stack would make this available on normalized_payload (next to decoded_payload).

What do you think?

@johanstokking I like your approach! Having this as a separate schema will allow us to draft the solution more quickly and then, since it's an optional step, the transition period will be graceful. Each manufacturer will be able to implement the normalizeUplink(data) function at its own pace.

Now it's just a matter of adding the new normalized_payload field in The Things Stack and defining a good data model for each sensor type. Even that could be done incrementally: a first solution could cover the most common/easiest sensor types, like temperature, humidity, pressure, GPS, moisture…

I may be able to contribute to this if I'm pointed in the right direction, since I want to see this happen soon, but I can't promise anything. Looking forward to your comments.

We can very well use some help here. Are you proficient with drafting JSON schema?

What we need is to extend the schema with:

* An uplinkNormalizer (see uplinkDecoder) that takes decodeOutput (the output from uplinkDecoder) as input and outputs a new schema, normalizedPayload
* A new schema, normalizedPayload, with properties for each sensor type. This needs a definition of the data types, units, descriptions and validations

If you're not proficient with JSON schema you can also just start with a table defining the sensor types in terms of unit and validity info. We can get that drafted in a JSON schema.

What we may also consider is allowing multiple measurements or an indication of the actual sensor. There may be more than one sensor of the same type, like buttons. CayenneLPP supports numbered channels for this.

> We can very well use some help here. Are you proficient with drafting JSON schema?

Unfortunately, I have minimal experience with JSON schemas.

> If you're not proficient with JSON schema you can also just start with a table defining the sensor types in terms of unit and validity info. We can get that drafted in a JSON schema.

I'll take care of it. I'll start such a table with all the current sensor types in the first comment of this issue so that it is visible. I will start with just the basics, but I encourage input from everyone who is interested.

@johanstokking how do you think generic sensor types should be handled? For example, a sensor type of analog input could be a temperature reading, a humidity reading or anything else.

I also see an issue with the current list of sensor types: for example, there are barometer, pressure and vapor pressure sensor types. What's the difference? In theory, they all measure the same physical quantity and should produce the same kind of output.

I think right now it's a bit messy to mix types of sensors with quantities. The type of sensor is a more consumer-facing concept, but the actual quantity being measured is what developers/integrators are interested in. So, maybe the normalized output should only care about types of measurements and not about types of sensors, what do you think?

> So maybe the normalized output should only care about types of measurements and not about types of sensors, what do you think?

Yes I agree. We should probably define the quantities and the units. Which sensor produced a measurement can be defined in some sort of channel or sensor index.

> Yes I agree. We should probably define the quantities and the units. Which sensor produced a measurement can be defined in some sort of channel or sensor index.

So, something like this:

"normalized_payload": {
        "temperature": [
                {
                        "value": 17.5,
                        "unit": "celsius",
                        "sensor": "surface temperature"
                }
        ],
        "pressure": [
                {
                        "value": 1022,
                        "unit": "hectopascal",
                        "sensor": "vapor pressure"
                },
                {
                        "value": 1013.25,
                        "unit": "hectopascal",
                        "sensor": "barometer"
                }
        ]
}

Well, maybe the unit field could be removed, since it will already be specified in the documentation of the format and will always be the same for the same quantity.

@johanstokking I've started working on a JSON schema for the format that I described in my previous comment. I'm only making the value and sensor properties required, because the unit property isn't really needed: as I stated earlier, we're standardizing units for this format.

I'll post a PR with my early progress soon, but first, please let me know if you have any objections to this proposed format.

Thanks, see review comments in the PR.

Indeed we don't need units.

Maybe we need to support both simple readings and per-sensor readings?

{
  "temperature": 21.2,
  "humidity": 37.5,
  "sensors": {
    "test": {
      "temperature": 23.5
    },
    "other": {
      "humidity": 41.1,
      "windSpeed": 11.9
    }
  }
}

This way, we can do the following:

* 90% of sensors will only use the top-level properties, as they have at most one sensor of each type
* We can add multiple sensors of the same type under sensors
* We can have multiple sensor groups under sensors

The way this would work is that there's a patternProperties definition within sensors whose values have the same object type as the top level.

> Maybe we need to support both simple readings and per-sensor readings?

I agree that would make most common payloads smaller and simpler while still retaining the flexibility to model complex scenarios.

However, I see a flaw with your proposed format. Suppose that an integrator wants to make use of this normalized payload and gets:

{
  "temperature": 21.2,
  "humidity": 37.5
}

How will the end-user app know whether the temperature is coming from an ambient sensor or a soil sensor, for example?

I like the simpler approach, but I think we have to keep some context in the data to be useful. Some quantities are used in many situations, and knowing which type of sensor made the measurement gives the context to figure out those situations.

> How will the end-user app know whether the temperature is coming from an ambient sensor or a soil sensor, for example?

> I like the simpler approach but I think we have to retain some context in the data to be useful, some quantities are used in a lot of different scenarios and generally speaking knowing which type of sensor produced the measurement gives the necessary context to infer those scenarios.

Right, so then we have two options:

{
  "soilTemperature": -3.1,
  "airTemperature": 9.4
}

Or:

{
  "temperature": [
    {
      "value": -3.1,
      "source": "soil"
    },
    {
      "value": 9.4,
      "source": "air"
    }
  ]
}

Here, anyone can access temperature[0].value for the first reading, which is often enough. However, when using arrays, the order matters, and that may become problematic and may require the application layer to enumerate the readings and recognize the source. In this example, I made source an arbitrary, sensor-specific identifier.

Therefore I would prefer the first scenario: it's very explicit. There can be different temperatures: soil, water, air and, for thermostats, also the target temperature, etc.

Things like soil moisture levels at different depths (e.g. % at 25/50/75/100 cm deep) wouldn't be normalized this way; we would still need arrays for that. So we can also do both:

{
  "soilMoisture": [
    {
      "value": 29.4,
      "source": "-100cm"
    },
    {
      "value": 36.5,
      "source": "-75cm"
    },
    ...
  ]
}

This way, most applications will just use soilMoisture[0].value, but if the device and application work with multiple soil moisture values, it can be modeled.

What do you think?

> What do you think?

Overall, I agree with your analysis @johanstokking, it's a good compromise to cover every possible use case; however, this solution will require more domain knowledge for every measurement to get the naming right!

I can start modeling the easy ones in the JSON schema if you want, but I would like to get more input from other parties (e.g. device manufacturers) to make sure everyone's interests are met.

On the other hand, what changes will be necessary in lorawan-stack to support this new field? Is it trivial? I guess that since the normalizedPayload will be optional, it can be implemented in the LoRaWAN server regardless of the spec being 100% ready; getting this format right isn't a blocker for the implementation, as far as I understand. Maybe I can open another issue there to track the actual implementation of this feature.

@johanstokking any chance of seeing this in lorawan-stack before the 3.20 milestone?

> I guess that since the normalizedPayload will be optional, it can be implemented in the LoRaWAN server regardless of the spec being 100% ready; getting this format right isn't a blocker for the implementation, as far as I understand. Maybe I can open another issue there to track the actual implementation of this feature.

Yes, you are correct; the implementation is fairly trivial in The Things Stack and we can make it for 3.20.0.

You can indeed open an issue in https://github.com/TheThingsNetwork/lorawan-stack/issues referencing this one. I could also do it, but it's good to file it in your own words, and you'll be subscribed automatically, etc.

> Yes, you are correct; the implementation is fairly trivial in The Things Stack and we can make it for 3.20.0.

> You can indeed open an issue in https://github.com/TheThingsNetwork/lorawan-stack/issues referencing this one. I could also do it, but it's good to file it in your own words, and you'll be subscribed automatically, etc.

Perfect! I've just created the new issue there, https://github.com/TheThingsNetwork/lorawan-stack/issues/5429, and mentioned you so that you can join the discussion.

> Overall I agree with your analysis @johanstokking, it's a good compromise to cover every possible use case, however, this solution will require more domain knowledge for every measurement to get the naming right!

Yes, it does indeed. Knowing the difference between air and soil temperature is necessary domain knowledge, I think. Naive applications may otherwise mix up different quantities. I think we need to keep those quantities (like air vs soil temperature) separate from the unit (both Celsius).

The question is though whether we need this array with multiple values. I think there are a few use cases for it:

* Multiple readings combined in one uplink message. This could be the same sensor, where the firmware gathers readings over time and then sends them in one uplink message. This is actually a best practice.
* Multiple sensors of the same quantity (like soil moisture sensors at different levels)

So value would be mandatory, and source and time optional.

> I can start modeling the easy ones (temperature, moisture...) in the JSON schema if you want but I will prefer to get more input from other parties (e.g. device manufacturers) to cover everyone's interests.

Yes true. This will gradually grow over time, just like we kept adding sensors to the Device Repository. This is very much an iterative process.

> Yes, it does indeed. Knowing the difference between air and soil temperature is necessary domain knowledge, I think. Naive applications may otherwise mix up different quantities. I think we need to keep those quantities (like air vs soil temperature) separate from the unit (both Celsius).

Absolutely, that's the main benefit of creating this normalized format, to give context/meaning to the data used by end user applications. We must protect that feature in the implementation.

> The question is though whether we need this array with multiple values. I think there are a few use cases for it:
>
> * Multiple readings combined in one uplink message. This could be the same sensor, where the firmware gathers readings over time and then sends them in one uplink message. This is actually a best practice.
>
> * Multiple sensors of the same quantity (like soil moisture sensors at different levels)

Two concrete examples will help us understand this more easily: one trivial, and one that simulates a fairly complete scenario. This will ensure that we don't miss any important detail, from the simplest case to the most complex device. The normalized format should be flexible enough to cover both cases with the most straightforward solution (avoiding array and property bloat where possible), while still retaining context for the data.

The first one will be a device that just sends one ambient temperature reading at a time. To me, the ideal and most straightforward format in that case would be just this:

{
  "ambientTemperature": 20.2
}

As an app integrator, you keep all the context (quantity=temperature, source=ambient, unit=implied by the quantity and documented in the specification of this format), and you don't have to deal with anything else.

But now suppose that we have a single microcontroller getting the following measurements from different sensors:

* Ambient Temperature
* Ambient Humidity
* Atmospheric Pressure
* Solar Radiation
* Wind Speed
* Wind Direction
* Leaf Humidity
* Soil Temperature (50cm)
* Soil Moisture (50cm)
* Soil Temperature (10cm)
* Soil Moisture (10cm)
* Soil EC (10cm)
* Soil pH (10cm)
* Soil Nitrogen (10cm)
* Soil Phosphorus (10cm)
* Soil Potassium (10cm)

Then, it groups 2 readings spaced in time and packs them into a single LoRaWAN packet. The first format that comes to my mind to model this would be something like this:

{
  "readings": [
    {
      "time": ...,
      "ambientTemperature": 20.3,
      "ambientHumidity": 33.0,
      "atmosphericPressure": 1012.4,
      "solarRadiation": 294.4,
      "windSpeed": 2.8,
      "windDirection": 181.0,
      "leafHumidity": 23.5,
      "soilTemperature": [
        {
          "value": 13.5,
          "source": "50cm"
        },
        {
          "value": 15.5,
          "source": "10cm"
        }
      ],
      "soilMoisture": [
        {
          "value": 60.5,
          "source": "50cm"
        },
        {
          "value": 55.0,
          "source": "10cm"
        }
      ],
      "soilEC": {
        "value": 2740.0,
        "source": "10cm"
      },
      "soilPH": {
        "value": 5.8,
        "source": "10cm"
      },
      "soilNitrogen": {
        "value": 200.4,
        "source": "10cm"
      },
      "soilPhosphorus": {
        "value": 158.8,
        "source": "10cm"
      },
      "soilPotassium": {
        "value": 303.1,
        "source": "10cm"
      }
    },
    {
      "time": ...something else...,
      /*...second reading...*/
    }
  ]
}

This is less than ideal because, from the perspective of an integration developer, if you need to support both scenarios (or anything in between), you have to check for a huge number of possibilities; but it's needed to preserve all the context.

The format should be uniform regardless of the scenario, but the complexity has to be modeled somewhere (the proposed format could certainly be improved, though). Ideally, we come up with a solution that doesn't pollute the simple cases too much. The first example, unified with the latter format, would look like this:

{
  "readings": [
    {
      "ambientTemperature": 20.2
    }
  ]
}

This does not look that horrible from an integrator's perspective: readings[0].ambientTemperature

We still have to consider the source, which can be dynamic for some quantities (e.g. soil temperature). We can treat this as some kind of “source modifier”, because it is valuable to have a well-defined source. And there could also be multiple readings of the same quantity. So, in reality, the above example would be unified to:

{
  "readings": [
    {
      "temperature": [
        {
          "value": 20.2,
          "source": "ambient"
        }
      ]
    }
  ]
}

This is starting to get really ugly, but I think being uniform may be a necessary evil. The complex example would partially look like this:

{
  "readings": [
    {
      "time": ...,
      ...
      "temperature": [
        {
          "value": 20.3,
          "source": "ambient"
        },
        {
          "value": 15.5,
          "source": "soil",
          "modifier": "10cm"
        }
      ]
      ...
    },
    ...
  ]
}

So, we have time, modifier and the quantities (temperature, humidity…) as optional fields, and value and source as mandatory fields to preserve context. We also have not one but two arrays: one for packets that group multiple readings together, and another for the fact that the same quantity may have more than one value.

From the integrator's perspective, it's valuable to know that the data will be uniform regardless of the scenario you're dealing with. Avoiding this extra verbosity comes at the expense of having to check for a gargantuan number of possibilities if you want to cover every scenario without being a “naive application”.

The “API” for working with this data is not that bad, apart from the two arrays:

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

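// Reading groups the measurements taken at one point in time.
type Reading struct {
    Time        time.Time     `json:"time,omitempty"`
    Temperature []Measurement `json:"temperature,omitempty"`
    Humidity    []Measurement `json:"humidity,omitempty"`
}

// Measurement is a single value plus the context needed to interpret it.
type Measurement struct {
    Value    float64 `json:"value"`
    Source   string  `json:"source"`
    Modifier string  `json:"modifier,omitempty"`
}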

func main() {
    rawData := `[
        {
            "time": "2022-05-06T19:07:10Z",
            "temperature": [
                {
                    "value": 20.3,
                    "source": "ambient"
                },
                {
                    "value": 15.5,
                    "source": "soil",
                    "modifier": "10cm"
                }
            ]
        },
        {
            "time": "2022-05-06T19:27:10Z",
            "temperature": [
                {
                    "value": 13.5,
                    "source": "soil",
                    "modifier": "10cm"
                }
            ]
        },
        {
            "time": "2022-05-06T19:57:10Z",
            "temperature": [
                {
                    "value": 9.0,
                    "source": "soil",
                    "modifier": "50cm"
                }
            ]
        },
        {
            "time": "2022-05-06T20:59:10Z",
            "humidity": [
                {
                    "value": 83.8,
                    "source": "soil",
                    "modifier": "10cm"
                }
            ]
        }
    ]`
    var readings []Reading
    if err := json.Unmarshal([]byte(rawData), &readings); err != nil {
        fmt.Println(err)
        return // nothing sensible to print if decoding failed
    }

    // e.g. print only the measurements of soil temperature at 10cm
    for _, r := range readings {
        for _, t := range r.Temperature {
            if t.Source == "soil" && t.Modifier == "10cm" {
                fmt.Printf("%v: Soil temperature at 10cm was %v\n", r.Time, t.Value)
            }
        }
    }
}

My remaining concern with this is the time field, which is always optional. When you have more than one reading, you must know the time each measurement was taken. When you have only one reading, everything is fine.

Please excuse such a long example, but hopefully I've brought some concerns to the table so we can design a better solution.

Thanks for the examples. I also think that taking realistic example measurements into account is very helpful.

I would also prefer to avoid traversing an array of similar units to find the source of interest. I like your initial example where you differentiate measurements; some measurements need more specification (like anything below the soil surface), while others don't (like ambient temperature).

My suggestions would be:

1. Some measurements won't require further specification. The physical location of the device is enough specification. It is unlikely that a device is equipped with multiple sensors to measure the same thing. In this case, there should be a scalar value. For example: ambientTemperature, atmosphericPressure, windDirection, windSpeed, etc.
2. Some (groups of) measurements will need additional, domain-specific specification. For example, soil needs a depth relative to the physical location of the sensor.
3. Absolute time may not always be known at the codec level. I think we should support two specifiers: absolute time and logical time. The codec may be able to convert relative time (like a minute or hour field in the payload) into absolute time, but there may be just a logical order of readings. Configuration that is unknown to the codec but known further upstream may further designate the absolute time that belongs to a logical time. Or, if the device is not aware of time and just performs readings and sends them, no time specifier is necessary (except for grouping, see below). Allowing the codec to convert relative time into absolute time means that the codec needs access to the received_at timestamp as an input parameter, but we can easily do that.
4. I think that we should have support for multiple readings indeed, but we may not need arrays of measurements in the same reading. If there are multiple measurements (like various soil depths), the codec can spread them over multiple readings.

Example:

{
  "readings": [
    {
      "logicalTime": 1,
      "ambientTemperature": 20.2, // Celsius
      "soil": {
        "depth": 15, // centimeters down
        "moisture": 15.5, // percentage?
        "temperature": 9.4 // Celsius
      }
    },
    {
      "logicalTime": 1,
      "soil": {
        "depth": 25, // centimeters down
        "moisture": 10.9, // percentage?
        "temperature": 3.1 // Celsius
      }
    }
  ]
}

Accessibility is not that bad:

readings[0].ambientTemperature for the ambient temperature
readings[0].soil.temperature and readings[1].soil.temperature for various soil temperatures

What do you think?

> What do you think?

Overall, I agree with your modifications. I like what you did to avoid the second array, grouping related measurements and spreading across different readings if necessary.

Here are some comments:

> It is unlikely that a device is equipped with multiple sensors to measure the same thing.

What if it is? Is that situation what you are referring to later in point 4? I think that's the easiest solution, spreading the measurements in the already needed array.

> 3. Configuration that is unknown to the codec but known further upstream may further designate the absolute time that belongs to a logical time.

I don't quite get that. What other upstream entity could know about the relative-to-absolute time conversion apart from the decoder/manufacturer?

> It is unlikely that a device is equipped with multiple sensors to measure the same thing.

> What if it is? Is that situation what you are referring to later in point 4? I think that's the easiest solution, spreading the measurements in the already needed array.

Yes indeed. So we won't forbid ambientTemperature in two separate readings with the same logical/absolute time.

In the case of soil, there must be a specifier like depth. But in the case of ambient temperature or wind, I don't think that one device would take two different readings with distinct sensors. But OK, it can, and we allow for it if we stick to this format.

> Configuration that is unknown to the codec but known further upstream may further designate the absolute time that belongs to a logical time.

> I don't quite get that. What other upstream entity could know about the relative-to-absolute time conversion apart from the decoder/manufacturer?

What I meant is that some devices can be remotely configurable (via downlink) with a measurement interval. Like every hour or every 2 hours. The payload may not contain the timestamps to save space. So the codec sees two groups of readings but doesn't know how far apart the readings are.

For this, though, what we'd also like to do is add state to the codec context. It's a bit off topic here, but the idea is that every codec has access to the input state and can return the updated state. Think of it as a digital twin. The state may contain the current sensor configuration (as sent via a downlink message and acked by the device, or as received many messages back when the device sent a status message). So maybe in the future the codec will be able to convert logical time into absolute time; until we can provide that context here, we can't rely on absolute timestamps.

> But in the case of ambient temperature or wind, I don't think that one device would take two different readings with distinct sensors. But OK, it can, and we allow for it if we stick to this format.

Exactly, it would be a less common case for sure, but you never know: one could set up a device to compare the precision of different sensors measuring the same quantity, for example, so it's nice to be flexible here.

> What I meant is that some devices can be remotely configurable (via downlink) with a measurement interval. Like every hour or every 2 hours. The payload may not contain the timestamps to save space. So the codec sees two groups of readings but doesn't know how far apart the readings are.

Aaah, I see… so an integration could potentially configure the interval by sending a downlink, and then that integration would have the context necessary to make the relative time conversion. Right?

> For this, though, what we'd also like to do is add state to the codec context. It's a bit off topic here, but the idea is that every codec has access to the input state and can return the updated state. Think of it as a digital twin. The state may contain the current sensor configuration (as sent via a downlink message and acked by the device, or as received many messages back when the device sent a status message). So maybe in the future the codec will be able to convert logical time into absolute time; until we can provide that context here, we can't rely on absolute timestamps.

That's a bit advanced and surely out of the scope of this issue; it might be convenient to track that in a different one. Otherwise, I think we are ready to start defining the JSON schema with the simplest quantities to start iterating on this.

> so an integration could potentially configure the interval by sending a downlink, and then that integration would have the context necessary to make the relative time conversion. Right?

Yes. That is what we mean by upstream: north of the Application Server. Uplink messages magically flow against gravity.

> Otherwise, I think we are ready to start defining the JSON schema with the simplest quantities to start iterating on this.

Yes. We'll triage https://github.com/TheThingsNetwork/lorawan-stack/issues/5429 tomorrow morning CEST. The schema definition is pretty much decoupled from support in The Things Stack. I agree that we should start with the simplest quantities and iterate. We produce releases every 2 or 3 weeks so new fields are usable pretty quickly.

@pablojimpas I'm picking this up now. I'm revisiting the example I shared above, and now I realize there's a discrepancy between ambientTemperature and the soil temperature which should now be referenced by soil.temperature.

If soil.temperature is referenced but soil is undefined, you'll get "object is not defined" errors. For this reason we may want to avoid nested objects altogether.

Something like this:

{
  "readings": [
    {
      "logicalTime": 1,
      "ambientTemperature": 20.2, // Celsius
      "soilDepth": 15, // centimeters down
      "soilMoisture": 15.5, // percentage?
      "soilTemperature": 9.4 // Celsius
    },
    {
      "logicalTime": 1,
      "soilDepth": 25, // centimeters down
      "soilMoisture": 10.9, // percentage?
      "soilTemperature": 3.1 // Celsius
    }
  ]
}

Do you have any progressive insight on this?

In any case, we'll start making this work end-to-end with ambientTemperature because that seems to be the least controversial thing here.

> If soil.temperature is referenced but soil is undefined, you'll get "object is not defined" errors. For this reason we may want to avoid nested objects altogether.

From an integration developer's perspective, what's the difference between handling a non-existent soil object when trying to get soil.temperature and handling a non-existent soilTemperature field?

I guess that yes, it will be simpler to avoid nesting as much as possible, and as long as we can express everything with the right naming convention, I'm fine with that.

Do you have any new insights on this?

Not really, but the more I think about the array, the less I like it… but if I recall correctly, we determined earlier that it was almost inevitable.

If soil.temperature is referenced but soil is undefined, you'll get "object is not defined" errors. For this reason we may want to avoid nested objects altogether.

From an integration developer perspective, what's the difference between handling a non-existent soil object when trying to get soil.temperature and handling a non-existent soilTemperature field?

True. One can argue that developers have to account for undefined values anyway, so nested objects would not make a difference.
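In JavaScript/TypeScript integrations, optional chaining (TypeScript 3.7+, ES2020) makes the two shapes almost identical to handle, which supports the point above. The interfaces below are hypothetical shapes for the soil reading discussed here.

```typescript
// Two hypothetical shapes for the same reading: nested and flat.
interface NestedReading {
  soil?: { temperature?: number };
}
interface FlatReading {
  soilTemperature?: number;
}

// Without a guard, `r.soil.temperature` throws a TypeError when `soil` is
// missing. Optional chaining turns that into `undefined`, which is exactly
// what a missing flat field yields, so integration code looks the same.
function soilTempNested(r: NestedReading): number | undefined {
  return r.soil?.temperature;
}
function soilTempFlat(r: FlatReading): number | undefined {
  return r.soilTemperature;
}

console.log(soilTempNested({}), soilTempFlat({})); // undefined undefined
console.log(soilTempNested({ soil: { temperature: 9.4 } })); // 9.4
console.log(soilTempFlat({ soilTemperature: 9.4 })); // 9.4
```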

Then, should we put temperature under ambient? Or rather air? So that we put other air measurements under air as well?

I guess that yes, it will be simpler to avoid nesting as much as possible, and as long as we can express everything with the right naming convention, I'm fine with that.

I've got an idea to overcome this.

I really want to encourage device makers to combine multiple readings in one LoRaWAN frame. This is just a really good practice. But I do get the issue on the other side: you'll end up with an array.

What if we introduce a new message type that is sent by the Application Server for each normalized payload? This would be a first-class citizen among the message types, just like activations, uplink messages, downlink events, etc. So if there's one normalized payload measurement in the message, AS publishes one message. If there's an array with two items, AS publishes two messages. For the application developer, two individual uplink frames with one measurement each will look exactly the same as one uplink frame with two measurements.

The "full" uplink message will still carry the array of normalized payloads; there will be an extra, simpler message that is only published if there is a normalized payload.
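The proposed fan-out can be sketched as a pure function: one uplink with an array of normalized measurements goes in, and one message per measurement comes out. The type and field names (`Uplink`, `normalizedPayload`, `NormalizedMessage`) are illustrative assumptions, not The Things Stack's actual message definitions.

```typescript
// Hypothetical shapes: a measurement is a flat bag of numeric fields.
interface Measurement {
  [field: string]: number | undefined;
}

interface Uplink {
  deviceId: string;
  receivedAt: string;
  normalizedPayload?: Measurement[];
}

interface NormalizedMessage {
  deviceId: string;
  receivedAt: string;
  measurement: Measurement;
}

// One uplink in, zero or more normalized messages out. An uplink without a
// normalized payload produces no extra messages; the "full" uplink message
// is published separately and still carries the whole array.
function fanOut(uplink: Uplink): NormalizedMessage[] {
  const measurements = uplink.normalizedPayload || [];
  return measurements.map((m) => ({
    deviceId: uplink.deviceId,
    receivedAt: uplink.receivedAt,
    measurement: m,
  }));
}

const msgs = fanOut({
  deviceId: "soil-probe-1",
  receivedAt: "2022-06-01T12:00:00Z",
  normalizedPayload: [
    { soilDepth: 15, soilTemperature: 9.4 },
    { soilDepth: 25, soilTemperature: 3.1 },
  ],
});
console.log(msgs.length); // 2
```

Because each message carries exactly one measurement, a consumer never needs to know whether the device batched its readings.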

Then, should we put temperature under ambient? Or rather air? So that we put other air measurements under air as well?

I think ambient and air are mostly equivalent in this case (the same goes for pressure and humidity). However, deciding this will require reviewing the currently available devices from different manufacturers and their primary use cases; that will help settle the names.

What if we introduce a new message type that is sent by the Application Server for each normalized payload? This would be a first-class citizen among the message types, just like activations, uplink messages, downlink events, etc. So if there's one normalized payload measurement in the message, AS publishes one message. If there's an array with two items, AS publishes two messages. For the application developer, two individual uplink frames with one measurement each will look exactly the same as one uplink frame with two measurements.

This is actually a pretty clever solution to avoid arrays and achieve uniform messages; I think this can work beautifully. The only downside that I can foresee is that if an integration uses webhooks to redirect uplinks, for example, it will now have to be aware of both uplink events and this new type of event.

The only downside that I can foresee is that if an integration uses webhooks to redirect uplinks, for example, it will now have to be aware of both uplink events and this new type of event.

Initially I thought of sending the uplink message multiple times indeed, but that can cause problems upstream.

The plan is to add a new message type here, and we already encourage integration developers to specify a path for that message to keep things separate:
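An integration subscribed to both message types might handle them with a branch like the one below. The field names `uplink_message`/`decoded_payload` follow the existing uplink message. The names `uplink_normalized`/`normalized_payload` for the new type are assumed here for illustration. With a separate webhook path per message type, the same branch could instead happen on the request path.

```typescript
// Hypothetical webhook body: exactly one message field is expected to be set.
interface WebhookBody {
  uplink_message?: { decoded_payload?: Record<string, unknown> };
  uplink_normalized?: { normalized_payload?: Record<string, unknown> };
}

type Handled =
  | { kind: "uplink"; fields: string[] }
  | { kind: "normalized"; fields: string[] }
  | { kind: "ignored" };

// An integration that receives both message types has to branch on which
// one arrived; here it just reports the top-level payload fields.
function route(body: WebhookBody): Handled {
  if (body.uplink_normalized) {
    return {
      kind: "normalized",
      fields: Object.keys(body.uplink_normalized.normalized_payload || {}),
    };
  }
  if (body.uplink_message) {
    return {
      kind: "uplink",
      fields: Object.keys(body.uplink_message.decoded_payload || {}),
    };
  }
  return { kind: "ignored" };
}

console.log(route({ uplink_normalized: { normalized_payload: { air: {} } } }).kind); // normalized
```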

I need to look into this a bit more to see if we can actually proceed with this, but it looks like we can.

These are the current flattened output fields of all codecs that provide examples:

_type
acceleration.x
acceleration.y
acceleration.z
accelerationchange
accelerationx
accelerationy
accelerationz
accuracy_aqi
aci1_ma
aci2_ma
activatedservices[].0
activatedservices[].1
activatedservices[].10
activatedservices[].11
activatedservices[].12
activatedservices[].13
activatedservices[].14
activatedservices[].15
activatedservices[].16
activatedservices[].17
activatedservices[].18
activatedservices[].19
activatedservices[].2
activatedservices[].20
activatedservices[].21
activatedservices[].22
activatedservices[].23
activatedservices[].24
activatedservices[].25
activatedservices[].26
activatedservices[].27
activatedservices[].28
activatedservices[].29
activatedservices[].3
activatedservices[].30
activatedservices[].31
activatedservices[].32
activatedservices[].33
activatedservices[].34
activatedservices[].35
activatedservices[].36
activatedservices[].37
activatedservices[].38
activatedservices[].4
activatedservices[].5
activatedservices[].6
activatedservices[].7
activatedservices[].8
activatedservices[].9
active energy export t1.unit
active energy export t1.value
active energy export t2.unit
active energy export t2.value
active energy import t1.unit
active energy import t1.value
active energy import t2.unit
active energy import t2.value
activethreshold
activity
activity_counter.displayname
activity_counter.value
actuator
adc_ch0v
adcraw
adcrawvalue1
adcrawvalue2
adcrawvaluechange
addr
adr_state
air
air_humidity.displayname
air_humidity.unit
air_humidity.value
air_temperature.displayname
air_temperature.unit
air_temperature.value
airpressure
airpressurechange
alarm
alarm_status
alarm_timer
alarmmsgcount
alarmthreshold
alertcancellation
altitude
ambient_light_infrared.displayname
ambient_light_infrared.value
ambient_light_visible_infrared.displayname
ambient_light_visible_infrared.value
ambient_sensor_failure
ambient_sensor_raw
ambient_temperature
ambienttemp
ana
angle
anglechange
angleofinclination
anglex
angley
anglez
appmainvers
appminorvers
aqi
aqi_partial.1.0
aqi_partial.10
aqi_partial.2.5
atmospheric_pressure.displayname
atmospheric_pressure.unit
atmospheric_pressure.value
atomsphere
average_current_consumed
average_current_generated
average_value
avi1_v
avi2_v
backlogmsgcount
barometer_temperature.displayname
barometer_temperature.unit
barometer_temperature.value
barometric_pressure.displayname
barometric_pressure.unit
barometric_pressure.value
bat
bat_mv
bat_status
bat_v
batlevel
battery
battery_capacity_percentage
battery_vol
battery_voltage
battery_voltage.displayname
battery_voltage.unit
battery_voltage.value
batterycapacity
batterychange
batterylevel
batterytype
batteryvoltage
battvoltage
batv
batvoltage
beacon_rssi
ble
ble.mode.averaging_mode
ble.mode.number_of_devices
blescantime
boot
boxtamper
brokensensor
button
cable
cached.headingdeg
cached.latitudedeg
cached.longitudedeg
cached.speedkmph
calendareventlist
capabilities
capacitor_voltage_1.displayname
capacitor_voltage_1.unit
capacitor_voltage_1.value
capacitor_voltage_2.displayname
capacitor_voltage_2.unit
capacitor_voltage_2.value
cfgstatus
ch0_cumulative_pulse_count.displayname
ch0_cumulative_pulse_count.value
ch0_pulse_count.displayname
ch0_pulse_count.value
ch0_pulse_interval.displayname
ch0_pulse_interval.unit
ch0_pulse_interval.value
ch1_cumulative_pulse_count.displayname
ch1_cumulative_pulse_count.value
ch1_pulse_count.displayname
ch1_pulse_count.value
ch1_pulse_interval.displayname
ch1_pulse_interval.unit
ch1_pulse_interval.value
ch20
change_output_states.output1
change_output_states.output2
channel
channelcount
charging
childlock
class
class_group_text
closeddwelltime
cmd
co
co2
co2_concentration.displayname
co2_concentration.unit
co2_concentration.value
co2_concentration_lpf.displayname
co2_concentration_lpf.unit
co2_concentration_lpf.value
co2_ppm
co2_sensor_status.displayname
co2_sensor_status.value
co2_sensor_temperature.displayname
co2_sensor_temperature.unit
co2_sensor_temperature.value
co2eq
co2filtered
co2raw
coalarm
compass_heading.displayname
compass_heading.unit
compass_heading.value
conduct_soil
count
counter
counter_reading.displayname
counter_reading.value
countervalue
cputemp
ct
cumulative_precipitation.displayname
cumulative_precipitation.unit
cumulative_precipitation.value
cumulative_pulse_count.displayname
cumulative_pulse_count.value
current
current-transformer primary.value
current-transformer secondary.value
current.displayname
current.unit
current.value
current1
current2
current3
current_1
current_2
current_valve_position
currentchange
dalarm_count
darker
dashcurrentalarm
datecode
deadzonedistance
debounceadjust
decibel
decoder.info
decoder.version
deltvalue
dendrometer_a_position.displayname
dendrometer_a_position.unit
dendrometer_a_position.value
dendrometer_b_position.displayname
dendrometer_b_position.unit
dendrometer_b_position.value
dendrometer_position.displayname
dendrometer_position.unit
dendrometer_position.value
depth
depthchange
detection_interval
device
device_id
device_local_datetime
di0
di1
di1_status
di2_status
di3_status
dielectric_permittivity.displayname
dielectric_permittivity.value
digital_istatus
direction
dis1
dis2
disalarm
disassembledalarm
distance
distance.displayname
distance.unit
distance.value
distance1_cm
distance2_cm
distance3_cm
distance4_cm
distance_10th_percentile.displayname
distance_10th_percentile.unit
distance_10th_percentile.value
distance_25th_percentile.displayname
distance_25th_percentile.unit
distance_25th_percentile.value
distance_75th_percentile.displayname
distance_75th_percentile.unit
distance_75th_percentile.value
distance_90th_percentile.displayname
distance_90th_percentile.unit
distance_90th_percentile.value
distance_alarm
distance_average.displayname
distance_average.unit
distance_average.value
distance_maximum.displayname
distance_maximum.unit
distance_maximum.value
distance_median.displayname
distance_median.unit
distance_median.value
distance_minimum.displayname
distance_minimum.unit
distance_minimum.value
distance_most_frequent_value.displayname
distance_most_frequent_value.unit
distance_most_frequent_value.value
distancechange
divisor
do1_status
do2_status
do3_status
door_open_status
door_open_times
door_status
doorbell
dust.0.3
dust.0.5
dust.1.0
dust.10
dust.2.5
dust.5
e25
eaqi
east_wind_speed.displayname
east_wind_speed.unit
east_wind_speed.value
ec
ec5soildhumi
ec5te
eco2
electrical_conductivity.displayname
electrical_conductivity.unit
electrical_conductivity.value
elongation.displayname
elongation.unit
elongation.value
empty
empty_alarm_threshold
energy
energy_storage
energysourced
energyused
ens
err
error
errorcode
ext
ext_sensor
exti_trigger
fillinglvl
filllevel
fillmaxdistance
filtertime
finecurrent
finecurrent_1
finecurrent_2
fire
fire_alarm_threshold
firealarm
firmware_version
firmwaretype
firmwareversion
fixfailed
flags.adr
flags.antenna
flags.blescan
flags.countermode
flags.reserved
flags.screen
flags.screensaver
flashmemorycrcerrorstate
flashmemoryfullstate
flood
flow_sensor_failure
flow_sensor_raw
flow_temperature
format
fpm
frame_counter
fraud
freeheap
freestack
freezing_flag.displayname
freezing_flag.value
freq_band
frequency.displayname
frequency.unit
frequency.value
full
full_alarm_threshold
func
fw
gas_resistance
general_info.infobyte
general_info.infraredinputstatus
general_info.inputtype
general_info.meterid
general_info.metertype
general_info.statusbyte
general_info.time
general_info.time_origin
good_battery
gps_count
gpsepe
gpsfixes
gpslat
gpslong
gpssat
h2s
hardware_flag
hardware_mode
hardware_version
hardwareversion
harvesting_active
hdop
head_temperature.displayname
head_temperature.unit
head_temperature.value
header.batteryperc
header.configured
header.conntest
header.devicetype
header.idx
header.metertype
header.plugin_id
header.samplesetcount
header.valuecount
header.version
headingdeg
heartbeattime
heartinterval
highmotorconsumption
hightempalarm
historytrigger
hum_sht
humi
humichange
humidity
humidity.displayname
humidity.unit
humidity.value
humidity_percentage
humidityalarmlimithigh
humidityalarmlimitlow
humidityalarmsenabled
humiditychanged
hummax
hummin
humoffset
hw_rev
hw_version
hwver
hygro
iaq
iaqchanged
id
illuminance
illuminance.displayname
illuminance.unit
illuminance.value
illuminancechange
illuminancethreshold
illumination
inactive
inactivethreshold
inactivetime
infrared
infrared_and_visible
input.displayname
input.value
input1_frequency
input2_voltage
input_1
input_2
input_3
installstatus
interrupt_alarm
interrupt_flag
intrip
ip
ip1
ip2
ip3
ip4
irdetectiontime
irdisabletime
irradiance.white
k_soil
keep_status
keep_time
keepalive
last_door_open_duration
last_value
lastcolor_blue
lastcolor_green
lastcolor_offtime
lastcolor_ontime
lastcolor_red
lastsynctime
latitude
latitudedeg
ldo_do
ldo_sat
leaf_moisture
leaf_temperature
leaf_wetness_index.displayname
leaf_wetness_index.value
leak
ledble
ledlora
ledstate
level
levelpercentage
lidar_distance
lidar_signal
lidar_temp
light
light.displayname
light.value
light_detected
light_intensity
lighter
lightning_average_distance.displayname
lightning_average_distance.unit
lightning_average_distance.value
lightning_strike_count.displayname
lightning_strike_count.value
load
lon
longitude
longitudedeg
longrangetrigger
lora_count
loradr
loranotificationoptions[].0
loranotificationoptions[].1
loranotificationoptions[].2
loranotificationoptions[].3
loranotificationoptions[].4
loranotificationoptions[].5
lorawan
lorawanconfiguration
lowmotorconsumption
lsc
lso
lux
mac1
mac2
mac3
mac4
maindemand
major
max_value
maximum_wind_speed.displayname
maximum_wind_speed.unit
maximum_wind_speed.value
maxtime
mcu_temperature
md
measrate
measurement_interval.displayname
measurement_interval.value
measuretype
medium.desc
medium.type
memory
mes_type
message_type
messages[].battery
messages[].channel
messages[].hardwareversion
messages[].interval
messages[].measurementid
messages[].measurementvalue
messages[].sensorid
messages[].softwareversion
messages[].type
messages_received
messages_send
meter-typ.value
meterreading
mid year.value
midrangetrigger
min_value
minor
mintime
minwifidetects
mod
modbuserror
model
moisture
motion
motion_alarm_enable
motion_alarm_threshold
motion_event_count
motor_error
motorposition
motorrange
motorstatus
msgid
msginfo.msgcnt
msginfo.msgidx
msginfo.msgnum
msgtype
multiplier
multiplier1
multiplier2
multiplier3
n_soil
name
networktime
nh3
no
no2
no_beacon
noise
north_wind_speed.displayname
north_wind_speed.unit
north_wind_speed.value
ntu
number_of_samples.displayname
number_of_samples.value
number_of_valid_samples.displayname
number_of_valid_samples.value
numberofreadings
o3
obis_ids[].groupmask
obis_ids[].obis_id
obis_ids[].rawvalue
obis_ids[].scaler
obis_ids[].stringvalue
obis_ids[].unit
obis_ids[].value
obstruction
occupancy
occupied
occupy
onboard_temperature
ondistancethreshold
onoff
opendwelltime
openwindow
operating_mode
operatingmode
options[].0
options[].1
options[].10
options[].11
options[].12
options[].13
options[].14
options[].15
options[].16
options[].17
options[].18
options[].19
options[].2
options[].20
options[].21
options[].22
options[].23
options[].24
options[].25
options[].26
options[].27
options[].28
options[].29
options[].3
options[].4
options[].5
options[].6
options[].7
options[].8
options[].9
orp
overcurrentalarm
oxygen_concentration.displayname
oxygen_concentration.unit
oxygen_concentration.value
oxygen_concentration_alt.displayname
oxygen_concentration_alt.unit
oxygen_concentration_alt.value
oxygen_saturation.displayname
oxygen_saturation.unit
oxygen_saturation.value
p
p1
p_soil
parking_id
partnumber
pax
payload
payload.data.config.countervalues
payload.data.config.digitalinputs
payload.data.config.timestamp
payload.data.digitalinputs
payload.data.digitalinputs[].cot.cyclic
payload.data.digitalinputs[].cot.event
payload.data.digitalinputs[].cot.interrogation
payload.data.digitalinputs[].info.id
payload.data.digitalinputs[].info.type
payload.data.digitalinputs[].status.blocked
payload.data.digitalinputs[].status.state
payload.data.digitalinputs[].timestamp.string
payload.data.digitalinputs[].timestamp.unix
payload.data.timestamp.string
payload.data.timestamp.unix
payload.device.batterylevel
payload.device.devicestatus.batterypowered
payload.device.devicestatus.bufferoverflow
payload.device.devicestatus.configurationerror
payload.device.devicestatus.confirmationtimeout
payload.device.devicestatus.devicerestarted
payload.device.devicestatus.lowsupplyvoltage
payload.device.devicestatus.timesynced
payload.device.devicestatus.txcreditsconsumed
payload.device.info.devicedesignation
payload.device.info.deviceid
payload.device.info.devicemanufacturer
payload.device.info.deviceversion
payloadlength
payloadmask.battery
payloadmask.bme
payloadmask.counter
payloadmask.gps
payloadmask.reserved
payloadmask.sensor1
payloadmask.sensor2
payloadmask.sensor3
payloads[].id
payloads[].registers[].data_valid
payloads[].registers[].datavalid
payloads[].registers[].filterid
payloads[].registers[].unit
payloads[].type
pbaro
pd (p1-pbaro)
pellets[].delta
pellets[].total
periodic_detection_interval
periodic_interval
periodic_upload_interval
ph
ph1_soil
photosynthetically_active_radiation.displayname
photosynthetically_active_radiation.unit
photosynthetically_active_radiation.value
pinginterval
pingtype
pitch
pm.1.0
pm.10
pm.2.5
pm0_5_number_concentration.displayname
pm0_5_number_concentration.unit
pm0_5_number_concentration.value
pm1
pm10
pm10_mass_concentration.displayname
pm10_mass_concentration.unit
pm10_mass_concentration.value
pm10_number_concentration.displayname
pm10_number_concentration.unit
pm10_number_concentration.value
pm1_0
pm1_0_mass_concentration.displayname
pm1_0_mass_concentration.unit
pm1_0_mass_concentration.value
pm1_0_number_concentration.displayname
pm1_0_number_concentration.unit
pm1_0_number_concentration.value
pm25
pm2_5
pm2_5_mass_concentration.displayname
pm2_5_mass_concentration.unit
pm2_5_mass_concentration.value
pm2_5_number_concentration.displayname
pm2_5_number_concentration.unit
pm2_5_number_concentration.value
pm4_mass_concentration.displayname
pm4_mass_concentration.unit
pm4_mass_concentration.value
pm4_number_concentration.displayname
pm4_number_concentration.unit
pm4_number_concentration.value
pm_10
port
portfunction
posseq
posstatus
potentiometer_position.displayname
potentiometer_position.value
power
powerchange
poweroffalarm
powersourcedcount
powersourcedperhour
powerusedcount
powerusedperhour
ppm
precipitation.displayname
precipitation.unit
precipitation.value
precipitation_interval.displayname
precipitation_interval.unit
precipitation_interval.value
presstime
pressure
pressure.displayname
pressure.unit
pressure.value
protocol_version
proxx
pulse1
pulse_count.displayname
pulse_count.value
pulse_interval.displayname
pulse_interval.unit
pulse_interval.value
pulseabs
pulsecount
pulsecount1
pulsecount2
pulsecounter1
pulsecounter2
pulsecounterclearmode
radar
radio_communication_error
rangesetting
raw
raw_ir_reading.displayname
raw_ir_reading.value
raw_ir_reading_lpf.displayname
raw_ir_reading_lpf.value
rawsensedata
reactive energy export t1.unit
reactive energy export t1.value
reactive energy export t2.unit
reactive energy export t2.value
reactive energy import t1.unit
reactive energy import t1.value
reactive energy import t2.unit
reactive energy import t2.value
readings[].humidity
readings[].temperature
readings[].timestamp.day
readings[].timestamp.hours
readings[].timestamp.minutes
readings[].timestamp.month
readings[].timestamp.seconds
readings[].timestamp.year
reason
received_signal_strength
reference_completed
rejointrigger
relative_humidity
relative_humidity.displayname
relative_humidity.unit
relative_humidity.value
relativehumidity
relay_1
relay_2
relay_3
releasedate
releasenumber
remote
resendinterval
reset
reset0
reset_cause
resetcause
resetcounter
restarts
restorereportset
rgblum
rh
rms_voltage
ro1_status
ro2_status
roll
rpm
rssi
rssi1
rssi2
rssi3
rssi4
rssilimit
saqi
sats
sendcycle
sensor_data.acceleration_alarm
sensor_data.temperature
sensor_flag
sensor_id.displayname
sensor_id.value
sensor_model
sensor_temperature_internal.displayname
sensor_temperature_internal.unit
sensor_temperature_internal.value
sensoraggregate
sensordetectiontime
sensordisabletime
sensorreadperiod
sensortemperature
sensortype
sensorunit
serial-number.value
serial_nr
serial_number
settingsallowed
snr
so2
soil_moisture_at_depth_0.displayname
soil_moisture_at_depth_0.value
soil_moisture_at_depth_1.displayname
soil_moisture_at_depth_1.value
soil_moisture_at_depth_2.displayname
soil_moisture_at_depth_2.value
soil_moisture_at_depth_3.displayname
soil_moisture_at_depth_3.value
soil_moisture_at_depth_4.displayname
soil_moisture_at_depth_4.value
soil_moisture_at_depth_5.displayname
soil_moisture_at_depth_5.value
soil_moisture_at_depth_6.displayname
soil_moisture_at_depth_6.value
soil_moisture_at_depth_7.displayname
soil_moisture_at_depth_7.value
soil_temperature.displayname
soil_temperature.unit
soil_temperature.value
soil_temperature_at_depth_0.displayname
soil_temperature_at_depth_0.unit
soil_temperature_at_depth_0.value
soil_temperature_at_depth_1.displayname
soil_temperature_at_depth_1.unit
soil_temperature_at_depth_1.value
soil_temperature_at_depth_2.displayname
soil_temperature_at_depth_2.unit
soil_temperature_at_depth_2.value
soil_temperature_at_depth_3.displayname
soil_temperature_at_depth_3.unit
soil_temperature_at_depth_3.value
soil_temperature_at_depth_4.displayname
soil_temperature_at_depth_4.unit
soil_temperature_at_depth_4.value
soil_temperature_at_depth_5.displayname
soil_temperature_at_depth_5.unit
soil_temperature_at_depth_5.value
soil_temperature_at_depth_6.displayname
soil_temperature_at_depth_6.unit
soil_temperature_at_depth_6.value
soil_temperature_at_depth_7.displayname
soil_temperature_at_depth_7.unit
soil_temperature_at_depth_7.value
soildhumi5te
soildtemp5te
solar_radiation.displayname
solar_radiation.unit
solar_radiation.value
sound
soundavg
soundpeak
speed
speedkmph
state
status
status.displayname
status.value
status1
status2
status_1
status_2
status_calendar_event_list_state
status_correct_received_meter_files_counter
status_firmware_version
status_incorrect_received_meter_files_counter
status_last_sync_time
status_lorawan_activation_state
status_lorawan_configuration_state
status_network_time_state
status_obis_id_filter_list_state
status_reset_counter
status_system_time_state
status_time
status_uploaded_meter_data_messages_counter
statuschange
step_count
stop_timer
storage_fully_charged
storage_voltage
strain.displayname
strain.unit
strain.value
strain_gauge.displayname
strain_gauge.unit
strain_gauge.value
sub_band
surface_temperature.displayname
surface_temperature.unit
surface_temperature.value
sw_rev
sw_version_text
switchtype
swver
systemtimebit
systemvoltage
systimestamp
t
tamper
targettemperature
tbaro
tdc
tdew
tdewc
temp
temp1
temp2
temp_black
temp_channel1
temp_channel2
temp_red
temp_soil
temp_white
tempalarmlimithigh
tempalarmlimitlow
tempalarmsenabled
tempc
tempc1
tempc_ds
tempc_ds18b20
tempc_sht
tempchange
temperature
temperature.displayname
temperature.unit
temperature.value
temperature_electronics.displayname
temperature_electronics.unit
temperature_electronics.value
temperature_head.displayname
temperature_head.unit
temperature_head.value
temperature_pt1000.displayname
temperature_pt1000.unit
temperature_pt1000.value
temperature_target.displayname
temperature_target.unit
temperature_target.value
temperatureboard
temperaturechanged
temphumidchanged
templdo
tempmax
tempmin
tempntu
tempoffset
tempph
theatindex
theatindexc
threshold
tilt
tilt_alarm_threshold
tilt_enable
time
timeset
timestamp
timestamp.day
timestamp.hours
timestamp.minutes
timestamp.month
timestamp.seconds
timestamp.unit
timestamp.value
timestamp.year
timestatus
tob1
total_acquisition_time.displayname
total_acquisition_time.unit
total_acquisition_time.value
total_solar_radiation.displayname
total_solar_radiation.unit
total_solar_radiation.value
total_voc.displayname
total_voc.unit
total_voc.value
trapstate
trigger_mode
turbidity_in_fnu.displayname
turbidity_in_fnu.unit
turbidity_in_fnu.value
turbidity_in_mg_l.displayname
turbidity_in_mg_l.unit
turbidity_in_mg_l.value
turbidity_in_ntu.displayname
turbidity_in_ntu.unit
turbidity_in_ntu.value
tvoc
tvoc_ppb
tvoc_voc
tvoc_vocchange
txpower
type
typical_particle_size.displayname
typical_particle_size.unit
typical_particle_size.value
ultrasonic_range
um0_3
um0_5
um10
um1_0
um2_5
um5_0
unit
upm
uptime
us_sensor_count
user_mode
user_value
uuid
valid
valv
valve
valvestate
vapor_pressure.displayname
vapor_pressure.unit
vapor_pressure.value
vbat
vbus
vcc
velocityx
velocityy
velocityz
version
voc
vol
volt
voltage
voltage-transformer primary.value
voltage-transformer secondary.value
voltage.displayname
voltage.unit
voltage.value
volumetric_water_content.displayname
volumetric_water_content.unit
volumetric_water_content.value
vsys
vt
vwc
warningstatus
water_depth.displayname
water_depth.unit
water_depth.value
water_potential.displayname
water_potential.unit
water_potential.value
water_soil
waterleak
waterleak_1
waterleak_2
waterlevel
watertemp
web
weight.displayname
weight.unit
weight.value
wifi
wifi[].mac
wifi[].rssi
wifichancycle
wind_direction.displayname
wind_direction.unit
wind_direction.value
wind_speed.displayname
wind_speed.unit
wind_speed.value
winddirection
windspeed
wmbusfilterlist
wmbuspackagesreceived
wmbuspackagessaved
wmbuspackagessent
work_mode
workcount
workdurationtime
x
x_orientation_angle.displayname
x_orientation_angle.unit
x_orientation_angle.value
y
y_orientation_angle.displayname
y_orientation_angle.unit
y_orientation_angle.value
z
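A listing like the one above could be produced by flattening each codec's example output into paths and merging the results. The script actually used is not shown in this thread. The sketch below is an assumption: it lowercases keys, joins nested keys with `.`, and marks array elements with `[]`, which matches the notation of most entries above (for example `readings[].humidity` and `pd (p1-pbaro)`).

```typescript
// Recursively collects lowercase dotted paths to every leaf value.
function flattenPaths(value: unknown, prefix: string, out: { [path: string]: true }): void {
  if (Array.isArray(value)) {
    // Array elements share one path segment, "[]".
    for (const item of value) {
      flattenPaths(item, prefix + "[]", out);
    }
  } else if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: unknown };
    for (const key of Object.keys(obj)) {
      const segment = key.toLowerCase();
      flattenPaths(obj[key], prefix ? prefix + "." + segment : segment, out);
    }
  } else {
    out[prefix] = true; // leaf: number, string, boolean or null
  }
}

// Merges the paths of all example outputs, deduplicated and sorted.
function listFields(examples: unknown[]): string[] {
  const seen: { [path: string]: true } = {};
  for (const example of examples) {
    flattenPaths(example, "", seen);
  }
  return Object.keys(seen).sort();
}

const fields = listFields([
  { TBaro: 21.2, readings: [{ humidity: 40, temperature: 20 }] },
  { temperature: { value: 20.5, unit: "°C" } },
]);
console.log(fields.join("\n"));
```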

A few noticeable things here:

Most codecs will benefit a lot from a standard format
Some send multiple readings: some over time (see readings[]), some at different depths (search for _at_depth_)
Some are aware of absolute time; I think this is a waste of payload. I think we should make the receive time available to the codec so it can calculate the time (applying a small offset that is sent in the payload). Still, if they really want to, they can return the time
Some send min/max/average/median/percentiles of scalar readings. This aggregation on the edge is also a good practice
We may want to come up with some standard "status" too, even if it were just basic flags, like "low battery", "clock synchronized", etc. But we can do this later
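The receive-time suggestion could look roughly like this in a decoder: the device sends only a small offset, and the decoder subtracts it from the receive time. Both the decoder input field (`recvTime`) and the 4-byte payload layout are assumptions for this sketch, not a documented codec or API.

```typescript
// Hypothetical decoder input: raw payload bytes plus the uplink receive time.
interface DecoderInput {
  bytes: number[];
  recvTime: Date;
}

// Assumed payload layout (illustrative only):
//   bytes 0-1: seconds between measurement and transmission, unsigned big-endian
//   bytes 2-3: temperature in 0.1 °C, signed big-endian
function decode(input: DecoderInput): { time: string; temperature: number } {
  const offsetSeconds = (input.bytes[0] << 8) | input.bytes[1];
  let raw = (input.bytes[2] << 8) | input.bytes[3];
  if (raw & 0x8000) raw -= 0x10000; // two's complement for negative temperatures
  // Measurement time = receive time minus the device-reported offset
  // (network and airtime delay are ignored).
  const measuredAt = new Date(input.recvTime.getTime() - offsetSeconds * 1000);
  return { time: measuredAt.toISOString(), temperature: raw / 10 };
}

const decoded = decode({
  bytes: [0x00, 0x3c, 0x00, 0xca], // 60 s offset, 20.2 °C
  recvTime: new Date("2022-06-01T12:00:00Z"),
});
console.log(decoded.time, decoded.temperature); // 2022-06-01T11:59:00.000Z 20.2
```

Two bytes of offset replace the several bytes an absolute timestamp would need, which is the payload saving the comment above is after.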

If we want to support min/max/avg/median/percentiles for temperature... What do we do?

{
  "air": {
    "temperature": {
      "current": 20.5,
      "min": 19.2,
      "max": 20.6
    }
  }
}

Is this still developer friendly enough? Or

{
  "air": {
    "temperature": 20.2,
    "minTemperature": 19.2,
    "maxTemperature": 20.6
  }
}

I like the former one personally.
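Edge aggregation that fills the nested shape might look like the sketch below. `aggregate` is a hypothetical helper. It adds `avg` and `median` beyond the `current`/`min`/`max` of the example above, and the median averages the two middle samples when the count is even.

```typescript
// Nested aggregate for one quantity, as in the first proposal above.
interface Aggregate {
  current: number;
  min: number;
  max: number;
  avg: number;
  median: number;
}

// Summarizes samples in chronological order (oldest first).
function aggregate(samples: number[]): Aggregate {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const sum = samples.reduce((acc, s) => acc + s, 0);
  return {
    current: samples[samples.length - 1], // most recent sample
    min: sorted[0],
    max: sorted[sorted.length - 1],
    avg: sum / samples.length,
    median,
  };
}

const payload = { air: { temperature: aggregate([19.2, 20.6, 20.5]) } };
console.log(payload.air.temperature.max); // 20.6
```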

The plan is to add a new message type here, and we already encourage integration developers to specify a path for that message to keep things separate:

That's what I thought, just another thing to keep in mind.

These are the current flattened output fields of all codecs that provide examples:

The list looks like a terrifying mess and illustrates perfectly why this issue is important!

We may want to come up with some standard "status" too, even if it were just basic flags, like "low battery", "clock synchronized", etc. But we can do this later

This will indeed be necessary. There might be a lot of use cases that produce uplinks that do not adhere strictly to “a quantity with some units in a defined context”, for example: open/closed status, events from computer vision recognition on the edge, periodic beacons with some device state… For now, though, I think we should focus on the easiest ones to standardize: the physical quantities. Once we have everything in place, we can go for the ones that are more difficult to agree on.

If we want to support min/max/avg/median/percentiles for temperature... What do we do?
{
  "air": {
    "temperature": {
      "current": 20.5,
      "min": 19.2,
      "max": 20.6
    }
  }
}
Is this still developer friendly enough? Or
{
  "air": {
    "temperature": 20.2,
    "minTemperature": 19.2,
    "maxTemperature": 20.6
  }
}
I like the former one personally.

From a developer perspective, I think that once you have to go one level deep to get the value, with the validation that requires, there's no difference between going one level or several levels deep. I like the first one too; it's easier to glance over. The second one would make more sense to me if we were trying to avoid nesting at all costs for developer ergonomics, i.e. having just one level with airTemperature, airMinTemperature and airMaxTemperature, but that seems very fragile and could get messy pretty quickly.

To get a first JSON schema, I think we should focus on mapping some fields from that vast list into a nice table similar to the one in the first comment of this issue.
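Mapping fields from that list could start as an alias table. Each manufacturer field name points at a normalized context/quantity pair and a factor that converts the value into the table's units. The entries, the assumed source units, and the names `relativeHumidity`/`pressure` are illustrative only; each codec's actual units would need to be checked.

```typescript
// Hypothetical alias entry: where a raw field goes and how to convert it.
interface Alias {
  path: [string, string]; // [context, quantity], e.g. ["air", "temperature"]
  factor: number; // multiply the raw value by this to reach the target unit
}

// Keys are lowercase, like the flattened listing. Source units are assumed.
const ALIASES: { [field: string]: Alias } = {
  temp: { path: ["air", "temperature"], factor: 1 }, // assumed °C
  tempc_sht: { path: ["air", "temperature"], factor: 1 }, // assumed °C
  hum_sht: { path: ["air", "relativeHumidity"], factor: 1 }, // assumed %
  pbaro: { path: ["air", "pressure"], factor: 1000 }, // bar -> hPa
};

type Normalized = { [context: string]: { [quantity: string]: number } };

function normalize(decoded: { [field: string]: unknown }): Normalized {
  const out: Normalized = {};
  for (const field of Object.keys(decoded)) {
    const alias = ALIASES[field.toLowerCase()];
    const value = decoded[field];
    // Unknown or non-numeric fields are left out of the normalized payload.
    if (!alias || typeof value !== "number") continue;
    const [context, quantity] = alias.path;
    if (!out[context]) out[context] = {};
    out[context][quantity] = value * alias.factor;
  }
  return out;
}

console.log(normalize({ TempC_SHT: 21.5, Hum_SHT: 48.2, Status: "ok" }));
```

Manufacturer-specific fields keep flowing through the regular decoded payload; only what the table knows about ends up normalized.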

Yep, I think the first one is nicer too.

there might be a lot of use cases that produce uplinks that do not adhere strictly to “a quantity with some units in a defined context”, for example: open/closed status, events from computer vision recognition on the edge, periodic beacons with some device state… For now, though, I think we should focus on the easiest ones to standardize: the physical quantities. Once we have everything in place, we can go for the ones that are more difficult to agree on.

Yes, I think we should go in the direction of defining structures for things that can be controlled (valves, lights, doors) and for things that cannot necessarily be measured as physical quantities but rather as scores. But indeed, let's figure this out later.

With #508 merged, the next step is to define the next batch of fields. The original comment is a great start.

On one hand, it's desirable to keep iterations big and avoid pushing lots of incremental changes to the device makers, because they won't keep up with that. We also have to keep TTS and our documentation up-to-date, so every schema addition comes with 3 public pull requests plus some TTI internal merges. On the other hand, we need to keep up the pace, and big schema changes may take a long time to fully agree on and commit to.

I would suggest going with the low-hanging fruit, which is basically what's in the aforementioned comment, and then incrementally adding more as device makers and application developers start embracing it with open arms and tears of joy in their eyes.

@pablojimpas are you willing to spend some time on schema additions?

@pablojimpas are you willing to spend some time on schema additions?

Sure! Awesome work with your 3 PRs so far. I'm sure I can use those as a basis for implementing more measurements. If you don't mind, I'll start with the measurements that are most useful for agricultural use cases.

Let's see if we can cover a good number of variables before TTC2022 so that this new format can be promoted there to gain adoption easily.

I will start with a PR to add more air and soil quantities to lib/payload.json, and then I'll figure out the new validation required in TTS. I don't feel confident enough to contribute to the documentation, but I think that a table or diagram explaining the standard would be a great addition.

Hi all. Device manufacturer here (KELLER Pressure). I am not sure if my input is welcome, but here are my two cents:

I really like the idea of the "Normalizer"
So far I haven't found a general standard for IoT or MQTT measurements. I also searched various open data archives but could not find a common standard.
A common standard for measurements would be helpful for us when creating MQTT content. Whatever you guys will "normalize", I will probably use it for other projects.
When checking validity, be aware that devices can send differences, so the value can be < 0. I am not aware of any sensors that send differences in humidity, pH or direction, but it is theoretically possible.
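One way to accommodate differences when checking validity is to compare them against the width of the valid range rather than its bounds. The `isDifference` flag and the pH and direction ranges are hypothetical. Only the humidity range [0.0, 100.0] comes from the table at the top of this issue.

```typescript
// Hypothetical validity table; only the humidity range is from the issue.
interface ValidRange {
  min: number;
  max: number;
}

const VALID: { [quantity: string]: ValidRange } = {
  relativeHumidity: { min: 0, max: 100 },
  ph: { min: 0, max: 14 }, // illustrative
  direction: { min: 0, max: 360 }, // illustrative, degrees
};

// An absolute reading must fall inside [min, max]. A difference between two
// readings of the same quantity can be negative, so it is checked against
// the span instead: |delta| <= max - min.
function isValid(quantity: string, value: number, isDifference: boolean): boolean {
  const range = VALID[quantity];
  if (!range) return true; // no constraint defined for this quantity
  if (isDifference) return Math.abs(value) <= range.max - range.min;
  return value >= range.min && value <= range.max;
}

console.log(isValid("relativeHumidity", -3.5, true)); // true
console.log(isValid("relativeHumidity", -3.5, false)); // false
```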

Here's a fabricated but possible example of a set of measurements from a LoRaWAN device equipped with some special pressure sensors:

{
  "P1": 1.011563763022423, 
  "PBaro": 0.9731699824333191,
  "Pd (P1-PBaro)": 0.0383937805891037,
  "TBaro": 21.219999313354492,
  "TOB1": 23.18115234375,
  "SDI12 CH1": 1.23456789,
  "Counter input": 4
}

"Counter input": there might be a rain catcher or another device that counts impulses. The number is the count of impulses in a predefined time range. It is unitless.
"SDI12 CH1": we have various interfaces (SDI-12, RS485 or just voltage input) to which a customer can connect any sensor. We do not know which sensor, and therefore treat this value as unitless.
"TOB1": the temperature at the first sensor. Unit: °C.
"TBaro": the temperature in the sending device, which represents the air temperature (although it's measured in the box, not in the air). Unit: °C.
"PBaro": the pressure in the sending device, which represents the barometric pressure (although it's measured in the box, not in the air). Unit: bar.
"P1": the pressure from the level probe (in the water). Unit: bar.
"Pd (P1-PBaro)": the difference between the probe pressure and the barometric pressure, used to calculate the water level. Unit: bar.

It is not necessary to send "Pd" if the two input values ("P1", "PBaro") are also sent. However, customers often want both. Up to five pressure sensors can be connected. Together with the one inside the sending device, this gives six pressure values (and six temperature values), which are split across several transmissions.

What would this look like with the normalizer?

"uplink_normalized": {
    "pressure": [{
            "value": 973.16998243331, // normalized to hPa
            "sensor": "atmosphericPressure", // TTN normalized term
            "source": "PBaro" // manufacturer's funky name for this
        }, {
            "value": 1011.56376302242,
            "source": "P1"
        }, {   
            "value": 38.3937805891037,
            "source": "Pd (P1-PBaro)"
        }
    ],
    "temperature": [{
            "value": 21.219999313354,
            "sensor": "ambientTemperature", // (Actually, it is not the 'ambient', it's IN a box but close enough)
            "source": "TBaro"
        }, {
            "value": 23.18115234375,
            "source": "TOB1"
        }
    ],
    "quantityless": [{
            "value": 4,
            "source": "Counter input"
        }, {
            "value": 1.23456789,
            "source": "SDI12 CH1"
        }
    ]
}
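To illustrate the proposal above, here is a minimal Python sketch of a normalizer that groups the decoded KELLER-style fields by quantity and converts bar to hPa (1 bar = 1000 hPa). FIELD_MAP, normalize and the output keys are hypothetical and simply mirror the example structure; this is not how The Things Stack implements normalization.

```python
BAR_TO_HPA = 1000.0  # 1 bar = 1000 hPa

# Hypothetical mapping: manufacturer field -> (quantity, normalized sensor term or None, converter)
FIELD_MAP = {
    "PBaro": ("pressure", "atmosphericPressure", lambda v: v * BAR_TO_HPA),
    "P1": ("pressure", None, lambda v: v * BAR_TO_HPA),
    "Pd (P1-PBaro)": ("pressure", None, lambda v: v * BAR_TO_HPA),
    "TBaro": ("temperature", "ambientTemperature", lambda v: v),  # already in °C
    "TOB1": ("temperature", None, lambda v: v),
    "SDI12 CH1": ("quantityless", None, lambda v: v),
    "Counter input": ("quantityless", None, lambda v: v),
}


def normalize(decoded):
    """Group decoded fields by quantity and convert them to the target units."""
    normalized = {}
    for source, raw in decoded.items():
        if source not in FIELD_MAP:
            continue  # fields without a mapping are left out of the normalized payload
        quantity, sensor, convert = FIELD_MAP[source]
        entry = {"value": convert(raw), "source": source}
        if sensor is not None:
            entry["sensor"] = sensor
        normalized.setdefault(quantity, []).append(entry)
    return {"uplink_normalized": normalized}
```

The mapping table is the manufacturer-specific part; everything downstream of it only deals with quantities and units.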

Thanks for your input @cBashTN

When checking the validity, be aware that devices can send differences and the value can be <0. I am not aware of any sensors that send differences in humidity, pH or direction. But theoretically possible.

What do you mean by differences? A delta w.r.t. a previous value? We can support that, but only if we make things stateful. We have plans for that as well. I think the goal is to produce absolute values in the normalized payload, even if the end device sends changes.

Regarding TOB1 and TBaro in your example: how should these be normalized? Is the former air temperature and the latter "device temperature" (i.e. in the case)?

Regarding unitless counters: this still means something in the domain, right? I mean, if it's raindrops, we can have air.rainDrops or something.

If you are working with auxiliary input, i.e. any external device that provides a current, where your device sends the voltage level but doesn't really know what it represents, then we also have to work with state. The idea is that we get some sort of installation state, per device, that is made available to the normalizer so it knows exactly what the decoded payload means.
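A minimal sketch of such installation state, assuming it is a per-device table that tells the normalizer what each auxiliary channel measures and how to scale the raw reading. ChannelInstallation, INSTALLATIONS, the device ID, the quantity name and the linear scale/offset model are all hypothetical; real installations may need other conversions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelInstallation:
    """What one auxiliary channel measures on a specific installation (hypothetical)."""
    quantity: str  # hypothetical normalized name, e.g. "water.level"
    unit: str      # unit of the converted value
    scale: float   # converted = raw * scale + offset
    offset: float = 0.0


# Hypothetical installation state: device ID -> channel name -> what is connected to it.
INSTALLATIONS = {
    "my-level-logger": {
        "SDI12 CH1": ChannelInstallation("water.level", "m", scale=2.0, offset=-0.5),
    },
}


def normalize_auxiliary(device_id, decoded):
    """Convert raw auxiliary readings using the device's installation state.

    Channels with no installation state are dropped: without it, the normalizer
    cannot tell what the raw value means.
    """
    installation = INSTALLATIONS.get(device_id, {})
    normalized = {}
    for channel, raw in decoded.items():
        cfg = installation.get(channel)
        if cfg is None:
            continue
        normalized[cfg.quantity] = {"value": raw * cfg.scale + cfg.offset, "unit": cfg.unit}
    return normalized
```

The decoder stays stateless; only the lookup in the installation table depends on the specific device.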

I think the goal is to produce absolute values in the normalized payload, even if the end device sends changes.

Right. We could devise some sort of flag indicating whether it is an absolute value or a delta, but that would add unnecessary complexity to integrators who want to benefit from the normalized payload. Since we already have to make the decoder/normalizer stateful to address the “installation state” issue, this rare case could also be implemented that way.
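A minimal sketch of that stateful approach, assuming the decoder knows which of its own fields are deltas and that the device occasionally reports an absolute reference value. DeltaAccumulator and its in-memory dictionary are hypothetical; a real normalizer would need persistent, per-device storage.

```python
class DeltaAccumulator:
    """Hypothetical normalizer state that turns reported deltas into absolute values."""

    def __init__(self):
        # (device_id, field) -> last known absolute value; in-memory only for this sketch
        self._last = {}

    def normalize(self, device_id, field, value, is_delta):
        """Return the absolute value, or None if a delta arrives before any reference."""
        key = (device_id, field)
        if not is_delta:
            self._last[key] = value  # absolute reading: store it as the new reference
            return value
        if key not in self._last:
            return None  # no reference yet, so no absolute value can be produced
        self._last[key] += value
        return self._last[key]
```

Because the is_delta information stays inside the normalizer, integrators only ever see absolute values and never have to deal with a flag.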

In the case of deltas between two values present in the same payload, I don't think it makes sense to handle this in either the decoder or the normalizer. The normalized payload will contain the two relevant values (e.g., P1 and PBaro), and the integration can then use them as it wants, calculating the difference or applying any other logic. In general, I think we should not send redundant data in the normalized payload, nor values that can be calculated from others.
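For instance, an integration could compute the difference that the KELLER payload sends as "Pd (P1-PBaro)" on its own and turn it into a water level using hydrostatics, h = Δp / (ρ·g). This sketch assumes both pressures are already normalized to hPa, fresh water (ρ = 1000 kg/m³) and standard gravity; water_level_m is a hypothetical helper.

```python
RHO_FRESH_WATER = 1000.0    # kg/m^3, assumption: fresh water
STANDARD_GRAVITY = 9.80665  # m/s^2
PA_PER_HPA = 100.0


def water_level_m(probe_hpa, baro_hpa):
    """Height of the water column above the probe, h = dp / (rho * g), from two absolute pressures in hPa."""
    delta_pa = (probe_hpa - baro_hpa) * PA_PER_HPA
    return delta_pa / (RHO_FRESH_WATER * STANDARD_GRAVITY)
```

With the example values (about 1011.56 hPa from the probe and 973.17 hPa barometric), this gives roughly 0.39 m of water above the probe.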

Regarding unitless counters: this still means something in the domain, right? I mean, if it's raindrops, we can have air.rainDrops or something.

From my experience with rain gauges that work with pulses, each pulse corresponds to a certain number of millilitres of water. The conversion has to be provided by the manufacturer and included in the decoder/normalizer to produce something that makes sense in the domain. We don't want any value without units and domain context in the normalized payload. The same is true for SDI-12, RS485 and other interfaces: their values have to be converted to something meaningful, regardless of whether the conversion is stateless or stateful.
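A minimal sketch of that conversion for a pulse-based rain gauge, assuming the manufacturer specifies the volume collected per pulse and the funnel's collecting area. rainfall_mm and its parameters are hypothetical; since 1 mL = 1 cm³, the volume divided by the area gives a depth in cm, which is multiplied by 10 to obtain mm.

```python
def rainfall_mm(pulses, ml_per_pulse, funnel_area_cm2):
    """Convert a rain gauge pulse count into rainfall depth in mm.

    1 mL = 1 cm^3, so volume / area gives a depth in cm; times 10 gives mm.
    """
    if funnel_area_cm2 <= 0:
        raise ValueError("funnel area must be positive")
    volume_ml = pulses * ml_per_pulse
    return volume_ml / funnel_area_cm2 * 10.0
```

With hypothetical values of 5 mL per pulse and a 200 cm² funnel, 4 pulses correspond to 1 mm of rain.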

I have some questions about the current schema. Apologies if these are addressed above.

1. Should the 'unit' (i.e. the SI unit) of things like temperature, speed, depth, etc. be defined in those object blocks as a separate key rather than in the description? This would allow UIs to determine the unit.

"speed": {
    "type": "number",
    "description": "Speed",
    "unit": "m/s",
    "minimum": 0
}

2. Should measurement types such as air/pressure only be allowed to reference existing entries above, or must they have the unit defined within them? The way it is now, where measurements can define their own unit values (or skip all those entries?), makes it simple to create a JSON object that does not carry useful information.
3. It feels like something is missing between the definitions key and temperature etc. Further down there is a measurements key that groups all the measurement objects, but there is no equivalent for temperature etc. Is this worth doing, so that it is clear what each object describes because they're all grouped under a key?

1. Should the 'unit' (i.e. the SI unit) of things like temperature, speed, depth, etc. be defined in those object blocks as a separate key rather than in the description? This would allow UIs to determine the unit.
"speed": {
    "type": "number",
    "description": "Speed",
    "unit": "m/s",
    "minimum": 0
}

Good question. We support JSON Schema, and it only offers description for informative text.

2. Should measurement types such as air/pressure only be allowed to reference existing entries above, or must they have the unit defined within them? The way it is now, where measurements can define their own unit values (or skip all those entries?), makes it simple to create a JSON object that does not carry useful information.

3. It feels like something is missing between the definitions key and temperature etc. Further down there is a measurements key that groups all the measurement objects, but there is no equivalent for temperature etc. Is this worth doing, so that it is clear what each object describes because they're all grouped under a key?

Hmm, I don't understand these questions. Can you elaborate, maybe with an example?
