Multiple sources for the same value

rob42 commented 9 years ago

It is quite possible for a key value to come from more than one device, e.g. position (lat/lon) could come from several GPS-enabled devices, and multiple depth sounders are not uncommon. We need a consistent way to handle this.

All the incoming values may well be valid in their own context, and it is feasible that all of them may be wanted, for instance when displaying the depth under each hull on a catamaran.

Hence discarding or averaging is not a solution, and since SignalK is unable to derive the best way to handle multiple values, it must always fall back to a default action, with human override when needed.

I propose we should simply store all the options in the tree, and have the main 'source' reference the options?

Then simple rules can apply:

1) If it's the first value for the key, it becomes the first option and the default value.
2) If another value with a different source arrives, we add it to the options. If it's our preferred source (from persistent config) we auto-switch to it, otherwise we just record it.
3) Users can then view the source options for a key and select from the list for a specific display. They might select courseOverGroundTrue.options.ttyUSB1 as the source for a specific need, or they may select an average/mean/min/max etc. (This facility is not part of the spec, but implementation specific.)
4) We may need a rule to trim unreferenced options (or all options) on output.

(The option key, e.g. ttyUSB1, is just a unique name; it has to be generated somehow from the incoming data.)

{"vessels":
    {"self":
        {"navigation":
            {"courseOverGroundTrue":
                {
                 "value": 102.29,
                 "source": "options.actisense",
                 "options":
                    {
                    "ttyUSB1":
                        {
                            "value": 99.2900009155,
                            "source": "/dev/ttyUSB1",
                            "timestamp": "2014-08-15-16:00:00.081"
                        },
                     "actisense":
                        {
                            "value": 102.29,
                            "source": "/dev/actisense",
                            "timestamp": "2014-08-15-16:00:00.081",
                            "src": "115",
                            "pgn": "128267"
                        }
                    }
                }
            }
        }
    }
}
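
As a rough illustration of rules 1 to 3, here is a minimal JavaScript sketch. The helper name applyUpdate and the preferred argument (standing in for the preferred source read from persistent config) are assumptions made for this example, not part of the spec or of any existing implementation. It also assumes that a later update from the currently selected source should refresh the main value, which the rules do not spell out.

```javascript
// Hypothetical helper: applies rules 1-3 to one leaf of the tree.
// `name` is the generated option key, `preferred` the configured source name.
function applyUpdate(leaf, name, update, preferred) {
  const first = leaf.options === undefined;
  if (first) leaf.options = {};
  // Rules 1 and 2: every incoming source is recorded as an option.
  leaf.options[name] = { ...update };
  // Assumption: updates from the currently selected source refresh the value.
  const current = leaf.source === "options." + name;
  // Rule 1: the first value becomes the default.
  // Rule 2: auto-switch only when the preferred source shows up.
  if (first || current || name === preferred) {
    leaf.value = update.value;
    leaf.source = "options." + name;
  }
  return leaf;
}

const cog = {};
applyUpdate(cog, "ttyUSB1", { value: 99.29, source: "/dev/ttyUSB1" }, "actisense");
applyUpdate(cog, "actisense", { value: 102.29, source: "/dev/actisense" }, "actisense");
applyUpdate(cog, "ttyUSB1", { value: 99.5, source: "/dev/ttyUSB1" }, "actisense");
console.log(cog.value, cog.source); // 102.29 options.actisense
```

The third update is only recorded under options.ttyUSB1; the main value stays with the preferred source.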

tkurki commented 9 years ago

Thanks @rob42 for kicking this off.

I'll pitch in with my TDD hat on and commit a test for the new format. I modified your example a little:

pgn was wrong
added complexity: two nmea0183 sources with different sentences and two n2k sources on one bus with different pgns

I think that the way the identifier is constructed should be specified:

n2k: producerid-pgn-sourceid (producer id from server configuration, others from n2k data)
nmea0183: producerid-talkerid-sentence (like n2k)
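
A small sketch of how those two identifier formats could be built. The function names are made up for illustration, and the sample talker id and sentence (GP, RMC) are just plausible example inputs.

```javascript
// Builds option keys in the two formats proposed above.
// Function names and argument order are illustrative only.
function n2kSourceId(producerId, pgn, sourceId) {
  // n2k: producerid-pgn-sourceid
  return [producerId, pgn, sourceId].join("-");
}

function nmea0183SourceId(producerId, talkerId, sentence) {
  // nmea0183: producerid-talkerid-sentence
  return [producerId, talkerId, sentence].join("-");
}

console.log(n2kSourceId("actisense", 129026, 115)); // actisense-129026-115
console.log(nmea0183SourceId("ttyUSB1", "GP", "RMC")); // ttyUSB1-GP-RMC
```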

BTW we have neglected to include the nmea0183 talker id; it should be added to the schema as well.

The next step would be to fix the schema so that the test passes for the sample. You can run the test after npm install with npm test.

PS. The test code assumes that the input is without the vessels.123456789 preamble, that ought to be fixed as well.

tkurki commented 9 years ago

I added Travis configuration for this and opened pull request #49 for this so now we have CI integration as well.

rob42 commented 9 years ago

We also need an extra rule to make it clear that we have an options object: 4) If there is an 'options' object, then source = options.name. Or we could always have 'options', but that would be verbose for most cases...

tkurki commented 9 years ago

Where should the additional documentation like rules 1-4 go?

rob42 commented 9 years ago

I would put it as a comment in relevant code, and I will write up a page for the web site under 'developers'. Wonder if I can add it to a readme.md in the specification repo?

tkurki commented 9 years ago

The test I mentioned earlier: https://github.com/SignalK/specification/blob/multiple-values/test/multiple_values.js

pod909 commented 9 years ago

The preferred source of data probably belongs in a manifest (a separate structure defining the resources available from a vessel). Such a manifest would list the readings available from a vessel and the devices on board, and contain the preferred mapping between them. If a date stamp was also included, that would remove the need to litter the basic data with source and timing data.

By declaring the devices (with the readings available) in a manifest and specifying a suitable URI structure for SignalK, you would give the user the choice to query the individual device for data using the existing SignalK schema, without the need to add additional "option" tags.

tkurki commented 9 years ago

We have previously discussed the need for a mapping layer in case there is need for server side processing of for example derived values. This would be part of the server configuration.

But I think that the simplest case should be supported with minimum a priori knowledge, e.g. the server should just need minimal configuration to connect the producers on board (n2k bus, nmea sources, other sources) and then be able to pass that data on.

With the current implementations there is a de facto registry of all available data in the REST API contents.

There are some ideas about this type of metadata in https://github.com/SignalK/specification/blob/master/schemas/groups/sensors.json.

About timestamp: I think it belongs right next to the value as it is metadata about that particular value. It is also part of the streaming data: 'new value is x with timestamp y'.

tkurki commented 9 years ago

More stuff available at SignalK/signalk.github.io#17

pod909 commented 9 years ago

Apologies if this is going over old ground. I suspect I'm coming at this from a different angle.

Because this moves SignalK towards being essentially just a super transport for other messaging, a user is going to need considerable prior knowledge of the exact configuration of the vessel (including knowledge of which components are reliably calibrated and, the way it's going, the structure of the non-SignalK messaging itself) to use SignalK successfully.

Coming to the vessel from the outside with none of that knowledge, a menu of possible readings for each value is going to be confusing at best. So either some work is done on the 'server' to pick the right source, or there's a need for a manifest to tell me which is the best reading to use from the range provided.

Architecture-wise it's also pushing SignalK towards a particular and more expensive implementation model (all the work done at the end application) that doesn't allow for work to be distributed over the network, whereas a good scheme, or one that it's hoped will have a wide take-up, should leave these architecture questions open. Otherwise there's a danger SignalK is going to end up tied to the implementation found here and go no further.

My interest is in using SignalK end-to-end, from sensor to display via processing, then from the boat to a network of boats and to the internet, with a large number of derived values being generated in the chain, some of which go back in the opposite direction. With the right SignalK schema I'd be quite happy to translate to SignalK at the device where it's not implemented natively.

I'd like to see a single data structure to represent all the data available from a single location-device, i.e. a single data structure that can be used to pass data from a device to intra-location processing AND, once processed, out to the outside world. As a starting point the sensor data object should have the same structure as the vessel object, rather than the vessel object (which may not be my frame of reference for making a query) containing all sensor values.

Intelligent multiplexing of the data (with an explanation of how it was done and the alternative sources that can be separately RESTed if wanted) enables me to expose my knowledge of the local system set-up in the data. The way you're taking this, the use of SignalK from the device is pretty unattractive, and if I have to pay for N2K from device to multiplexer then I might as well stick with it e2e.

Time stamp

For me the timestamp should be abstracted to the sensor object, with all the values coming from that sensor referencing the sensor (which I'd do in the manifest as the mapping is pretty static). That way all the data can be left in the vessel object rather than being separated out into different sensor payloads that I have to process back together again.

Timestamping for delta data belongs in the packet header, no?

keesverruijt commented 9 years ago

@pod909, I can't really make out what you are saying. On an abstract level, and on a detailed technical level. Maybe it's just me though.

tkurki commented 9 years ago

@pod909 Maybe a real life practical example would help clarify your viewpoint?

As for "a menu of possible readings for each value is going to be confusing at best": on my boat the menu would contain items like

navigation.position
navigation.courseOverGroundTrue
navigation.speedThroughWater
navigation.speedOverGround
environment.wind.speedApparent
environment.wind.angleApparent
environment.depth.belowTransducer

Often there is only one source for each data item. All this can be gleaned from the delta messages in a minute or so or fetched from the server when the display starts.

I fail to see that list as confusing to pretty much anybody with a little navigation background.

pod909 commented 9 years ago

No problem with that list if the data was at that level, but there's already value, source and timestamp below that level, and the proposal is to add multiple option values as well, isn't it?

tkurki commented 9 years ago

@pod909 The user interface does not need to show all the information that it has.

For example https://github.com/SignalK/instrumentpanel/ populates the grid as it receives new data items, creating cells when it sees paths it hasn't seen before. Even if the information it receives has timestamp and source, it only shows the name of the data item, for example navigation.courseOverGround, and the value.

Please also take a look at Delta messages at http://signalk.org/dev/messageFormat.html

This addition to the spec is about having multiple values for a given data item, say navigation.courseOverGround, in the REST data model. Current delta message format already allows updates to one data item from multiple sources, for example from multiple GPS's.

rob42 commented 9 years ago

The handling of multiple values is superseded by the method detailed in https://github.com/SignalK/specification/wiki/Multiple-Values-Handling, so closing this one.

faceless2 commented 9 years ago

Adding multiple values as described at https://github.com/SignalK/specification/wiki/Multiple-Values-Handling doesn’t seem to work: here’s a section:

"self": {
      "navigation": {
        "courseOverGroundTrue": {
          "timestamp": "2014-08-15-16:00:01.083",
          "value": 102.29,
          "source": "vessels.self.sources.n2k.actisense-115-129026"
        }
      },
      "sources": {
        "n2k": {
          "actisense-115-129026": {
            "value": 102.29,
            "bus": "/dev/actisense",
            "timestamp": "2014-08-15-16:00:01.083",
            "src": "115",
            "pgn": "129026"
          },

The trouble is the value 102.29 in the “source” isn’t linked to the key “courseOverGroundTrue”. If that source also provided us with other values, I’d have no way of distinguishing them.

I presume this is a solved problem and that page just hasn’t been updated (and if by chance it’s not, changing the “value” in the source to a tree of values (eg : { “self”: { “navigation”: { “courseOverGroundTrue”: 102.29 } } }) would do it… although this doesn’t allow for different timestamps for different values)

(this is an edited repost of my comments on Slack here, for reference)

tkurki commented 9 years ago

I noticed the same thing a few days ago when I resurrected my work on this.

The problem isn't really solved, but imho the solution presented at the beginning of this issue is the best if not perfect option and I'm going to move in that direction in the node ref implementation.

The wiki needs updating, but I would like to verify the thinking with some working software before a real update. I'll throw in a label that the wiki version is not normative and has problems.

fabdrol commented 9 years ago

Just a thought; we could approach this differently:

We consider the value of a property to be the primary value. Which one to put there is inferred by the server using the priority of the source (e.g. taken from the sensors or sources group).
We add an optional key called alternatives or something similar which lists alternative values by source id, ordered by priority.

Example:

"01010101": {
  "navigation": {
    "courseOverGroundTrue": {
      "timestamp": "2015-08-25T08:12:17.824Z",
      "value": 102.29,
      "source": {
        "label": "aft compass",
        "type": "compass",
        "id": "sensor1"
      },
      "alternatives": {
        "sensor2": {
          "timestamp": "2015-08-25T08:12:17.824Z",
          "value": 100.11,
          "source": {
            "label": "center compass",
            "type": "compass",
            "id": "sensor2"
          }
        },
        "sensor3": {
          "timestamp": "2015-08-25T08:12:17.824Z",
          "value": 101.2,
          "source": {
            "label": "center compass",
            "type": "compass",
            "id": "sensor3"
          }
        }
      }   
    }
  }
}

A consumer could choose to simply use the primary value, or present a list of options to the user to switch to another value. The second source property in the alternatives might seem unnecessary, but I think it is important that the contents of each object in the alternatives has the same structure as the primary value since some values (e.g. position) follow a different structure (it has position.latitude, position.longitude and position.altitude instead of position.value).

faceless2 commented 9 years ago

JavaScript maps are unordered :-) But of course if you needed to order them you could add a priority field, or you could leave them unprioritised - realistically determining priority beyond the primary might not be much use: I have 5 GPS sources on my boat, I know which one I want to use, if that's not working, well I suppose it's whichever one has a signal. So long as it's consistent.
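
To make the priority-field idea concrete, here is a minimal sketch. The priority field and the orderedAlternatives helper are hypothetical (nothing in the schema defines them), and the sensor ids and the third value are invented for the example.

```javascript
// Orders an `alternatives` map by a hypothetical numeric `priority` field
// (lower number = more preferred). Entries without one sort last.
function orderedAlternatives(alternatives) {
  const rank = (entry) =>
    typeof entry.priority === "number" ? entry.priority : Number.MAX_SAFE_INTEGER;
  return Object.entries(alternatives)
    .map(([id, entry]) => ({ id, ...entry }))
    .sort((a, b) => rank(a) - rank(b));
}

const alternatives = {
  sensorB: { value: 101.2, priority: 2 },
  sensorA: { value: 100.11, priority: 1 },
  sensorC: { value: 99.8 }, // made-up entry with no priority
};
console.log(orderedAlternatives(alternatives).map((a) => a.id));
// [ 'sensorA', 'sensorB', 'sensorC' ]
```

Because the order comes from an explicit field, it does not depend on how the client happens to store or iterate the map.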

Personally, I quite like having the alternatives bundled off in a separate section of the datamodel. There are two reasons for this.

I haven't investigated the syntax for subscription yet, but if I subscribe to navigation.courseOverGround I probably don't care about the 8 alternatives, their timestamps and labels. It's easier to subscribe to navigation/* than navigation/*[not(options)], or however it is the syntax goes - my xpath is a bit rusty, but you get the idea.
Sources will typically provide more than one data value, which means if you "interleave" the various options for each data value into the main model, you're going to repeat a lot of information unless you have some sort of sources object and refer to the source by ID. You also won't be able to easily identify all values from a particular source.

I can think of three broad ways to handle this from a datamodel point of view, I'll sketch them out here. All the models below present the same information - course and speed from two different local sources (I'm assuming that sources for data can come from other vessels - kind of the point of SignalK after all - so sources are under vessels.self as in the original example in this issue).

Option 1

First, the "primary values only stored in the main datamodel, alternatives kept with their sources" option. This is the one I'd advocate. It's also only a slight variation on the model described on the wiki.

{
    "navigation": {
        "courseOverGround": {
            "value": 109.2,
            "timestamp": "2015-08-25T08:12:17.824Z",
            "source": "vessels.self.dev1"
        },
        "speedOverGround": {
            "value": 4.2,
            "timestamp": "2015-08-25T08:12:17.824Z",
            "source": "vessels.self.dev1"
        }
    },
    "sources": {
        "vessels": {
            "self": {
                "dev1": {
                    "label": "Masthead GPS",
                    "timestamp": "2015-08-25T08:12:17.824Z",
                    "bus": "n2k:/dev/actisense",
                    "values": {
                        "navigation": {
                            "courseOverGround": 109.2,
                            "speedOverGround": 4.2
                        }
                    }
                },
                "dev2": {
                    "label": "VHF Internal GPS",
                    "timestamp": "2015-08-25T08:12:17.824Z",
                    "bus": "nmea0183:/dev/ttyUSB4",
                    "values": {
                        "navigation": {
                            "courseOverGround": 108.7,
                            "speedOverGround": 4.3
                        }
                    }
                }
            }
        }
    }
}

Advantages: primary data (e.g. navigation, environment) is kept concise (which should simplify subscribing to just the values and not the alternatives). It's easy to determine all data from a particular source should you wish to. Disadvantages: there's no easy way to reflect two updates from an individual source with different timestamps (e.g. if GPS position was updated once a second, but course/speed only every five seconds). I'm unsure if this matters.

Option 2

The model described by fabdrol above, where the options are included inline with the primary datamodel. I've added a primary key to each option to indicate which value was chosen, but you could indicate the chosen alternative in any number of ways.

{
    "navigation": {
        "courseOverGround": {
            "value": 109.2,
            "sources": {
                "vessels": {
                    "self": {
                        "dev1": {
                            "label": "Masthead GPS",
                            "timestamp": "2015-08-25T08:12:17.824Z",
                            "bus": "n2k:/dev/actisense",
                            "value": 109.2,
                            "primary": true
                        },
                        "dev2": {
                            "label": "VHF Internal GPS",
                            "timestamp": "2015-08-25T08:12:17.824Z",
                            "bus": "nmea0183:/dev/ttyUSB4",
                            "value": 108.7
                        }
                    }
                }
            }
        },
        "speedOverGround": {
            "value": 4.2,
            "options": {
                "vessels": {
                    "self": {
                        "dev1": {
                            "label": "Masthead GPS",
                            "timestamp": "2015-08-25T08:12:17.824Z",
                            "bus": "n2k:/dev/actisense",
                            "value": 4.2,
                            "primary": true
                        },
                        "dev2": {
                            "label": "VHF Internal GPS",
                            "timestamp": "2015-08-25T08:12:17.824Z",
                            "bus": "nmea0183:/dev/ttyUSB4",
                            "value": 4.3
                        }
                    }
                }
            }
        }
    }
}

Advantages: there is no separate sources object, everything is in the one place. Disadvantages: everything is in the one place, and data must be repeated.

Option 3

A halfway-house, where the option values are stored in the primary datamodel, but the details on the various sources are stored elsewhere:

{
    "navigation": {
        "courseOverGround": {
            "value": 109.2,
            "source": "vessels.self.dev1",
            "timestamp": "2015-08-25T08:12:17.824Z",
            "options": {
                "vessels": {
                    "self": {
                        "dev1": {
                            "value": 109.2,
                            "timestamp": "2015-08-25T08:12:17.824Z"
                        },
                        "dev2": {
                            "value": 108.7,
                            "timestamp": "2015-08-25T08:12:17.824Z"
                        }
                    }
                }
            }
        },
        "speedOverGround": {
            "value": 4.2,
            "source": "vessels.self.dev1",
            "options": {
                "vessels": {
                    "self": {
                        "dev1": {
                            "value": 4.2,
                            "timestamp": "2015-08-25T08:12:17.824Z"
                        },
                        "dev2": {
                            "value": 4.3,
                            "timestamp": "2015-08-25T08:12:17.824Z"
                        }
                    }
                }
            }
        }
    },
    "sources": {
        "vessels": {
            "self": {
                "dev1": {
                    "label": "Masthead GPS",
                    "timestamp": "2015-08-25T08:12:17.824Z",
                    "bus": "n2k:/dev/actisense"
                },
                "dev2": {
                    "label": "VHF Internal GPS",
                    "timestamp": "2015-08-25T08:12:17.824Z",
                    "bus": "nmea0183:/dev/ttyUSB4"
                }
            }
        }
    }
}

Advantages: data is not repeated, all alternative values are in the main section of the datamodel, and one source can set data values with different timestamps. Disadvantages: not as brief as option 1, syntax for subscribing to just the primary values is not obvious, and you still can't identify all values from a particular source.

tkurki commented 9 years ago

Thanks for the input. You raised an issue at least I haven't thought about before: there is currently no way to subscribe to values from a specific source.

tkurki commented 9 years ago

Imho the values should be available logically near each other.

In your first example: how do you discover the alternative values for a data item without going through all the sources? I assume that the sources section would have all the sources on the vessel, so you would essentially have a list of all the values in a rather awkward structure.
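
To show what "going through all the sources" means in practice for the option 1 layout, here is a rough sketch. alternativesFor is a hypothetical helper, and the input is a trimmed copy of the option 1 sources with the vessels.self nesting left out for brevity.

```javascript
// Collects every source's value for one dotted path in an option-1 style
// model, where each source carries its own `values` subtree.
function alternativesFor(sources, path) {
  const keys = path.split(".");
  const found = {};
  for (const [id, source] of Object.entries(sources)) {
    let node = source.values;
    for (const key of keys) {
      node = node == null ? undefined : node[key];
    }
    if (node !== undefined) found[id] = node;
  }
  return found;
}

const sources = {
  dev1: {
    label: "Masthead GPS",
    values: { navigation: { courseOverGround: 109.2, speedOverGround: 4.2 } },
  },
  dev2: {
    label: "VHF Internal GPS",
    values: { navigation: { courseOverGround: 108.7, speedOverGround: 4.3 } },
  },
};
console.log(alternativesFor(sources, "navigation.courseOverGround"));
// { dev1: 109.2, dev2: 108.7 }
```

Every lookup touches every source, which is the cost being questioned here.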

In option 2/3 I don't get the point of the vessels.self part under sources (btw one is sources, the other is options?). Skipping vessels.self down there, it is pretty much the same as in the beginning of this thread.

The sources section in 3 is not very informative - what does the timestamp stand for there, since there is no value present? Last update - but what update?

tkurki commented 9 years ago

I think we should also think about this as a REST API and the URLs.

.../navigation/speedThroughWater should give me "best" value and metadata about it (like that there are optional/alternative values). .../navigation/speedThroughWater/options should give me all the values and metadata about how the server considers the different options. The response should also give me a way to address a specific value in case I want just that one.

Both responses should point to other resources, like sensors and say their related calibration values, if it is not logical or convenient to represent that data as part of this resource.

faceless2 commented 9 years ago

Yes, you're right - in option one, there's no easy way to identify the list of alternatives. I've been thinking about it from a server POV until now - I have the alternatives available, I want somewhere to store them in the model. But I have to admit I'm still not clear why a client would ever need to use an alternative value rather than the specified primary value. The only example I recall where alternatives would be of use was two depth sounders, one under each hull of a cat. But for this, and all other cases I can conceive of, the client knows exactly what field it's requesting, and from what sensor it's requesting it.
The explicit reference to vessels.self is in all of the options because a) I wanted to make sure they all represented exactly the same data, and b) I've presumed that it's possible (in theory at least) to get data values from another vessel - e.g. I sail past a boat/buoy/marina that's broadcasting a SignalK model and has a barometer, and I do not - now I can incorporate their data. If this isn't needed then it can be dropped, of course. It was "sources" in the original example and "options" in fabdrol's example - wasn't worried about that as these were just sketching out ideas for structure, not intended as final suggestions.
Yes you're right about timestamp in option 3, pointless - should be dropped.
I'm not familiar with the API for subscribing to parts of the model - I've presumed there is one and that 99% of the time you won't care about the options or where it came from, just the primary value for that field. If that's not done yet, then yes I'd agree it probably needs fleshing out before this goes much further - perhaps with something like json-path (or some other xpath variation for JSON) rather than reinventing the wheel.

tkurki commented 9 years ago

The assumption that there is always one primary value for each data item is not valid.

A Signal K consuming client, like some kind of electronic dashboard, must be able to provide a list of the alternative values. After that it may know exactly what it wants, but discovery and exploration of the available items should be easy.

For example in a case of malfunction (for example a stuck log wheel) you may want to switch to another source.

Imho data from another entity should not be present under vessels.self.

We need to support the use case of a gauge that wants to subscribe to updates from the port hull sounder.

See http://signalk.org/developers/subscription_protocol.html

fabdrol commented 9 years ago

If we drop the assumption that there always is a primary data source, we could simply drop the idea of having to specify a "primary" or a priority. That would make the schema simpler:

1. one sensor, one value, indicated by using value:

{
  "navigation": {
    "speedThroughWater": {
      "timestamp": "2015-08-25T08:12:17.824Z",
      "value": 9.5,
      "source": {
        "label": "Airmar DST-800",
        "uuid": "22E63B84-749D-489B-A872-8B0D62507C54",
        "type": "NMEA0183",
        "capabilities": ["DBT", "DPT", "VHW", "VLW", "MTW"]
      }
    }
  }
}

2. Multiple sensors, multiple values. Consumers are responsible for choosing one, either by UI or (if a simpler consumer) simply selecting the first one:

{
  "environmental": {
    "depth": {
      "belowTransducer": {
        "values": {
          "starboard-hull": {
            "timestamp": "2015-08-25T08:12:17.824Z",
            "value": 9.5,
            "source": {
              "label": "Starboard DST-800",
              "uuid": "9801FBF0-76A3-47DD-ACA6-D76DC405F95C",
              "type": "NMEA2000",
              "capabilities": [59392, 60928, 126208, 126464, 126996, 128259, 128267, 128275, 130310, 130311, 130312]
            }
          },

          "port-hull": {
            "timestamp": "2015-08-25T08:12:17.824Z",
            "value": 6.5,
            "source": {
              "label": "Port DST-800",
              "uuid": "C7B2F2B7-2D49-475A-B0A1-F1E7E840607D",
              "type": "NMEA0183",
              "capabilities": ["DBT", "DPT", "VHW", "VLW", "MTW"]
            }
          }
        }
      }
    }
  }
}

3. One could even get rid of the two different notations and simply have a values map on every data property. With one source, this map contains just one value, otherwise more than one:

{
  "environmental": {
    "depth": {
      "belowTransducer": {
        "values": {
          "dst800": {
            "timestamp": "2015-08-25T08:12:17.824Z",
            "value": 9.5,
            "source": {
              "label": "Starboard DST-800",
              "uuid": "9801FBF0-76A3-47DD-ACA6-D76DC405F95C",
              "type": "NMEA2000",
              "capabilities": [59392, 60928, 126208, 126464, 126996, 128259, 128267, 128275, 130310, 130311, 130312]
            }
          }
        }
      }
    }
  }
}

In case of option 2/3 and subscriptions, if a user subscribes to ~/environmental/depth/belowTransducer the user simply receives updates for both values (not that different from when a user subscribes to a higher up path like ~/environmental/depth, where he would receive updates for belowTransducer, belowKeel etc). REST is not an issue either: when requesting /signalk/v1/api/self/environmental/depth/belowTransducer the user simply receives the contents of the values map:

{
  "values": {
    "starboard-hull": {
      "timestamp": "2015-08-25T08:12:17.824Z",
      "value": 9.5,
      "source": {
        "label": "Starboard DST-800",
        "uuid": "9801FBF0-76A3-47DD-ACA6-D76DC405F95C",
        "type": "NMEA2000",
        "capabilities": [59392, 60928, 126208, 126464, 126996, 128259, 128267, 128275, 130310, 130311, 130312]
      }
    },

    "port-hull": {
      "timestamp": "2015-08-25T08:12:17.824Z",
      "value": 6.5,
      "source": {
        "label": "Port DST-800",
        "uuid": "C7B2F2B7-2D49-475A-B0A1-F1E7E840607D",
        "type": "NMEA0183",
        "capabilities": ["DBT", "DPT", "VHW", "VLW", "MTW"]
      }
    }
  }
}

I did include timestamps everywhere as a consumer could use that information for determining what value to use. For instance, a research vessel might have many identical sensors, each transmitting at 1 Hz, running a consumer that logs this into a database. The consumer would simply pick the latest value in order to get more data points than one per second.
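
A minimal sketch of such a "pick the latest" consumer, assuming ISO 8601 timestamps as in the examples above. latestValue is a hypothetical helper name, and the second timestamp is invented so that the two entries differ.

```javascript
// Returns the entry with the most recent timestamp from a `values` map.
// `latestValue` is an illustrative name, not part of any Signal K API.
function latestValue(values) {
  let latest;
  for (const [id, entry] of Object.entries(values)) {
    const time = Date.parse(entry.timestamp); // ISO 8601 -> epoch milliseconds
    if (latest === undefined || time > latest.time) {
      latest = { id, time, value: entry.value };
    }
  }
  return latest; // undefined when the map is empty
}

const depthValues = {
  "starboard-hull": { timestamp: "2015-08-25T08:12:17.824Z", value: 9.5 },
  // Made-up timestamp, 300 ms later than the one above.
  "port-hull": { timestamp: "2015-08-25T08:12:18.124Z", value: 6.5 },
};
const newest = latestValue(depthValues);
console.log(newest.id, newest.value); // port-hull 6.5
```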

rob42 commented 9 years ago

Since a single source (like a GPS RMC message) can have several diverse values, we should support the idea of the source being a reference to more data. So we can use the same rule as elsewhere: if source is a string it's a reference to another location, if it's an object it holds the source data directly. Then we can benefit from a many-to-one type of simplification.

{
    "environmental": {
        "depth": {
            "belowTransducer": {
                "timestamp": "2015-08-25T08:12:17.824Z",
                "value": 6.5,
                "source": "sources.n2k.some-unique-name",
                "values": {
                    "dst800": {
                        "timestamp": "2015-08-25T08:12:17.824Z",
                        "value": 9.5,
                        "source": "sources.n2k.some-other-unique-name"
                    }
                }
            }
        }
    },
    "sources": {
        "n2k": {
            "some-unique-name": {
                "label": "StarboardDST-800",
                "uuid": "9801FBF0-76A3-47DD-ACA6-D76DC405F95C",
                "type": "NMEA2000",
                "capabilities": [....]
            },
            "some-other-unique-name": {
                "label": "PortDST-800",
                "uuid": "7701FBF0-5546-47DD-ACA6-D76DC405F95C",
                "type": "NMEA2000",
                "capabilities": [....]
            }
        }
    }
}

That also means we can store config and lots of misc data like version, manufacturer etc without polluting the model. And it's still easily available when needed.
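
The string-or-object rule for source could be handled by a consumer roughly like this. It is only a sketch: resolveSource is a made-up name, and it assumes that keys never contain dots so the reference can be split on them.

```javascript
// Resolves a `source` that is either an inline object or a dotted-path
// string pointing elsewhere in the model. Assumes keys contain no dots.
function resolveSource(root, source) {
  if (typeof source !== "string") return source; // inline source data
  return source
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), root);
}

const model = {
  sources: {
    n2k: {
      "some-unique-name": { label: "StarboardDST-800", type: "NMEA2000" },
    },
  },
};
console.log(resolveSource(model, "sources.n2k.some-unique-name").label);
console.log(resolveSource(model, { label: "inline" }).label);
```

A dangling reference simply resolves to undefined here; a real server would probably want to report it.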

In practice I think that multiple values will be the minority, but still important. So I like the idea of a simple value key for a simple value.

For the multiple values I think we can then use values as a map of alternative values. This means they are discoverable and, using source refs, not too verbose. So the rules:

1) If there is a value key, that's the only or 'official' value.
2) If there is a values key, that holds alternatives.
3) If there is no value key, pick your choice from values.
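
A consumer-side sketch of these three rules. readValue and its optional choose callback are hypothetical; the fallback of taking the alphabetically first key is just one deterministic way to "pick your choice" and is an assumption of this example.

```javascript
// Reads a data item according to the value / values rules above.
// `choose` is an optional callback that picks one id from the alternatives.
function readValue(leaf, choose) {
  if ("value" in leaf) return leaf.value; // rule 1: the 'official' value
  const values = leaf.values || {}; // rule 2: alternatives live here
  const ids = Object.keys(values);
  if (ids.length === 0) return undefined;
  // Rule 3: no value key, so the consumer picks. The default here is the
  // alphabetically first id, so the choice does not depend on key order.
  const id = choose ? choose(ids) : ids.sort()[0];
  return values[id].value;
}

const single = { value: 6.5, values: { dst800: { value: 9.5 } } };
const multi = {
  values: {
    "starboard-hull": { value: 9.5 },
    "port-hull": { value: 6.5 },
  },
};
console.log(readValue(single)); // 6.5
console.log(readValue(multi)); // 6.5 (port-hull sorts first)
console.log(readValue(multi, () => "starboard-hull")); // 9.5
```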

faceless2 commented 9 years ago

@tkurki , your point about vessels.self has just dawned on me - I'd overlooked that the "navigation" entry is already under vessels.self.
@fabdrol, I'd strongly suggest that the server needs to be able to specify a primary value for some fields. I have two electronic compasses, one is +/- about 5 degrees, one is much more accurate. And one of my water temperature sensors keeps adding about 5° to the ambient temperature - sadly, I have to prefer the other one. The server can be told this in its configuration, the client can't, so I don't want to rely on whichever one happens to be parsed first in the data structure.

Remember, the implied order that you see when the data is serialised is not necessarily the order the keys will be stored in the client environment. The order is not only undefined, it's unstable - add another key to the map and the order may change, which means a display that just chooses the "first" value may oscillate unpredictably between multiple sources.
I really like the "value" vs "values" idea (and accompanying example) - I think it covers all possibilities without being too verbose. The only other thing I'd suggest is as well as (or instead of) "capabilities", which is hardware/source/bus-dependent, perhaps a "provides" key to identify the keys in the model that this source has provided values for - e.g. provides: ["environment.depth.belowTransducer"]. This would allow you to find all the values in the model provided by a particular source without having to walk the entire tree (aside: I see arrays have been downgraded from evil to tolerated...)
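
As a rough sketch of the "provides" idea, a server could note the model path on the source each time it stores a value, so nobody has to walk the tree later. recordProvides is a hypothetical helper and the sources layout here is flattened for brevity.

```javascript
// Sketch: maintain a "provides" list on each source as values are stored.
// recordProvides is an illustrative name, not an existing API.
function recordProvides(sources, sourceKey, path) {
  const source = sources[sourceKey] || (sources[sourceKey] = {});
  const provides = source.provides || (source.provides = []);
  if (!provides.includes(path)) {
    provides.push(path); // each path is listed once, however often it updates
  }
}

const sources = { "some-unique-name": { label: "StarboardDST-800" } };

recordProvides(sources, "some-unique-name", "environment.depth.belowTransducer");
recordProvides(sources, "some-unique-name", "environment.depth.belowTransducer");

console.log(sources["some-unique-name"].provides);
```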

fabdrol commented 9 years ago

@rob42 I like your approach, maybe with a small change: there should always be a value key unless we drop the idea that some sources are more important than others (which, as @faceless2 rightly points out, is not something we should want).

So, this changes the rules a little bit: 1) If there is a value key, that's the primary value. A consumer can always rely on this key being present.
2) If there is more than one value, there is a values key with alternatives. The values key holds all alternatives including the one in value (a consumer can identify which by source). 3) Source definitions will be changed to a string, which references a source object in a sources list by JSONPath (?)

Your example would then look like this:

{
    "environmental": {
        "depth": {
            "belowTransducer": {
                "timestamp": "2015-08-25T08:12:17.824Z",
                "value": 6.5,
                "source": "~/sources/port-dst800",
                "values": {
                    "port-dst800": {
                        "timestamp": "2015-08-25T08:12:17.824Z",
                        "value": 6.5,
                        "source": "~/sources/port-dst800"
                    },
                    "starboard-dst800": {
                        "timestamp": "2015-08-25T08:12:17.824Z",
                        "value": 9.5,
                        "source": "~/sources/starboard-dst800"
                    }
                }
            }
        }
    },
    "sources": {
        "port-dst800": {
            "label": "Port DST-800",
            "type": "NMEA2000",
            "provides": [....],
            // etc
        },
        "starboard-dst800": {
            "label": "Starboard DST-800",
            "type": "NMEA2000",
            "provides": [....],
            // etc
        }
    }
}

I made some small changes to the sources list, but that's a discussion for a different issue. Just for clarification: I removed UUID, since with a unique key and JSONPath it is not required anymore; I copied @faceless2's idea to drop "capabilities" for "provides" (with a list of paths). Much more useful information, let the parsers worry about N2K and NMEA0183. The "type" might not even be required. Finally I removed the "category" (n2k parent), as SK shouldn't really care where data is coming from IMHO.

fabdrol commented 9 years ago

P.S.: I think JSONPath is more appropriate than using dot-notation, as JSONPath is standardised and there are libraries available for many languages, making it an easy way to work with nested JSON data.

timmathews commented 9 years ago

So much going on here. I think we're quite close to an appropriate solution; allow me to throw my hat in the ring.

Assumptions/Requirements:

A user may have multiple sources for the same data.
A user may order those sources by preference on some external criteria (e.g. accuracy, refresh rate, etc.)
A user may wish to compare two or more sources, therefore all values must be present in the model and available to consumers.

Unaddressed:

What happens when the currently consumed source for a datum goes offline when there are multiple sources for that datum? Is this handled server-side or client-side?

Given the above, I propose the format below. It addresses the need for multiple sources for the same data, presenting a preferred source to the clients along with a list of alternate sources, an ordered ranking of the sources, references to the physical device which provides the data and references from the physical device to the node in the model where its values can be found.

I disagree with the idea that there should be a top level value field. It duplicates data which is never a good thing and adds a burden to the server to generate two keys for the same value.

currentSource tells the consumer which value it should use for the datum. It serves the same purpose here as self does at the top of the model.

sourceRank is the "order of succession." If sourceRank[0] is not available, use sourceRank[1] and so on.

I'm not 100% on the source names, but they're sufficient for this example. I do agree with @fabdrol that we don't need the hierarchy in sources.

$source is a pointer to the source device which generates the datum.

I'm also not 100% on how the keys in provides are named. If we use a path, how much of the path do we need to use?

$ref is a pointer back to the place in the model where the value for a specific datum is provided.

Note that I use $ to identify keys which are pointers to other parts of the model. This is consistent with the usage in JSON-schema and a lot of libraries which marshal to JSON.

One final point: since we're using an NMEA2000 DST800 as our example, it provides water temperature via three different PGNs. Two of those are deprecated and may be turned off at the device, so I have included them as an array passed to pgn to indicate that the water temperature could come from any of them (it makes no difference which, they'll always be the same value).

{
  "environmental": {
    "depth": {
      "belowTransducer": {
        "currentSource": "port-dst800",
        "sourceRank": ["port-dst800", "starboard-dst800"],
        "values": {
          "port-dst800": {
            "timestamp": "2015-08-25T08:12:17.824Z",
            "value": 6.5,
            "$source": "~/sources/port-dst800"
          },
          "starboard-dst800": {
            "timestamp": "2015-08-25T08:12:17.824Z",
            "value": 9.5,
            "$source": "~/sources/starboard-dst800"
          }
        }
      }
    }
  },
  "sources": {
    "port-dst800": {
      "label": "Port DST-800",
      "type": "NMEA2000",
      "provides": {
        "speed/waterReferenced": {
          "$ref": "~/environment/speed/values/port-dst800",
          "pgn": 128259
        },
        "depth/belowTransducer": {
          "$ref": "~/environment/depth/values/port-dst800",
          "pgn": 128267
        },
        "waterTemp": {
          "$ref": "~/environment/waterTemp/values/port-dst800",
          "pgn": [130310, 130311, 130312]
        }
      }
    },
    "starboard-dst800": {
      "label": "Starboard DST-800",
      "type": "NMEA2000",
      "provides": {}
    }
  }
}
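
The "order of succession" for sourceRank could be sketched like this. selectValue and the isAvailable callback are hypothetical; how availability is actually decided (and whether server or client does it) is the open question listed above.

```javascript
// Sketch: walk sourceRank in order and return the first entry that is usable.
// selectValue and isAvailable are illustrative names, not part of the proposal.
function selectValue(node, isAvailable = () => true) {
  for (const key of node.sourceRank) {
    const entry = node.values[key];
    if (entry !== undefined && isAvailable(key, entry)) {
      return entry;
    }
  }
  return undefined; // no ranked source is currently usable
}

const belowTransducer = {
  currentSource: "port-dst800",
  sourceRank: ["port-dst800", "starboard-dst800"],
  values: {
    "port-dst800": { value: 6.5, $source: "~/sources/port-dst800" },
    "starboard-dst800": { value: 9.5, $source: "~/sources/starboard-dst800" },
  },
};

console.log(selectValue(belowTransducer).value);
// Pretend the port transducer has gone offline.
console.log(selectValue(belowTransducer, (key) => key !== "port-dst800").value);
```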

BTW @fabdrol, JSONPath uses dot notation (http://goessner.net/articles/JsonPath/). Unfortunately, it doesn't have the concept of ~, that's an extension on our part to mean $.vessels[self] in JSONPath notation.

That said, I prefer / to . because it translates well to URLs which we can use with a HTTP-based API. However, I am fine with using either.

fabdrol commented 9 years ago

@timmathews my mistake, I was confused and my comment was misguided. Sorry @rob42!

Tim, I like your approach. My two comments would be: (1) lose the currentSource and simply use the sourceRanking, where sourceRanking[0] always is the current source (that way, servers only need to implement ways to change the source ranking, and they don't need to worry about updating another value based on that). And (2) lose the pgn stuff inside capabilities. I really do not see the use case in storing that in the tree, as the translation is already handled somewhere upstream by a parser. But, that last bit is a different discussion in an issue about the sources group.

pod909 commented 9 years ago

By what mechanism does the publisher tell the consumer about the accuracy of the readings available in the consolidated SignalK message, allowing them to make a decision on prioritizing consumption?

e.g. there may be 4 values for heading, only 1 of which may be accurately calibrated

tkurki commented 9 years ago

@timmathews @fabdrol If we lose value and just use values why not use an array instead of a map with non-functional keys (eg. no special functionality really attached to them) plus a separate order-indicating array?

If the primary value is always the first one then why not always use an array for the value? Even with just one source the item would be values[0], and values[0].value for primitive values.

If we go with sources as refs then that should be universal, not related or driven by multiple values.

faceless2 commented 9 years ago

I was about to reply on the same lines - so it would look like this then?

"environmental": {
    "depth": {
      "belowTransducer": {
        "values": [
          {
            "timestamp": "2015-08-25T08:12:17.824Z",
            "value": 6.5,
            "$source": "~/sources/port-dst800"
          },{
            "timestamp": "2015-08-25T08:12:17.824Z",
            "value": 9.5,
            "$source": "~/sources/starboard-dst800"
          }
        ]
      }
    }
  },

This also has the advantage that you can get the current value for any field in a single query, rather than having to identify the currentSource, then get the value for that source.

tkurki commented 9 years ago

What happens when the currently consumed source for a datum goes offline when there are multiple sources for that datum? Is this handled server-side or client-side?

We are talking about a snapshot of latest values here. With just a single source the question is "how long do we keep serving it". With multiple sources the second question is "how do we order the values considering also their freshness".

I don't think the spec should specify either, as this depends on the item in question and user requirements.

tkurki commented 9 years ago

By what mechanism does the publisher tell the consumer about the accuracy of the readings available in the consolidated SignalK message, allowing them to make a decision on prioritizing consumption?

That information should go to the sources section if the accuracy depends on the source. If there is a definite accuracy related to a reading, like in gps position, that should find its place next to the value.

timmathews commented 9 years ago

Oooh, I really like @faceless2's suggestion. The only issue I see with using an array like that is that it becomes difficult to map in the $ref field of a source:

"$ref": "~/environment/speed/values/port-dst800"

no longer exists and

"$ref": "~/environment/speed/values[0]"

doesn't really work because the order in the array may change.

We would need to use JSONPath style selectors:

"$ref": "~/environment/speed/values[?(@.$source=='~/sources/port-dst800')]"
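
For comparison, the same selection written in plain JavaScript without a JSONPath library. This is only a sketch of what the filter expression means; findBySource is a made-up helper and it scans the array on every lookup.

```javascript
// Sketch: equivalent of values[?(@.$source=='...')] using Array.prototype.find.
// findBySource is an illustrative name only.
function findBySource(values, sourceRef) {
  return values.find((entry) => entry.$source === sourceRef);
}

const values = [
  { value: 9.5, $source: "~/sources/starboard-dst800" },
  { value: 6.5, $source: "~/sources/port-dst800" },
];

// The result does not depend on where the entry sits in the array.
console.log(findBySource(values, "~/sources/port-dst800").value);
```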

rob42 commented 9 years ago

Arrays are still evil! Using arrays means the data value no longer has an immutable url. eg if a new depth sensor is added and the first removed the array order is different, and a query will suddenly get port depth instead of stbd depth ;-(

Better that the original fails, and some action is required to switch, allowing for notifications etc.
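
A small sketch of the difference, using invented sample data: with an array the index silently points at a different sensor once the first one is removed, while a keyed map makes the old lookup fail.

```javascript
// Array form: position is the only address, so it shifts when sensors come and go.
const arrayBefore = [
  { $source: "~/sources/starboard-dst800", value: 9.5 },
  { $source: "~/sources/port-dst800", value: 6.5 },
];
// The first (starboard) sensor is removed.
const arrayAfter = arrayBefore.slice(1);

// Map form: each value keeps its own key for as long as it exists.
const mapBefore = {
  "starboard-dst800": { value: 9.5 },
  "port-dst800": { value: 6.5 },
};
const { "starboard-dst800": removedEntry, ...mapAfter } = mapBefore;

// Same index, different sensor after the removal.
console.log(arrayBefore[0].$source, arrayAfter[0].$source);
// The keyed lookup fails instead of switching hulls.
console.log(mapAfter["starboard-dst800"]);
```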

But otherwise it's looking good. Using value and values just makes the simple case (the most common) easier. It saves a values structure for many misc keys that will never have duplicates.

Does this work for incoming deltas? Can we still process them efficiently?

Also the fail-over and priority of alternatives is not a spec thing, it's a server implementation thing.

+1 provides

I like the idea of sources broken down by bus or something. Just aware that we don't really want 10,000 in one level; by bus would auto-limit it to the bus limit - often ~256. But the sources structure doesn't need to be formally defined (just the source object), since it's going to be $ref'd anyway, or just searched.

timmathews commented 9 years ago

But in this case the array order should be able to change. It specifies precedence.

pod909 commented 9 years ago

The array also means that it has to be scanned to find the value for a particular sensor.

If information on the quality of the value is in the sources there's no need to order the values. For my 2c the values should be keyed on source. The consumer can then do an infrequent assessment using the sources to work out which one to use for each attribute and then use that to do the value look-up on an ongoing basis.

Ric
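
A sketch of that approach, under the assumption that each source carries some quality figure. The accuracy field (smaller is better) and the helper names are hypothetical; nothing in the thread fixes how quality would be expressed.

```javascript
// Sketch: assess the sources once, then use the chosen key for every later lookup.
// chooseSource, readValue and the accuracy field are illustrative assumptions.
function chooseSource(sources, candidateKeys) {
  let best;
  for (const key of candidateKeys) {
    const source = sources[key];
    if (!source || typeof source.accuracy !== "number") continue;
    if (best === undefined || source.accuracy < sources[best].accuracy) {
      best = key;
    }
  }
  return best;
}

function readValue(node, sourceKey) {
  const entry = node.values[sourceKey];
  return entry === undefined ? undefined : entry.value;
}

const sourceInfo = {
  "port-dst800": { accuracy: 0.1 },
  "starboard-dst800": { accuracy: 0.5 },
};
const depth = {
  values: {
    "port-dst800": { value: 6.5 },
    "starboard-dst800": { value: 9.5 },
  },
};

// Infrequent step: decide which source to trust.
const chosenKey = chooseSource(sourceInfo, Object.keys(depth.values));
// Frequent step: a direct keyed lookup, no scanning.
console.log(chosenKey, readValue(depth, chosenKey));
```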

tkurki commented 9 years ago

Yep, arrays are still evil.

fabdrol commented 9 years ago

I prefer maps over arrays too. The only real downside to that is that we can't specify the order in the map, but that is easily solved by a "sourceRank" array as proposed by Tim

pod909 commented 9 years ago

Or under each source in sources there is a list of the values available, containing a rank or information such as accuracy and frequency to allow the consumer to make the choice for themselves.

The sourceRank suggestion adds infrequently changing data load to the part of the JSON that will be consumed at high frequency.

Ric

fabdrol commented 9 years ago

@pod909 that is true, but putting that information in sources has a major downside too as a consumer would have to go through every source providing a certain value in order to determine the priorities

pod909 commented 9 years ago

True, but they only have to do it once. I'm guessing for many implementations manually as well.

Ric

fabdrol commented 9 years ago

That's not always the case. Consider @rob42's catamaran; he might build a module for his server that switches the primary echo sounder between port and starboard depending on which hull is leeward at a given time (under sail). The consumer would have to check the order every time it looks up the data...

Besides, I'd argue that the amount of data isn't an issue over WS or HTTP. The consumer simply parses the JSON, looks up the source key from the source order array and fetches the value from the right source, something like this. It's much less work than also fetching the sources list all the time.

// Look up the key of the top-ranked source, then read that source's value.
let data = vessel.navigation.depth.belowTransducer;
updateGauge(data.values[data.sourceRank[0]].value);

pod909 commented 9 years ago

True but that's detail that can be discovered in the data at present under any of the proposals (?)

tkurki commented 8 years ago

https://github.com/SignalK/signalk-server-node/tree/multiple-values now has work in progress on handling multiple values. So far it is minimum effort, i.e. minimum changes from the current master, and I wanted to see how things actually work if you use just the delta information to construct the tree.

Works only for n2k data so far. At this stage the policy for distinguishing sources just compares label (currently empty), src and pgn. As there is no way to query the N2K bus for further information src is used. This version does not construct sensors subtree at all, it just creates the value tree.

Value/values logic is that the most recent data is duplicated in value and all the data is under values.

    "navigation": {
        "speedThroughWater": {
            "source": {
                "label": "",
                "type": "NMEA2000",
                "pgn": 128259,
                "src": "160"
            },
            "values": {
                "-115-128259": {
                    "value": 3.47,
                    "source": {
                        "label": "",
                        "type": "NMEA2000",
                        "pgn": 128259,
                        "src": "115"
                    }
                },
                "-160-128259": {
                    "source": {
                        "label": "",
                        "type": "NMEA2000",
                        "pgn": 128259,
                        "src": "160"
                    }
                }
            }
        }
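
The keys in the output above look like label, src and pgn joined with "-" (the label is currently empty, hence the leading dash). The sketch below is a guess at that logic for illustration, not code from the branch; sourceKey and applyUpdate are made-up names, and it always fills in values even for a single source.

```javascript
// Sketch: derive a per-source key and keep the latest update duplicated in value.
// sourceKey and applyUpdate are hypothetical, inferred from the sample output only.
function sourceKey(source) {
  return [source.label, source.src, source.pgn].join("-");
}

function applyUpdate(node, update) {
  node.values = node.values || {};
  node.values[sourceKey(update.source)] = update; // all data lives under values
  node.value = update.value;                      // most recent data duplicated in value
  node.source = update.source;
  return node;
}

const speedThroughWater = {};
applyUpdate(speedThroughWater, {
  value: 3.47,
  source: { label: "", type: "NMEA2000", pgn: 128259, src: "115" },
});

console.log(Object.keys(speedThroughWater.values));
console.log(speedThroughWater.value);
```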

fabdrol commented 8 years ago

Looks good to me, three small points:

1) Shouldn't each value field have its own timestamp, allowing consumers to discard a value if it's too stale from their perspective (putting that decision in the consumer, not the server)?
2) Would we still have a "singular" value if there is just one source, or always go with the "values" format?
3) I would propose adding a latest/newest key (or something like that) somewhere, whose value is the name of the source/value updated last. That way consumers can retrieve the least stale value very easily without having to check the timestamp of each.
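
A sketch of what per-value timestamps would let a consumer do. The helper names and the maxAgeMs threshold are assumptions for illustration; the timestamps are assumed to be ISO 8601 strings as in the earlier examples.

```javascript
// Sketch: consumer-side staleness handling based on per-value timestamps.
// freshKeys, latestKey and maxAgeMs are illustrative names only.
function freshKeys(values, maxAgeMs, now = Date.now()) {
  return Object.entries(values)
    .filter(([, entry]) => now - Date.parse(entry.timestamp) <= maxAgeMs)
    .map(([key]) => key);
}

// What a server-maintained latest/newest key would save the consumer from computing.
function latestKey(values) {
  let latest;
  for (const [key, entry] of Object.entries(values)) {
    if (
      latest === undefined ||
      Date.parse(entry.timestamp) > Date.parse(values[latest].timestamp)
    ) {
      latest = key;
    }
  }
  return latest;
}

const timedValues = {
  "port-dst800": { value: 6.5, timestamp: "2015-08-25T08:12:17.824Z" },
  "starboard-dst800": { value: 9.5, timestamp: "2015-08-25T08:12:05.000Z" },
};
const referenceTime = Date.parse("2015-08-25T08:12:20.000Z");

console.log(freshKeys(timedValues, 5000, referenceTime));
console.log(latestKey(timedValues));
```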

tkurki commented 8 years ago

I'll fix timestamps
with just one value there is just value, no values
value holds the latest value, as there is no other prioritisation. Not really a schema issue, more of a policy decision for the server implementation
schema changes not started yet
