SignalK / specification

Signal K is a JSON-based format for storing and sharing marine data from different sources (e.g. NMEA 0183, NMEA 2000, SeaTalk, etc.)

Vessel Identity #91

Closed timmathews closed 8 years ago

timmathews commented 8 years ago

Intro

The vessels object at the root of the Signal K object is a key-value store holding information for 1 or more boats. We need to determine exactly what this key should be and what requirements it must fulfill.

This issue is intended to discuss that. Please keep in mind that this is not the same as the identifier for other Signal K hosts on a network (e.g. on a boat with multiple Signal K sources). This is for identifying a single vessel which may be a collection of one or more Signal K devices.

Background

We were originally going to use MMSIs as the unique identifier, but that won't work for a variety of reasons, not the least of which is not everyone has one. MAC address of the SK server device was floated as an option, but that changes whenever the hardware does, so it may not be a good option either. In fact, a boat may have more than one SK gateway and we need to keep the ID the same across devices.

Requirements

These are what I believe the requirements to be. I would like to use this discussion to firstly agree on a set of requirements, vetted against as many use cases as possible, and then float options which satisfy these requirements so that we can agree on something.

Use Cases

tkurki commented 8 years ago

I believe that a single identification scheme won't cover all situations, unless it is something like a generated UUID, which sort of doesn't make sense - nice APIs, non-human-friendly IDs and too brittle.

Here we run into limitations of the JSON approach: there is no way to build a composite key (scheme + value) without just stuffing the values into a string key.

On the other hand if we think about this in terms of an API and not a single JSON document we could define different routes / paths such as

Each could offer a different view (in Signal K format) to the data the server has. The paths above illustrate just some of the ways to identify a Signal K entity.

Without the trailing id parameter you would get a collection of all the vessels' (items') data keyed by the ids in JSON safe format.

/signalk/v1/api/ could list all the different access schemes that the server supports.

As for deltas, there the context could be a composite id, not just a path from the root in the JSON tree. In effect we would namespace the identifier by an identifying scheme:

{
  "context": {
     "scheme": "MMSI",
     "id": "230099999"
  },
  ...
}

Subscriptions would change in a similar fashion.

lsoltero commented 8 years ago

Here are some issues and thoughts.

  1. The current UUID max length is 8 characters. This is too short; it should be extended to 36 or 48 or 128, or something big.

  2. The UUID must be unique, otherwise there might be collisions when data is shared over the internet.

  3. IMHO there should be a central service that hands out UUIDs and ranges of UUIDs, much like IP and MAC addresses.

  4. Companies should be able to register with the Signal K UUID authority and request a vendor ID. This ID becomes part of the UUID for this vendor. The vendor could then be responsible for generating UUIDs for its devices as long as the generated IDs contain the vendor's VID.

  5. UUIDs need to be transportable. If a gateway breaks then the user should be able to purchase a gateway from a different vendor and move the UUID from the old device to the new one.

  6. Gateways should come preconfigured with a unique UUID (or should be able to generate one on the fly).

IMHO a central authority for generating vendor IDs and UUIDs would be best. This way you can pretty much guarantee avoiding collisions.

The MMSI is not sufficient to provide unique ids. This is because vessels can have several MMSIs, and in the case of portable devices they migrate with the device (I may be wrong about this).

rob42 commented 8 years ago

Have a look at http://stackoverflow.com/questions/9543715/generating-human-readable-usable-short-but-unique-ids

I quite like the base36 version as it's easy to read back - try doing upper-lowercase by phonetics over a VHF!

Also quite unique - we don't need many letters - 6 chars will give you 36^6 unique IDs = 2,176,782,336 (2+ billion)
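The arithmetic above can be checked with a minimal sketch. The alphabet (digits then uppercase letters), the helper names, and the fixed width of 6 are assumptions for illustration, not something the thread or the linked suid library specifies.

```python
import secrets
import string

# Assumed alphabet: 0-9 followed by A-Z, i.e. 36 case-insensitive symbols.
ALPHABET = string.digits + string.ascii_uppercase


def id_space(length: int = 6) -> int:
    """Number of distinct base36 ids of the given length."""
    return len(ALPHABET) ** length


def to_base36(n: int, width: int = 6) -> str:
    """Encode a counter value as a fixed-width base36 string."""
    if not 0 <= n < id_space(width):
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def random_base36_id(length: int = 6) -> str:
    """Pick a random id; uniqueness would still need a registry check."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

A counter-based encoder like `to_base36` fits the idea of handing out pre-allocated blocks, while `random_base36_id` alone cannot guarantee uniqueness in a space this small.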

And an implementation here http://download.github.io/suid/

I think that a central site to ensure uniqueness is inevitable. But it can be like a DNS server: one root server, and many sub-servers that provide ids for their domains from pre-allocated blocks. We could provide a server implementation via GitHub to ensure simplicity. That also provides a path to automating certs, but that's a much bigger job :-)

keesverruijt commented 8 years ago

Base36 sounds neat.

We could hand out 'first' digits to signalk.org and NMEA. K and N come to mind ;-)

sumps commented 8 years ago

This UUID issue has been discussed a few times, even within the short period I have been on Slack and I am sure that the original SK development team must be "sick to death" of this subject.

However, a decision does need to be made, and I would like to propose a solution for the first release. This proposal is based on the assumptions that:

1) The UUID is a unique identifier that allows data from the same device (not vessel) to be linked together over a period of time
2) There are other Vessel Identification fields stored in Signal K, i.e. MMSI, Boatname, Call Sign, IMO Number, etc.
3) In a complex Signal K system (see image), multiple UUIDs may exist on the same vessel

[Image: Signal K complex system]

4) A device's or consumer's (App) UUID may change over time
5) Not all devices or consumers can create a UUID in the same way
6) Signal K services that allow data to be shared with the Cloud will have their own identification and authorisation methods just like Facebook, Twitter, LinkedIn, etc.

Proposal

Use a minimalist UUID (128-bit) that can be generated on all OS platforms (need to find a solution for embedded "bare metal" systems) and this plus the Vessel Identification Fields will identify the device on the vessel.

Make no attempt to synchronise the UUID between SK devices so that it becomes vessel specific and accept that the UUID is transient and may not last the lifetime of the device or consumer.

Time for comments and please use real life examples in any arguments for or against this proposal.

sumps commented 8 years ago

This is an article I read on this minimalist UUID generation on different OS...

http://graemehill.ca/minimalist-cross-platform-uuid-guid-generation-in-c++/

fabdrol commented 8 years ago

What if we let go of the idea that a user can transfer his/her ID to a next vessel, server or gateway - but instead tie the ID to the hardware. Crowdsourcing (e.g. MarineTraffic) accounts can be updated with the new ID easily and that way we ensure that conflicts are never created as a result of a user selling a gateway and re-using his old ID (with the new owner using the same ID).

In that case, the chosen ID only has to be:

rob42 commented 8 years ago

The UUID in vessels.UUID.* is the vessel's unique identifier. Its intention is to identify the vessel, not the gateway device. The Signal K model is intended to have many devices with a partial or full copy of the data at any time. For an individual device the source identifies the origin of the data.

The same vessel.UUID allows simple merging of trees of data from many sources, which will automatically map over the correct vessel. If you create many ids per vessel, then the merging of multiple data sources becomes diabolically complex.
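To illustrate why a shared vessel key makes merging simple, here is a minimal sketch of a recursive tree merge. The function name, the flattened leaf values, and the sample paths are illustrative assumptions; real Signal K leaves carry value, timestamp and source objects.

```python
def merge_trees(base: dict, update: dict) -> dict:
    """Return a new tree where `update` is merged over `base`.

    Nested dicts are merged recursively; any other value in `update`
    replaces the one in `base`. Neither input is modified.
    """
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_trees(merged[key], value)
        else:
            merged[key] = value
    return merged


# Two gateways on the same boat report under the same (example) vessel id.
VESSEL = "de305d54-75b4-431b-adb2-eb6b9e546014"
gateway_a = {"vessels": {VESSEL: {"navigation": {"speedOverGround": 3.2}}}}
gateway_b = {
    "vessels": {
        VESSEL: {
            "navigation": {"courseOverGroundTrue": 1.57},
            "environment": {"depth": {"belowKeel": 4.1}},
        }
    }
}
combined = merge_trees(gateway_a, gateway_b)
```

Because both gateways use the same key, `combined["vessels"]` holds one vessel with both branches. With different ids per gateway the merge would produce two unrelated vessels.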

So IMHO we are overthinking it, and mixing in the source with the UUID. If we need to identify the individual gateways on a vessel we should add specific functionality for that. The UUID should be a unique identifier for the vessel, and the same on all gateways in a given boat. It doesn't actually matter what you use as a UUID so long as it's unique. But I favour a boat name or boat-email, with uniqueness guaranteed by an optional distributed online registry - first in, first served. If you choose to use 'Motu', you can't - I already have it. This is how ship names work here, hence there are already 'Infinity' to 'Infinity VII' taken, which kind of ruins the infinity bit :-)

sumps commented 8 years ago

Let me start by saying that I have no strong opinion of UUID, other than I want to get it agreed so that I can finalise the gateway design.

When I first became aware of the Signal K UUID, it appeared that it was being used as a normal UUID to avoid data contention over the internet. So any unique method of generating a random but unique UUID would have been OK.

Then we discussed the UUID being an IPv6-type globally unique address that would allow any two boats to communicate with each other over the internet, but someone pointed out that without the device being connected to the internet, the IPv6 address is just the MAC address, so that idea sort of fizzled out.

Now we are talking about a centrally controlled registry of global Signal K UUIDs, which aside from the time and resource of setting this up and managing it, will almost certainly be open to abuse. Imagine MMSI numbers without the hardware control stopping people from easily changing them on their AIS or DSC VHF.

How do all of the gateways and servers on a boat (and even intelligent consumers) synchronise the vessel UUID? It is fine if you have one server, but as soon as you have more than one, you are back to worrying about a "master" and "slave" relationship again.

I think we need to focus on the reason for the UUID in the first place with some use cases of why we feel it is needed and some discussion around whether a particular type of UUID is needed to make these use cases possible.

sumps commented 8 years ago

@rob42 you have given me an idea for the name of my first boat "Infinity ∞" 😜

fabdrol commented 8 years ago

@rob42 I wasn't really thinking of that, my thinking was that re-generating that UUID whenever something changes would make an easy solution to the problem of ensuring uniqueness. Regarding multiple gateways/servers on one vessel: one should be the "master" server, which generates the vessel UUID. The rest simply feeds data to the master.

So, summarised:

If I were to sell my boat with server - or just my server - I reset the server to "factory" settings before selling. This process not only wipes my data, it also generates a new UUID for that server ensuring that the new owner doesn't use my UUID (which would create two identical UUIDs on the network). If I forget to do so, the new owner will do it when he logs into the config interface and changes the vessel name (Volare -> Infinity, for instance).

That said, I like the idea of an (optional) registry. You register your name with the registry, and your account is authenticated by an email address and password. Whenever you install a different (master) server its UUID can be attached to the account online. This mechanism can then also be used to give that server a certificate that it can use to authenticate itself as that account even when not online.

That would give me:

fabdrol commented 8 years ago

Btw, regarding the master/slave thing: p2p would be a better solution going forward, so here's how I see that:

sumps commented 8 years ago

That sounds fine but I think we need to look at the mDNS/Bonjour Text Record issue again as we need to identify which device has the UUID and then we have the issue of installation i.e. The user must turn on the "master" device before any new device is added so that the new device can discover the master.

I am not against this approach, just need to make sure the process of installing an SK system does not become a difficult or technically confusing experience.

It is this type of system process that needs to be defined, so that all gateways, servers and consumers operate in the same way.

rob42 commented 8 years ago

But if the UUID keeps changing over time then the gateways can't merge records. This behaviour is all based on vessels.*, e.g. how will I know that vessel abd is now jkh without an internet lookup?

The use of a UUID per vessel that doesn't change fixes that.

fabdrol commented 8 years ago

sure, but then how do you guarantee uniqueness? You'd get something like this (right?):

But that makes a central (online) registry a requirement, which could be an issue for (initially) non-connected boats

sumps commented 8 years ago

A few comments for consideration:

1) If we use Boatname as the UUID with this online registration service on a 1st come 1st serve basis, there will be loads of complaints. There is no global boat registration scheme and there might be hundreds of boats called "Infinity" around the world and you would not be happy if you had to have "Infinity 998"
2) If we use email then all sorts of privacy issues start to arise
3) Unless there is some sort of restriction in the SK devices that stops a user entering anything in the UUID, then the system is open to abuse

tkurki commented 8 years ago

If the vessel's identity is governed by generated identity like UUID then we should build some safeguards against the user just entering their boat's name or all zeroes or some other defunct value.

One safeguard against this would be a checksum.

For new equipment and installations the system would come either with a preconfigured uuid and checksum or would generate it during first powerup.

Transfer of identity would take place by exporting the uuid + checksum from

From the user's perspective this Signal K id would look like a single serial number. The format can be uuid in canonical format with the checksum appended. Uuid canonical format in hexadecimal format is 8-4-4-4-12, Signal K id could be for example 8-4-4-4-12-4.

The point with the checksum is not security, just inducement to use properly generated uuids.
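A minimal sketch of the 8-4-4-4-12-4 idea follows. The thread leaves the checksum algorithm open, so the 16-bit CRC used here (Python's `binascii.crc_hqx` over the lowercase canonical string) and the function names are assumptions chosen only to make the example concrete.

```python
import binascii
import re
import uuid

# Canonical UUID (8-4-4-4-12) followed by a 4-hex-digit checksum group.
ID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}-[0-9a-f]{4}$"
)


def checksum16(canonical_uuid: str) -> str:
    """Assumed checksum: CRC-CCITT over the ASCII canonical form."""
    crc = binascii.crc_hqx(canonical_uuid.encode("ascii"), 0)
    return f"{crc:04x}"


def make_signalk_id(vessel_uuid: uuid.UUID) -> str:
    """Append the checksum group to the canonical UUID string."""
    canonical = str(vessel_uuid)
    return f"{canonical}-{checksum16(canonical)}"


def is_valid_signalk_id(text: str) -> bool:
    """Accept only well-formed ids whose checksum group matches."""
    text = text.strip().lower()
    if not ID_PATTERN.match(text):
        return False
    canonical, check = text.rsplit("-", 1)
    return checksum16(canonical) == check
```

A boat name such as "Motu" fails the format check, and a hand-typed UUID-shaped value passes only if the user also computes the right checksum, which is the inducement described above rather than any form of security.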

Comments? Checksum algorithm?

tkurki commented 8 years ago

We are now using just a plain value with no structure to it as the identifier. There is nothing stopping us from encoding the scheme and value in it:

MMSI scheme is needed for identifying AIS targets.

The same notation can be used in delta context: { "context": "vessels[mmsi:230029XXX]" ... }
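As a rough sketch of how a consumer might handle such prefixed identifiers: the `VesselId` type and both helper names are hypothetical, and the only formats assumed are the `scheme:value` key and the `vessels[scheme:value]` context shown above.

```python
from typing import NamedTuple


class VesselId(NamedTuple):
    scheme: str  # e.g. "mmsi"
    value: str   # e.g. "230099999"


def parse_vessel_key(key: str) -> VesselId:
    """Split a 'scheme:value' key at the first colon."""
    scheme, sep, value = key.partition(":")
    if not sep or not scheme or not value:
        raise ValueError(f"not a scheme-prefixed id: {key!r}")
    return VesselId(scheme.lower(), value)


def delta_context(vessel_id: VesselId) -> str:
    """Render the id in the bracketed delta context notation."""
    return f"vessels[{vessel_id.scheme}:{vessel_id.value}]"
```

For example, `delta_context(parse_vessel_key("mmsi:230099999"))` gives `vessels[mmsi:230099999]`. Splitting at the first colon only means values that themselves contain colons stay intact.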

fabdrol commented 8 years ago

I like this approach. The namespacing allows flexibility in the sources of vessel data. signalk: and mmsi: are obvious ones, but I can imagine other schemes as well, such as a private vessel tracking system

keesverruijt commented 8 years ago

For non-connected boats the hardware should come with a self-allocating scheme, not alterable by the user. This can be either: (1) the manufacturer is pre-allocated a part of the signalk namespace and has his/her own mechanism for handing them out. (2) the manufacturer registers a private namespace with signalk and comes up with their own format.

For example a hypothetical manufacturer ACME Inc. They either register acme: as a prefix with SignalK.org and do their own stuff, or ask SignalK for a prefix. We allocate, for instance, 'ac00' to them. This makes them free to make IDs starting with signalk:ac00-.

This will work without any UI or configuration interface.

fabdrol commented 8 years ago

@canboat I'm not sure I am in favour of allowing arbitrary vendor prefixes (the -webkit-, -moz css hell comes to mind). Allocating a part of the UUID is an option, but I don't see how it benefits the user.

The main benefit of these prefixes is, IMHO, that they act not simply as "namespaces", but more like URI schemes (e.g. http://, redis:// - we could even opt to use this format: signalk://). The scheme indicates to the consumer or receiving server (a) what kind of ID he can expect and (b) what capabilities the device has. For instance, a vessel with the signalk: ID prefix indicates to another server that that entity in the vessel list doesn't just provide some data (like mmsi:<mmsi> vessels) but can actually be connected to directly.

A use case: I'm in a harbour connected to another vessel's Signal K stream. In the vessels list of the stream I find another signalk: vessel. A GUI then allows me to connect to that vessel's stream as well. Other vessels in the vessel list, for instance those prefixed with mmsi:, provide no such functionality.

Other prefixes or schemes are reserved for other systems. For instance, a commercial cargo fleet operator uses satellite trackers on all their vessels of imaginary brand "STrack". They also operate a Signal K-based solution on newer vessels. All vessels see all other vessels in their vessels list, some of them prefixed with signalk: and others with strack:.

tkurki commented 8 years ago

@canboat

tkurki commented 8 years ago

See https://github.com/SignalK/specification/blob/mandatory-uuid/test/treeValidation.js#L23 for a concrete, if a bit obscure, example of my proposal for ids.

Prefixed format is used as property name under vessels, but under the real vessel root there are uuid and mmsi explicitly. This is not very consistent: uuid property should probably be signalk.

A separate identity branch under vessel root might be useful for grouping these and others (call sign, sail id, national registration id, third party ids).

timmathews commented 8 years ago

@tkurki, what does the checksum add? The UUIDs are public, if someone wants to spoof it, they can. There's really no getting around this short of some kind of PGP-style asym crypto. I think that's another discussion entirely.

So to close this out, let's use RFC 4122 UUIDs as the only required ID and end the discussion on alternatives, namespaced IDs, different options, etc. Then the only discussion is about whether to use v4 random UUIDs with a "generate new ID" button in the UI or v5 UUIDs, with some sort of user-provided input to seed the UUID generation.

Going with autogenerated v4 means giving up on portable UUIDs, v5 means increased risk of collisions, but perhaps with some creativity, we can work around that.
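The trade-off between the two RFC 4122 versions can be sketched with Python's standard `uuid` module. The namespace below is a made-up one derived from "signalk.org" purely for illustration; the thread does not define a namespace or say what the user-provided seed would be.

```python
import uuid

# v4: purely random, so a "generate new ID" button always yields a fresh value.
random_vessel_uuid = uuid.uuid4()

# Hypothetical namespace for name-based ids (an assumption, not from the spec).
SIGNALK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "signalk.org")


def vessel_uuid_v5(seed: str) -> uuid.UUID:
    """v5: deterministic, the same seed always yields the same UUID."""
    return uuid.uuid5(SIGNALK_NAMESPACE, seed)
```

With v5 the id can be regenerated on replacement hardware from the same seed, which is the portability argument. The same property is the collision risk: two owners who both enter "Infinity" as their seed end up with identical UUIDs.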

tkurki commented 8 years ago

See my comment above

fabdrol commented 8 years ago

@timmathews if we leave out the namespacing, how would we differentiate between a vessel that has a Signal K server on-board or a vessel that's been added to the tree by the server, coming from an AIS device?

tkurki commented 8 years ago

To reframe Fabian's question: @timmathews did you mean that uuid is always mandatory and when streaming AIS data locally the vessel would be identified by a generated uuid? I don't think that is feasible.

sumps commented 8 years ago

Based on my experiences of spending years at C-Map providing chart unlock codes to end users, if you want this UUID to be manually entered, I would seriously suggest using hexadecimal characters (not case sensitive), in groups of four, separated by hyphens (that are entered automatically) and making the length of the UUID no more than 20 characters, with a 2 digit checksum that can check that the code has been typed in correctly. This will give you the best chance of consistent success with end users who struggle to cut and paste, can not remember more than four characters at a time and who are put on this earth to mess up the ordered and logical world of software engineers !

sumps commented 8 years ago

Are we getting close to a consensus on this UUID issue ?

tkurki commented 8 years ago

@sumps in #95 we are using real RFC 4122 Uuid format, which looks like de305d54-75b4-431b-adb2-eb6b9e546014 and a CRC-32 that looks like 74a93ec0. A bit more than you argued for, but I would definitely like to use standards and not something homegrown.
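For reference, a CRC-32 in that 8-hex-digit shape can be produced as sketched below. The thread does not say which bytes the checksum in #95 is computed over, so hashing the ASCII canonical string is an assumption, and this sketch is not claimed to reproduce the 74a93ec0 value quoted above.

```python
import zlib


def crc32_hex(text: str) -> str:
    """CRC-32 of the ASCII text, as 8 lowercase hex digits."""
    return f"{zlib.crc32(text.encode('ascii')) & 0xFFFFFFFF:08x}"


def matches(canonical_uuid: str, checksum: str) -> bool:
    """Check a typed-in uuid against its accompanying checksum."""
    return crc32_hex(canonical_uuid.strip().lower()) == checksum.strip().lower()
```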

Entering your uuid should be a rare occasion, not something you do every day, so the slight awkwardness caused by the length of the values is justifiable. As for UI I would fashion it something like

id: [ ]-[ ]-[ ]-[ ] checksum: [ ]

I sincerely hope that consensus is near!

fabdrol commented 8 years ago

@sumps if we choose this course of action I'll look into building a small iOS and Android app that can make that process easier, perhaps using a QR code or some similar mechanism.

fabdrol commented 8 years ago

Btw, my .02 re checksum: I think it should work a bit like a key. Only the owner of a server knows the key that goes with his UUID, and the key is required to input the UUID somewhere else. That way, we prevent another vessel from re-using my UUID.

I believe my interpretation of the checksum is a bit different than @tkurki's original idea, please comment/feedback :)

tkurki commented 8 years ago

I'll quote Tim from above: "There's really no getting around this short of some kind of PGP-style asym crypto. I think that's another discussion entirely."

As far as I can tell it is either checksum level check or nothing at this stage.

fabdrol commented 8 years ago

Okay, scrap my comments - let's just go with a checksum to make it not too easy to spoof a UUID solely based on what you receive over the air.

sumps commented 8 years ago

I smell consensus in the air !!

tkurki commented 8 years ago

PR #98 now contains three methods to specify the primary identity of a vessel

See example json

sumps commented 8 years ago

Is the UUID mandatory for data that leaves the vessel ?

keesverruijt commented 8 years ago

Yes

On 18 Oct 2015, at 22:25, sumps notifications@github.com wrote:

Is the UUID mandatory for data that leaves the vessel ?


tkurki commented 8 years ago

I don't think there is a way to control which data leaves the vessel and which doesn't.

sumps commented 8 years ago

In that case we have to make it mandatory for the device responsible for Self to create the UUID and to include the MMSI if the Self vessel has one.

rob42 commented 8 years ago

This has stalled again and we need to finish it. At the core of this we have a couple of basic requirements:

There are lots of ways to make a unique id, so for v1 let's just specify that it be reliably unique (globally), and limit the chars so we don't create stupid horrors like using sentences for file names, or emoticons etc. BTW the checksum is useful for entry, but if it can be generated then there is no need to store it, e.g. generate it on a uuid export, and on re-entry to test. It becomes an implementation detail, a nice-to-have for a slick product.

We can adopt more explicit or the newer evolving standards in later versions, as the ids will still be unique.

So we should adopt Teppo's pull above minus 'checksum' (and minus 'href' if required) and close the matter:

{
  "vessels": {
    "uuid:de305d54-75b4-431b-adb2-eb6b9e546014": {
      "uuid": {
        "value": "de305d54-75b4-431b-adb2-eb6b9e546014",
        "checksum": "74a93ec0"//not required
      }
    },
    "mmsi:230099999": {
      "mmsi": "230099999"
    },
    "href:vessels.signalk.org/someboatidentity": { 
      "href": "vessels.signalk.org/someboatidentity"
    }
  }
}

fabdrol commented 8 years ago

I thought we had sort-of agreed about the MRN thing? E.g. urn:mrn:signalk:uuid:<uuid>, urn:mrn:itu:mmsi:<mmsi>, etc.

rob42 commented 8 years ago

I approve Teppo's PR with either the above or the urn:mrn:signalk:uuid:<uuid>, urn:mrn:itu:mmsi:<mmsi> variant. It's still unique, just a different prefix. @tkurki can you comment or adjust your pull? To get this moving, if there are no objections it can be merged as soon as Teppo is ready.

tkurki commented 8 years ago

If checksum is not in the schema it is not in the API, and then we can forget it altogether. I'd prefer keeping it, but it's not a showstopper either way.

I'll update pr in a few days.

fabdrol commented 8 years ago

@tkurki I agree, but let's wrap up the UUID thing first, then tackle the checksum at a later time (could just be an implementation detail, or a spec thing).

tkurki commented 8 years ago

Are we done here for now?

sumps commented 8 years ago

If there is any chance that we will want users to type in a UUID then we have to include a checksum and I suggest using the same format as NMEA0183 as it is well documented and lots of developers have functions to generate and check them.
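The NMEA 0183 checksum is the XOR of all characters in the payload, written as two uppercase hex digits. A minimal sketch of applying it to a typed-in id follows; the `id*CS` layout and the helper names are assumptions borrowed from the NMEA sentence style, not an agreed Signal K format.

```python
def nmea_checksum(payload: str) -> str:
    """XOR of all payload bytes, as two uppercase hex digits."""
    value = 0
    for byte in payload.encode("ascii"):
        value ^= byte
    return f"{value:02X}"


def with_checksum(vessel_id: str) -> str:
    """Render 'id*CS', mirroring how NMEA sentences end."""
    return f"{vessel_id}*{nmea_checksum(vessel_id)}"


def verify(text: str) -> bool:
    """Check that the part after the last '*' matches the payload."""
    payload, sep, check = text.strip().rpartition("*")
    return bool(sep) and nmea_checksum(payload) == check.upper()
```

Being only 8 bits wide, this catches single mistyped characters but cannot detect two swapped characters, since XOR is order-independent.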

MatsA commented 8 years ago

Hi! I have recently become interested in the Signal K project and don't have any deeper knowledge about the data model, but I can't find the ship's country code (ISO) anywhere?

keesverruijt commented 8 years ago

Hi Mats, this is the wrong place to ask a question not related to this particular issue. Join our Slack discussion group at http://slack-invite.signalk.org/ or Google Groups https://groups.google.com/forum/#!forum/signalk

rob42 commented 8 years ago

But a good point none the less. I will open an issue for it