Decide on linking - Githubissues

adewg / ICAR

Standard messages and specifications for animal data exchange in livestock.

https://icar.org/

Apache License 2.0


Decide on linking #19

Closed alamers closed 4 years ago

alamers commented 5 years ago

There is a need for automated discovery of URLs. Parties that connect to multiple servers should not rely on client-side synthesizing of URLs. Instead, the server should provide links in the messages to related resources.

The workgroup should choose such a standard (e.g. JSON-LD or HATEOAS or something else).

Until we have committed to one, we need temporary workarounds like example URL schemes to document the standard.

cookeac commented 5 years ago

I did a quick investigation of REST hyperlinking schemes. For self-links the many variants of HATEOAS really come down to two:

a. JSON-LD (what DataLinker uses), which uses "@id" : "https:/....."
b. Other HATEOAS methods, which use "_link" : { "rel": "self", "href": "https://...." }

Both are pretty easy to implement. The issue is that choosing one or the other would seem to commit us to supporting that linking method going forward for other, more complex links.

The third option is to use neither, but "roll our own" - for instance "self-url" : "https://...". Simpler but way less standard.
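To make the three options concrete, here is a minimal Python sketch of a client helper that reads a self link in any of the three styles. The function name find_self_link and the sample URLs are illustrative assumptions, not part of any ICAR schema, and the "_link" shape simply follows the example in this comment rather than a particular HATEOAS specification.

```python
from typing import Any, Optional


def find_self_link(resource: dict[str, Any]) -> Optional[str]:
    """Return the self URL of a resource, whichever linking style it uses."""
    # Option a: JSON-LD style, where the URL is the value of "@id".
    if isinstance(resource.get("@id"), str):
        return resource["@id"]

    # Option b: link-object style, as written in the comment above.
    # Accept either a single link object or a list of them.
    links = resource.get("_link")
    if isinstance(links, dict):
        links = [links]
    if not isinstance(links, list):
        links = []
    for link in links:
        if isinstance(link, dict) and link.get("rel") == "self":
            return link.get("href")

    # Option c: the "roll our own" property.
    url = resource.get("self-url")
    return url if isinstance(url, str) else None
```

A client that has to talk to servers using different styles would need something like this, which is part of the argument for the workgroup settling on one style.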

cookeac commented 5 years ago

I'm going to add comments from email conversations to this thread to capture them here for reference by other contributors.

cookeac commented 5 years ago

On 25 June 2019, Arjan wrote:

Was indeed good to meet everyone in ‘real life’ :)

Wrt your 2nd and 3rd points; I’ll leave that to the experts.

Wrt to the hyperlinking: just a few thoughts here:

- I noticed some alternatives but they don’t seem too popular (json:api, ion, siren, collection+json), so I think the shortlist is reasonably complete;
- JSON-Hyperschema looks like it is on its way to being approved by the IETF, while the HAL draft seems to have stalled a few years ago (if my Google skills are not failing, that is);
- JSON-LD is a W3C community standard, which may be a bit less ‘formal’ of a standard than Hyperschema (although I would not consider that a pro or con in any way);
- I agree that there was some confusion on how JSON-LD appeared to embed the other objects, but maybe that is more of a visualization/documentation problem (and should thus be fixed on that level), so for me it is not necessarily a disqualifier for JSON-LD.

What are your thoughts on compulsory vs recommended in this area? The way I see it, the JSON Schemas are the core / compulsory part of the standard (with the possibility of extension). The URL scheme will be a recommendation. Do we make the hyperlinking a recommended / optional part of the schemas?

So, no strong opinion on my side other than a slight preference for either JSON-LD or Hyperschema.

cookeac commented 5 years ago

On 27 June 2019, Andrew wrote:

Thanks for the feedback Arjan,

Regarding the hyperlinking: compulsory vs optional. I see that specifying linking URLs is part of the data returned from the API, so it should be specified in the schema (i.e., part of the standard), but obviously the URL fields themselves should be optional, as with many other data members. The reason for specifying this as part of the standard is so that, if it is used, clients will know how to interpret it. There are very few standard link relations that are relevant to our domain (unlike RSS feeds, for example), so I think we need to specify the small set that we need.

I also prefer either JSON-LD or JSON Hyperschema over HAL. We had gone with JSON-LD in DataLinker because:

a. The involvement of Google, Microsoft, and others in JSON-LD gave us some confidence in the level of community support;
b. Object references (URLs) could be named components of the data schema, rather than having to be discovered in an array of link description objects (this could still be done in Hyperschema by placing links in appropriate objects); and
c. There was already a body of JSON-LD objects at schema.org which we could reference rather than re-define (for instance, Person and Organization).

However, we can make our own decision here on what to use.

It may be that we can define schemas in such a way that there is no great difference between JSON-Hyperschema and JSON-LD. For instance, let’s imagine that an animal had a “sire” property which contained the ID of the sire and a link to the sire’s animal resource.

JSON-LD: "sire" : { "id": {"scheme": "org.icar.official", "xxxxxx" }, "@id" : "https://....", "@context" : "<schema url>", "@type": "icarAnimalCore" }

JSON Hyper-Schema: "sire" : { "id": {"scheme": "org.icar.official", "xxxxxx" }, "links": [{"href" : "https://....", "rel" : "self", "targetSchema": "icarAnimalCore" }] } (Assuming rel:self is OK because the link is inside the reference to the sire)
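The suggestion that the two styles can carry the same information can be checked with a small Python sketch that converts one form of the "sire" reference into the other. This is only an illustration under assumptions: the helper names are hypothetical, the identifier is written with an explicit "id" key (the examples above leave that key out), and the URLs are placeholders.

```python
from typing import Any


def jsonld_to_links(ref: dict[str, Any]) -> dict[str, Any]:
    """Rewrite a JSON-LD style reference as a "links" array style reference."""
    return {
        "id": ref["id"],
        "links": [
            {"href": ref["@id"], "rel": "self", "targetSchema": ref["@type"]}
        ],
    }


def links_to_jsonld(ref: dict[str, Any], context: str) -> dict[str, Any]:
    """Rewrite a "links" array style reference as a JSON-LD style reference.

    The "@context" value is not present in the links form, so the caller
    has to supply it.
    """
    self_link = next(link for link in ref["links"] if link["rel"] == "self")
    return {
        "id": ref["id"],
        "@id": self_link["href"],
        "@context": context,
        "@type": self_link["targetSchema"],
    }


# Hypothetical sire reference in the JSON-LD style.
sire_ld = {
    "id": {"scheme": "org.icar.official", "id": "xxxxxx"},
    "@id": "https://example.org/animals/xxxxxx",
    "@context": "https://example.org/schema",
    "@type": "icarAnimalCore",
}
sire_links = jsonld_to_links(sire_ld)
```

The only field that does not survive the trip into the links form is "@context", which is why the reverse conversion takes it as an argument.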

cookeac commented 5 years ago

On 27 June 2019, Craig Vigors wrote:

I wasn’t at ICAR, so I may be a little out of the loop. My thoughts on the points below.

1: I have no real preference on which standard is used, as long as a standard is used. I wouldn’t have worked with Hypermedia or LD in the past, so I wouldn’t be making an informed decision. We should have hyperlinking included though, using some standard. I have always had a slight curiosity about how far you go with the hyperlinking when it comes to API performance. For example, if you have an API that provides a list of animals currently in the herd, would you provide a link to each animal’s inseminations / calvings / movements etc.? If the animal doesn’t have an insemination, or access to that data is restricted for that client, does the link get provided anyway? In order to prevent the link being included, you would need to check if there is data available. That comes at a cost in performance.

2: “Arrival/Departure: Origin, Destination, transporter/Haulier, transport reference number, vehicle registration, date/time loaded, date/time unloaded, farm assurance reference” This seems a little over the top to me, and we wouldn’t be collecting it. What is it used for? GDPR would potentially impact a lot of those fields for us.

“Death: Reason, disposal method, disposal reference/receipt number” Sometimes reason for death has come up as an interest to certain parties, but not everyone. I wonder how often it is known and to what accuracy?

“Registration: All the animal fields that are necessary for first registration”. I don’t understand here: are you saying that this data isn’t included? Is there another service that provides it?

3: The array of parents would scale well, and there would be a preference for 3 generations for pedigree animals.

cookeac commented 5 years ago

On 27 June 2019, Andrew wrote:

Really good thoughts and questions, thank you.

Hypermedia
- As with any other optional member, you certainly would not have to return any URL that you didn’t have or was not relevant.
- Good question about whether you should return URLs for which there is no data – I would usually encourage our developers to provide the URL, which when called might return an empty result set. This has the advantage that you don’t need to check whether there is data before populating the URL links for each [animal or other entity] in the array.
- Data access is a different challenge. You might use the authorisation to avoid emitting links that they cannot access, or you might simply return a 401 error (or a request to authenticate with different credentials) if those URLs are accessed.
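
The approach in these three points can be sketched in Python as follows. This is a hedged illustration only: the function name, the base URL, the event type names, and the idea of passing the caller's permissions as a set are all assumptions made for the example, not part of the standard.

```python
from typing import Any

BASE_URL = "https://api.example.org"  # placeholder, not a real endpoint


def animal_with_links(animal_id: str, allowed: set[str]) -> dict[str, Any]:
    """Build one animal entry, adding event links the caller may access.

    No lookup is made to see whether any events exist: following a link
    to an empty collection simply returns an empty result set.
    """
    entry: dict[str, Any] = {"id": animal_id}
    for event_type in ("inseminations", "calvings", "movements"):
        # Filtering on permissions needs no data query. The alternative
        # mentioned above is to emit every link and answer with a 401
        # when a link the client may not use is followed.
        if event_type in allowed:
            entry[event_type] = f"{BASE_URL}/animals/{animal_id}/{event_type}"
    return entry
```

Because the links are built from the animal ID alone, the cost per animal stays constant regardless of how many event types are linked.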
Additional data in events
- The intention is that this is a standard for Animal Data Exchange, not necessarily “Animal Data for Genetic Improvement Exchange”. This means people could use the same schema for animal recording for health analysis, benchmarking, farm assurance, or traceability. It is the farm assurance and traceability people who would use most of those extra fields. Health recording might be interested in some of the fields.
- Same approach as other optional fields – if a provider does not have these, they will not appear in the emitted JSON data.
- In regards to GDPR, most of that information (origin, destination for instance) will be government identifiers which are regulated, not personal data. Transporter details if a person might be relevant for GDPR, but if just a company name, then also not covered by GDPR. However, the standard only provides the fields that can be filled in. It doesn’t determine the policy of the organisation making the data available, which might elect to not expose that data through an API.
Death Reasons – good question. We’ve talked about Reason just being a text field (seems to be widely used), but it would be interesting to get feedback from others about what is recorded.
Registration – All events refer to the animal by ID. When recording a registration for an animal, you need to provide many of the other animal fields (although the list varies by organisation of course). My intent is that a Registration event would embed an icarAnimalCore object so those fields can be provided. It is the one case where embedding the animal object into the event makes sense.
Thanks for the feedback on the array of parentage, that’s helpful.

cookeac commented 5 years ago

At the meeting on 28 June 2019 it was felt that either JSON Hyper Schema or JSON-LD was reasonable, and I undertook to investigate JSON Hyper Schema further.

An issue with JSON Hyper Schema is that it is designed principally to be declarative at the time of schema/API definition, and doesn't completely lend itself to cases where the implementation and format of URLs is not known or there may be a number of implementations. This is because links are defined as URI Templates in terms of RFC 6570. The client is responsible for resolving the URL template to a URL as follows:

- If relative URLs are specified, interpreting these as relative to a URI Template specified with the "base" keyword, or relative to the URL of the instance document (the resource you are accessing). If "base" is specified, it is a URI Template itself and also gets resolved relative to the instance document if necessary.
- Substituting any variables from the JSON, such as the "id" of objects.

The full spec is here: https://json-schema.org/latest/json-schema-hypermedia.html
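
As a rough illustration of the two client-side steps just described, here is a simplified Python sketch. It is an assumption-laden approximation, not an implementation of the specification: it handles only plain "{name}" variables (RFC 6570 Level 1), takes values only from top-level properties of the instance, and the function names and URLs are made up for the example.

```python
import re
from typing import Any, Optional
from urllib.parse import quote, urljoin


def expand(template: str, values: dict[str, Any]) -> str:
    """Expand simple "{name}" variables (RFC 6570 Level 1 only)."""
    def substitute(match: re.Match) -> str:
        value = values.get(match.group(1))
        # Undefined variables expand to nothing; other values are
        # percent-encoded, keeping only unreserved characters.
        return "" if value is None else quote(str(value), safe="")
    return re.sub(r"\{([A-Za-z0-9_.]+)\}", substitute, template)


def resolve_link(
    href: str,
    instance: dict[str, Any],
    instance_url: str,
    base: Optional[str] = None,
) -> str:
    """Resolve a link template against an instance document."""
    if base is not None:
        # "base" is itself a template, resolved against the instance URL.
        base_url = urljoin(instance_url, expand(base, instance))
    else:
        base_url = instance_url
    return urljoin(base_url, expand(href, instance))
```

For example, resolve_link("animals/{id}", {"id": "123"}, "https://api.example.org/v1/herds/7", base="https://api.example.org/v1/") gives "https://api.example.org/v1/animals/123". Without "base", the same template resolves relative to the instance URL instead, which shows how much the result depends on where the schema author expects resources to live.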

This is great, but it does lend itself to URIs being specified in only one way, using one protocol, which is not what we are trying to achieve. It is hard to override this for specific implementations, as the URI Template is specified in the schema. However, I believe JSON Hyper Schema could be used if care is taken in how we specify the URI Templates.

For instance: "links": [ { "rel": "self", "href": "{@id}" } ] would specify that there is a property in the object called @id, which contains a URI to the object itself (useful if you want to GET or PATCH a single object). I've used @id here as an example that some of you might recognise from JSON-LD.

In contrast, if we decided that URI paths were to be the same for all possible implementations, we could use: "links": [ { "rel": "self", "href": "animals/{id}" } ] which would say that there is a property called "id", and that all animals can be found at the relative path starting with "animals/".

As many of you know, I prefer the former approach (not specifying the exact URL paths in the schema which applies to every implementation).

cookeac commented 5 years ago

In terms of link relations (the "rel" part of a link), the following from the IANA registry are likely to be useful to us:

- "self": Specify a link to an object
- "collection": URL to the array of these objects (so I can get many of them, or POST a new one)
- "edit": Assuming you use PUT/PATCH to edit, this allows specifying a schema for edits

Then inside collections:

- "first": Return the first page of a paginated collection
- "next": The next page of a paginated collection
- "prev": The previous page of a paginated collection
- "last": The last page of a paginated collection
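
A client could use the pagination relations roughly as in the Python sketch below. The page layout is an assumption made for illustration (an "items" array plus a "links" array of objects with "rel" and "href"); the fetch function is passed in so the example does not depend on any HTTP library.

```python
from typing import Any, Callable, Iterator, Optional


def iter_items(
    first_url: str, fetch: Callable[[str], dict[str, Any]]
) -> Iterator[Any]:
    """Yield every item in a paginated collection by following "next" links."""
    url: Optional[str] = first_url
    seen: set[str] = set()
    while url is not None and url not in seen:
        seen.add(url)  # guard against a server whose links form a loop
        page = fetch(url)
        yield from page.get("items", [])
        # Pick the href of the "next" link, or stop when there is none.
        url = next(
            (
                link["href"]
                for link in page.get("links", [])
                if link.get("rel") == "next"
            ),
            None,
        )
```

The client never builds a page URL itself; it only follows what the server returns, which is the point of using link relations rather than a fixed URL scheme.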

We may need to define our own link relations to describe links to:

- An animal in a pedigree
- An animal referenced in an event
- The collection of events (of a type) for an animal
- Supporting data such as a movement Consignment shared across multiple animals.

These could be defined as "icar.org" link types.

cookeac commented 4 years ago

I have defined:

- icarResource.json, which becomes a base class for resources, and defines "@id" (self link) and meta (meta data).
- icarResourceReference.json, which provides a TYPE that can be embedded to provide a link to a resource, with JSON-LD style "@context", "@id", "@type"="Link" properties. You could use this to provide a link to an animal or a consignment, straw, or device.
- icarResourceCollection.json, which becomes a base class for a collection of resources, supporting pagination.
- icarResourceCollectionReference.json, which provides a TYPE that can be embedded to provide a link to a collection, with information about pagination support. You could use this to provide a link to a collection of animals, events, devices, etc.

This was first submitted as Pull Request #48 but will need some work to make the JSON Hyper Schema work with speccy validation, and to define collection types for each resource type.
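
As a rough picture of what an embedded resource reference might look like on the wire, here is a minimal Python sketch. Only the three property names and the "Link" type come from the description above; the helper name, the "animal" property of the event, and the URLs are hypothetical, and the actual schema files remain the authority on the real shape.

```python
from typing import Any


def resource_reference(url: str, context: str) -> dict[str, Any]:
    """Build a JSON-LD style reference to another resource."""
    return {"@context": context, "@id": url, "@type": "Link"}


# Hypothetical event that links to the animal it concerns.
event = {
    "eventDateTime": "2019-06-28T10:00:00Z",
    "animal": resource_reference(
        "https://api.example.org/animals/123",
        "https://example.org/schema/icarAnimalCore",
    ),
}
```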

cookeac commented 4 years ago

The collection changes referenced above have been addressed by #73.

We implemented Link Description Objects ("links" array) in icarResourceReference and icarResourceCollection, but because we have multiple files included with $ref, and hence can't include the "schema" keyword, these are not recognised and fail validation.

We have removed the "links" array as the links themselves are still clearly documented in the schema without this. See commit #74.