belgif / thematic

ICEG: Thematic Working Groups

REST guide vs URI standard #31

Open pvdbosch opened 4 years ago

pvdbosch commented 4 years ago

When comparing to the REST guide on https://www.gcloud.belgium.be/rest (sources on https://github.com/belgif/rest-guide), I find two possible conflicts with the ICEG URI standard:

1) Identifier of a resource: URIs are often too long to use in practice, e.g. for displaying or saving to a database. That's why a shorter identifier is often used, e.g. ssin, cbeNumber, ...

For REST APIs, it seems a bit overkill to always refer to the full identifier URI (as specified in the section on REST in the URI standard).

In gcloud/belgif REST guide, we only link to the REST URI of a resource and specify the short identifier, e.g.

GET http://rest-reference.test.paas.socialsecurity.be/REST/demo/v1/companies/1

{
   "self": "/companies/1",
   "owner": {
      "ssin": "12345678901",
      "href": "http://example.org/v1/people/12345678901"
   },
   "website": "https://www.mycompany.com"
}

(Example taken from the section Links of the REST guide)

2) Retrieving a representation of a resource with "/doc/{concept}/{reference}": this is also the purpose of a REST API. When to use which solution?

bertvannuffelen commented 4 years ago

@pvdbosch on the technical argument of length: for the objectives of the standard it is actually irrelevant. Persistent identifiers are a means to integrate data across systems. Technical arguments tied to implementation should only be considered if they make the standard impossible to implement.

Databases or data exchange methods should not have such limits. If they exist, it is because some "optimisation" aspects were taken into account in the architecture of the solution. E.g. in a database, there can be 2 keys for an entity: the public persistent identifier, which is a URI, and a local database identifier (a long integer, e.g. with an index on top of it). Designs where the local database identifier is bound to the public identifier through a resolution context (the namespace part of the URI) essentially bind the usage of that identifier to the system. And that is not what you want. All data source designs somehow attempt to separate the public identifier from the internal local identifiers. This strategy goes to the far end: total decoupling, where the only binding to a software system is a web-accessible domain. All the rest is decoupled.
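
A minimal sketch of the two-key idea described above, using Python's built-in sqlite3 module. The table name, column names and the example.org URI are hypothetical and only serve as illustration; they are not taken from the standard.

```python
import sqlite3

# Hypothetical schema: `id` is the local surrogate key,
# `uri` is the public persistent identifier.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE company (
        id   INTEGER PRIMARY KEY,   -- local key, never exposed
        uri  TEXT NOT NULL UNIQUE,  -- public persistent identifier
        name TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO company (uri, name) VALUES (?, ?)",
    ("https://example.org/id/company/1", "My Company"),
)


def find_company(uri):
    """Look a company up by its public URI; the integer key stays internal."""
    row = conn.execute(
        "SELECT name FROM company WHERE uri = ?", (uri,)
    ).fetchone()
    return row[0] if row else None
```

In this sketch the integer key can be renumbered or the table migrated without any effect on consumers, because only the URI crosses the system boundary.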

On the usage in actual data exchanges: yes, the full URI should be present as the master identifier. If we really want to facilitate machine-wise coupling of data from one system with data from another system, then it is required. Identifier construction based on external knowledge expressed in a manual next to the system does not work for machines. This idea is also part of the standard.

Technically speaking, this standard is a simplification for digital systems: it allows you to handle identifiers for bank account numbers, ssin, addresses, parcels, roads, pumps, books, ... all in the same way. No discussions on length, dots, patterns, ... designed specifically to work in one domain; it works for all things. It is like barcodes, but with the Web as the reference database. Any system can handle it, and has the potential to do something with it.

Aside from this, it does not exclude technical solutions that compact the data exchange in an unambiguous way. For instance, JSON-LD is a W3C standard that unambiguously transforms a JSON structure into an RDF structure: this adds the namespaces to the identifiers. Such approaches may bridge human and machine manipulation of data. Nor does it exclude adding the reference as an extra attribute in the data exchange. But that is not the identifier.
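
To illustrate what "adding the namespaces to the identifiers" could look like, here is a hand-rolled sketch in Python. It is not a JSON-LD processor and covers only prefix expansion and compaction; a real implementation would follow the JSON-LD algorithms. The prefixes and the example.org namespaces are assumptions made for this example.

```python
# Hypothetical context: maps short prefixes to namespace URIs.
# This mimics one small aspect of a JSON-LD @context, nothing more.
CONTEXT = {
    "person": "https://example.org/id/person/",
    "company": "https://example.org/id/company/",
}


def expand(compact_id, context=CONTEXT):
    """Turn 'prefix:reference' into a full URI; leave unknown prefixes untouched."""
    prefix, sep, reference = compact_id.partition(":")
    if sep and prefix in context:
        return context[prefix] + reference
    return compact_id


def compact(uri, context=CONTEXT):
    """Inverse of expand: shorten a full URI when a known namespace matches."""
    for prefix, namespace in context.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri
```

The short form stays convenient for humans, while the shared context lets a machine recover the full identifier without consulting a manual.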

pvdbosch commented 4 years ago

Hi @bertvannuffelen,

Thanks for your explanation.

Databases should indeed not be a problem for using full URIs as identifiers. Though there are some other examples, e.g. in the case of an ssin:

Compacting with a technical solution for transformation would then indeed prove a good solution.

In the gCloud functional WG, we discussed JSON-LD. For some use cases, like fully public data, it can prove interesting, but for privacy-sensitive APIs it often seems too costly to support. IMO, there should be a way to augment an OpenAPI document with RDF metadata so it wouldn't require any additional development on the application providing the API.

bertvannuffelen commented 4 years ago

@pvdbosch

  • an ssin is often displayed on screen, inputted by hand by a user, or printed out.

No problem here. The shorter form is then used as a key attribute, but no longer as the "glue" between data.

On inputting by hand: it would be a dream if the once-only principle could be implemented for forms. If I authenticate with my Belgian ID, I can request that the form be prefilled with the data that the Belgian government knows and that the form requires. So entering identifiers might be automated ;-)

This is the real outstanding challenge. There is an implicit assumption in API design that all identifiers exposed via the API are managed by that system, or that they are maintained by one system. The first assumption leads to copying data to the system serving the data. The second leads to creating additional systems to support that API solution, aggregating and harmonizing the data for the "easy use of the API". E.g. BESTADD is a decentralized identifier system in which an identifier for a Belgian address now can follow 3 different schemes. So the local identifier 3214343431 may be present in 2 registers but obviously represent different addresses. From the perspective of the data, it is all fine, but from the perspective of a REST API that would return all landowners on a given address, the address parameter will have to be a full URI. Otherwise it is ambiguous.
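
The ambiguity can be sketched in a few lines of Python. Only the local number 3214343431 comes from the discussion; the two registers, their example.org namespaces and the address labels are hypothetical.

```python
# Hypothetical registers, keyed by the namespace part of their URIs.
# Both happen to issue the same local identifier for different addresses.
REGISTERS = {
    "https://example.org/id/register-a/address/": {
        "3214343431": "Address known to register A",
    },
    "https://example.org/id/register-b/address/": {
        "3214343431": "Address known to register B",
    },
}


def resolve_local(local_id):
    """Return every address a bare local identifier could refer to."""
    return [
        addresses[local_id]
        for addresses in REGISTERS.values()
        if local_id in addresses
    ]


def resolve_uri(uri):
    """Return the single address a full URI refers to, or None if unknown."""
    for namespace, addresses in REGISTERS.items():
        if uri.startswith(namespace):
            return addresses.get(uri[len(namespace):])
    return None
```

With these assumptions, the bare number matches two addresses, while each full URI matches exactly one.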

Unfortunately this might lead to longer REST calls, which are maybe less readable at first sight but yield a more stable interaction scheme. For me, URIs in parameters are unavoidable, and unfortunately no standard exists for the semantic resolution of URL parameters the way JSON-LD exists for the data. But with some good agreements I think we can design a semantic resolution for the parameters.
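
As a rough illustration of what a full URI in a query parameter costs in length, the following Python sketch percent-encodes it with the standard urllib.parse module. The endpoint, the parameter name `address` and the address URI are assumptions made for this example, not part of any agreed API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, address_uri):
    """Append the full address URI as a percent-encoded query parameter."""
    return base_url + "?" + urlencode({"address": address_uri})


def read_address_param(url):
    """Server side: recover the original URI from the query string."""
    return parse_qs(urlsplit(url).query)["address"][0]


# Hypothetical endpoint and address URI.
url = build_query_url(
    "https://api.example.org/landowners",
    "https://example.org/id/address/3214343431",
)
# url is now:
# https://api.example.org/landowners?address=https%3A%2F%2Fexample.org%2Fid%2Faddress%2F3214343431
```

The call is longer and harder to read, but the parameter survives the round trip unchanged and carries its own resolution context.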

  • supporting both full URI and a short form would make APIs more complex

Can you express what is complex here on the publisher side? And what is complex on the consumer side? Where are the gains for you?