distributed-text-services / specifications

Specifications for the DTS API
https://w3id.org/dts

Request: make available formats explicit #225

Closed geoffroy-noel-ddh closed 8 months ago

geoffroy-noel-ddh commented 2 years ago

Hi,

Tell me if I'm wrong but I don't see where in the specification an implementation can expose the formats it supports. Is that some implicit knowledge an API client is currently assumed to have about the services it calls (i.e. a text viewer must somehow know that a specific DTS implementation offers HTML or plain text as alternative formats)?

If that feature is ever considered, for maximum granularity/flexibility, it might be preferable to declare acceptable formats at the document level (e.g. in the members returned by the Collection response).

awagner-mainz commented 2 years ago

A defensive strategy could rely on the idea that a "REST API should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API)." (https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven) But I agree that this would still leave plaintext, xml, html, ebook formats, pdf, or even image formats for the corresponding scan images as potential representations. I see two (not mutually exclusive) standard approaches:

  1. Content negotiation. This relies on the client specifying what formats it would prefer, and the server responding with the best format it can provide.

  2. HTTP Link headers. A resource URI (document endpoint) would list all available formats as multiple HTTP Link headers, where each link's URI would contain a format parameter for a media type, and its type attribute would specify the mimetype available behind this URI.
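
To make the two approaches concrete, here is a minimal Python sketch of what a client might do. It is only an illustration under stated assumptions: the example.org URLs, the mediaType query parameter and the use of rel="alternate" are hypothetical and not taken from the DTS specification, and the Link parser only handles simple header values.

```python
import re

# Hypothetical Link header value a document endpoint might send. The URLs and
# the `mediaType` query parameter are assumptions, not part of the DTS spec.
LINK_HEADER = (
    '<https://example.org/dts/document?resource=doc1&mediaType=html>; '
    'rel="alternate"; type="text/html", '
    '<https://example.org/dts/document?resource=doc1&mediaType=txt>; '
    'rel="alternate"; type="text/plain"'
)

# One link: <uri> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM_RE = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def accept_header(preferred: list[str]) -> str:
    """Approach 1: build an Accept value, most preferred media type first."""
    parts = []
    for rank, media_type in enumerate(preferred):
        q = max(1.0 - 0.1 * rank, 0.1)  # lower q-value for later entries
        parts.append(media_type if rank == 0 else f"{media_type};q={q:.1f}")
    return ", ".join(parts)


def parse_alternates(header: str) -> dict[str, str]:
    """Approach 2: map media type -> URI for each rel="alternate" link."""
    alternates = {}
    for uri, raw_params in _LINK_RE.findall(header):
        params = {
            name.lower(): (quoted or unquoted)
            for name, quoted, unquoted in _PARAM_RE.findall(raw_params)
        }
        if params.get("rel") == "alternate" and "type" in params:
            alternates[params["type"]] = uri
    return alternates
```

With the sample header above, parse_alternates(LINK_HEADER) returns a mapping from text/html and text/plain to their respective URIs, and accept_header(["text/html", "text/plain"]) gives "text/html, text/plain;q=0.9". Note that the first function needs the client to already know which media types to ask for, while the second lets it discover them from the response.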

Both methods work without adding fields and information to the response body. IMHO, where standards deal with things on a higher level of the API stack, they should be solved there (i.e. HTTP or HATEOAS/RESTful behaviour, maybe Hydra also has something to say about it?). However, I am not sure if this means that the spec should not mention/discuss this at all.

PonteIneptique commented 2 years ago

Hi, I think we already discussed this internally, but I won't speak for everyone here.

I believe Content negotiation is the go-to for this, but HTTP Link headers could be a neat way to supplement it.

geoffroy-noel-ddh commented 2 years ago

Relying on underlying protocol layers where possible would be nice, although the requested format is already part of the DTS layer. One disadvantage of the content negotiation approach (if I understand correctly!) is that the client would have to know in advance the range of possible formats it can ask for, which may exclude discovery of new or custom formats. It also doesn't allow the client to filter a collection by format.

Example use case: a web-based text viewer which can only show HTML texts from a list. With the two options suggested above, I think the viewer would have to probe each document in a collection individually in order to show the user a shortlist. If the collection is long, the process will be slow and won't scale well (e.g. EDH has a collection with 80,000 docs). Alternatively, showing the complete list of documents in the collection and letting the user probe them one by one leads to a frustrating experience.

PonteIneptique commented 9 months ago

During the RC Workshop in Durham, it was decided that a supportedMediaTypes property would be part of the Resource object in the Navigation and Collection endpoints.

Implementation in the specs will be visible in https://github.com/distributed-text-services/specifications/pull/238

monotasker commented 8 months ago

The property mediaTypes (not supportedMediaTypes) has now been added to the root return object from the Collection and Navigation endpoints in release 1-alpha1.
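
As a rough illustration of how a client could use this to shortlist documents without probing each one, here is a small Python sketch. The response layout is an assumption made for the example (in particular, that each member object carries its own mediaTypes list of media type strings); the identifiers and URL are hypothetical, and the exact placement of the property should be checked against the 1-alpha1 specification.

```python
# Hypothetical, trimmed Collection response. The field layout is an assumption
# for illustration only and is not copied from the DTS specification.
collection = {
    "@id": "https://example.org/dts/collection?id=letters",
    "member": [
        {"@id": "doc1", "@type": "Resource",
         "mediaTypes": ["text/html", "application/tei+xml"]},
        {"@id": "doc2", "@type": "Resource",
         "mediaTypes": ["application/tei+xml"]},
        {"@id": "doc3", "@type": "Resource"},  # property absent
    ],
}


def members_supporting(collection: dict, media_type: str) -> list[str]:
    """Return the @id of every member that advertises the given media type."""
    return [
        member["@id"]
        for member in collection.get("member", [])
        # Members without a mediaTypes list are treated as not supporting it.
        if media_type in member.get("mediaTypes", [])
    ]
```

In this sample, members_supporting(collection, "text/html") keeps only doc1, so an HTML-only viewer could build its shortlist from a single Collection response.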

monotasker commented 8 months ago

We published the resolution of this issue during the tech committee meeting on 2024-03-08.

Commit: https://github.com/distributed-text-services/specifications/commit/a0db8ca0f5b9ef2e208baae7cacd8e2b6108685b

Release: https://github.com/distributed-text-services/specifications/releases/tag/1-alpha1

This is an alpha release and we are looking for feedback!