distributed-text-services / specifications

Specifications for the DTS API
https://w3id.org/dts

What sort of APIs do we want to have? #57

Closed hcayless closed 6 years ago

hcayless commented 7 years ago

During the discussion around #56, we realized we have a fundamental disagreement as to API philosophy between two camps (not listed in order of preference):

  1. An API should always tell its client where it can go next from any API response, using link relations
  2. An API client should know what sort of API it is accessing, and so can infer where to go next based on the metadata provided in an API response

The resolution of this question is necessary in order to determine the form of the solution for #56, #54, and #39. If we choose (1), then the properties in question would be in the form of links (location, references, etc.). If (2) they would be in the form of boolean properties (referenceable, etc.).
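To make the two options concrete, here is a minimal Python sketch of how a client might find a references endpoint under each. The property names (`referenceable`, `references`), the route `/dts/v1.0/references/{id}` and the example hosts are assumptions for illustration only; none of them has been agreed.

```python
from typing import Optional

# Option (2): a boolean flag; the client must already know the route layout.
# Hypothetical fixed route, for illustration only.
FIXED_ROUTE = "{base}/dts/v1.0/references/{id}"


def references_url_from_flag(member: dict, base: str) -> Optional[str]:
    """Build the URL from a known route if the member says it is referenceable."""
    if member.get("referenceable"):
        return FIXED_ROUTE.format(base=base, id=member["id"])
    return None


# Option (1): a link; the client only needs to know the property (relation) name.
def references_url_from_link(member: dict) -> Optional[str]:
    """Return whatever URL the server advertised, wherever it happens to live."""
    return member.get("references")


flag_member = {"id": "urn:cts:latinLit:phi1294.phi002.perseus-lat2", "referenceable": True}
link_member = {
    "id": "urn:cts:latinLit:phi1294.phi002.perseus-lat2",
    "references": "https://other.example.org/refs?text=urn:cts:latinLit:phi1294.phi002.perseus-lat2",
}

print(references_url_from_flag(flag_member, "https://example.org"))
print(references_url_from_link(link_member))
```

Under option (2) the client breaks if a server places references somewhere else; under option (1) every member carries its own URL, which is the size cost discussed later in this thread.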

balmas commented 7 years ago

My two cents: I'm not convinced it's a disagreement about what an API should do. I think what we've been defining up until now is the API functionality that is specific to a distributed text collections service. Adding HATEOAS-style links for navigation definitely needs to be done; the question is whether we want to bake that into the model or not. My personal vote would be to do this via the implementation: Swagger/OpenAPI 3.0 provides a way to do it with its links feature, and I think we should try to take advantage of that and see if it meets our needs.

hcayless commented 7 years ago

I'm confused though. I can see how this might work for plugging into the reference API: the API defines a reference lookup function, into which you plug an identifier, which you'd get from the collection metadata, and the collection metadata has a property that tells you it's "referenceable", so you know you can plug the id into your ref lookup function, yes?

I don't see how this can work for file download locations, which could be anywhere—they might be an API function, but could equally well be any URI.

Is the argument really over whether the API should have a fixed, known set of URI schemes as opposed to the API telling you where you can go next?

PonteIneptique commented 7 years ago

To me this is where the argument lies. I want "simple" properties that tell you that you can use other routes (i.e. if referenceable == true, then you can go to /dts/v1.0/references/{id}).

I would usually model this as a collection of references. If you want the client to specify an ID, I would do that using a URI template - the client can provide the id, but should not be required to know the URI or parse it.
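As a rough illustration of the URI template idea, the sketch below expands an RFC 6570 level-1 expression such as `{id}`. The template URL is hypothetical. Note that level-1 expansion percent-encodes everything outside the unreserved set, so the colons in a URN become `%3A`; the client supplies the id but never has to know how the URI is built.

```python
import re
from urllib.parse import quote


def expand_simple(template: str, variables: dict) -> str:
    """Expand RFC 6570 level-1 expressions such as {id}.

    Level 1 percent-encodes every character outside the unreserved set
    (ALPHA / DIGIT / "-" / "." / "_" / "~"), so URN colons become %3A.
    Undefined variables expand to the empty string.
    """
    def replace(match):
        value = variables.get(match.group(1))
        return "" if value is None else quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", replace, template)


# Hypothetical template a server could advertise for looking up references by id.
template = "https://example.org/dts/references/{id}"
print(expand_simple(template, {"id": "urn:cts:latinLit:phi1294.phi002.perseus-lat2"}))
# https://example.org/dts/references/urn%3Acts%3AlatinLit%3Aphi1294.phi002.perseus-lat2
```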

balmas commented 7 years ago

Maybe we can get closer to a solution if we can confirm what we do agree upon.

Can we all agree on the following statements:

  1. Members of a DTS Collection may:
     1a. be readable at the DTS Passages API
     1b. be readable at the DTS References API
     1c. have one or more related resources of various mime types which can be directly retrieved at a URL

  2. If any of 1a-1c is true for a member of a collection, the DTS Collections API should state it explicitly.

  3. For 1c, it's not enough for the API to say there is a related resource at URL X; it needs to say what the format (mimetype) of that resource is and what its relationship to the collection member is.

  4. The location of the DTS Passages API and DTS References API endpoints may be the same for all members of a collection, but may also differ between members.

  5. Unnecessarily repeating a potentially lengthy URL string across many members of a DTS collection (e.g. http://mycollectionservice.org/references) is not desirable, particularly for responses which may contain thousands of items.
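One way to satisfy points 4 and 5 at the same time is to state an endpoint once at the collection level and let a member override it only when it differs. The sketch below uses hypothetical field names (`referencesEndpoint`, `members`); it illustrates the inheritance pattern and is not a proposal from the thread.

```python
import json

collection = {
    "id": "my-collection",
    # Stated once for the whole collection (point 5: no repetition per member).
    "referencesEndpoint": "http://mycollectionservice.org/references",
    "members": [
        {"id": "text-1"},
        {"id": "text-2"},
        # Point 4: this member lives on a different endpoint, so it overrides.
        {"id": "text-3", "referencesEndpoint": "http://elsewhere.example.org/refs"},
    ],
}


def endpoint_for(member: dict, parent: dict) -> str:
    """Member-level value wins; otherwise inherit the collection default."""
    return member.get("referencesEndpoint", parent["referencesEndpoint"])


for m in collection["members"]:
    print(m["id"], endpoint_for(m, collection))

# Rough size check: what does writing the URL into every member cost?
repeated = {**collection, "members": [
    {**m, "referencesEndpoint": endpoint_for(m, collection)} for m in collection["members"]
]}
compact_size = len(json.dumps(collection, separators=(",", ":")))
repeated_size = len(json.dumps(repeated, separators=(",", ":")))
print(compact_size, repeated_size)
```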

jonathanrobie commented 7 years ago

I think we should take one of three "purish" positions so that our design follows a well-known approach. Let me list them:

  1. Pure Swagger HTTP API - the URI structure is meaningful and drives the design. This is a tightly coupled approach, all servers must implement the same endpoints, and the server and client cannot evolve independently. The client API is defined by the documentation, and message responses do not tell you what you can do from a given state.
  2. Pure REST approach - the service has one entry point, which lists all available choices using link relations. A client needs to know the link relations and how to parse a message to find links and link relations. A client never needs to parse a URI. When URI parameters are provided, they are provided via URI templates. Versioning is done by changing the payloads, so a well behaved client continues to work as the service evolves.
  3. Layered approach - a pure REST API is built on top of a pure swagger API. A client can actually be written using either approach. Because some clients will take the pure Swagger approach, the server is tightly coupled to such clients, but clients that are RESTful continue to work as the service evolves.

I strongly prefer a #2 API. I would not be able to support a #1 API. Swagger 3.0 supports #3. Can it also document a #3 API as though it were a #2 API?

Worse than any of these is an ad-hoc approach, where our API has its own quirky conventions.

jeffreycwitt commented 7 years ago

I prefer number 2 as well. But I think 3 is possible, though always derivative of number 2.

hcayless commented 7 years ago

I think I agree with all of Bridget's points, except I'm uneasy about 5. Do we know providing links will add massive overhead?

PonteIneptique commented 7 years ago

Notice: I am teaching in Lyon today, which I had not expected, so I won't be able to join the meeting.

jonathanrobie commented 7 years ago

I also agree with all of Bridget's points except #5. I doubt that providing links in responses will result in massive overhead. If that is a concern, and we use JSON-LD, we can use @context to shorten the links, e.g.

  {
    "@context": {
      "ical": "http://www.w3.org/2002/12/cal/ical#",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "ical:dtstart": {
        "@type": "xsd:dateTime"
      }
    }
  }

But each link is surrounded by metadata and other data that probably adds up to more bytes than the link itself, so I'm not sure this is a concern. If you have thousands of results, you are probably paging or querying results, or both.
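The shortening works by prefix substitution: a compact IRI such as `xsd:dateTime` is expanded by looking up `xsd` in `@context` and appending the rest. The sketch below implements only that single rule (real JSON-LD processing also handles `@vocab`, `@base`, term definitions and more), and the `dts` prefix and its URL are hypothetical.

```python
def expand_compact_iri(value: str, context: dict) -> str:
    """Expand 'prefix:suffix' using a prefix defined in @context.

    Simplified: ignores @vocab, @base, term definitions with @id, etc.
    Values with no colon, an undefined prefix, or that look like absolute
    IRIs ('http://...') are returned unchanged.
    """
    prefix, sep, suffix = value.partition(":")
    if not sep or suffix.startswith("//"):
        return value
    mapping = context.get(prefix)
    return mapping + suffix if isinstance(mapping, str) else value


context = {
    "ical": "http://www.w3.org/2002/12/cal/ical#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dts": "http://example.org/dts/",  # hypothetical prefix for DTS resources
}

print(expand_compact_iri("xsd:dateTime", context))
print(expand_compact_iri("dts:references/1.1", context))
```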

PonteIneptique commented 7 years ago

So, here is the long-awaited critique of where some of us want to go.

1. Coherence

I think we can all agree here. Whatever we decide (use full links, use canonical API routes [all the same wherever your implementation lies], etc.) for this should be applied to both other routes ( https://github.com/distributed-text-services/references-api , https://github.com/distributed-text-services/passage-api ). In this situation, the realities of both those routes should be taken into account.

2. Providing links : an overhead

Do we know providing links will add massive overhead?

If you have thousands of results, you are probably paging or querying results, or both.

I think this has long been my argument: providing full links is an awful overhead. There are use cases where you want to retrieve all references from a text. Paging would actually create more overhead, because you would add HTTP requests on top of the data weight (with no paging: 12,000 references + 1 HTTP request; with paging: 12,100 references + 12 HTTP requests*).
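The request count in the paging case is a ceiling division. The sketch assumes a page size of 1,000 references, which is what makes 12,000 references come out at 12 requests; the page size and the optional per-page overhead are illustrative assumptions, not measured values.

```python
import math


def paging_cost(total_refs: int, page_size: int, per_page_overhead_refs: int = 0):
    """Return (number of HTTP requests, reference-equivalents transferred).

    per_page_overhead_refs is a hypothetical knob modelling paging metadata
    (e.g. next/prev links) in reference-sized units.
    """
    requests = math.ceil(total_refs / page_size)
    transferred = total_refs + requests * per_page_overhead_refs
    return requests, transferred


print(paging_cost(12000, 12000))    # no paging: everything in one response
print(paging_cost(12000, 1000))     # paged, metadata ignored
print(paging_cost(12000, 1000, 5))  # paged, with a guessed per-page overhead
```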

I wrote a small but self-explanatory example: I queried Homer's Iliad for its line numbers. A good reason to do that is simply to compute how many lines should be grouped together to get something nice to read. This is currently what we do with Nemo (get all references at the lowest level, then compute how we should group them).

https://gist.github.com/PonteIneptique/959bb902299ccdb9090221b3982327b4

Resource                  Size           Comparison (base: No Link, 132 kB)
CTS GetValidReff          964.3 kB       730.30 %
Full Link Potential DTS   1,716.756 kB   1300 %
No Link Potential DTS     132.9 kB       100 %
No Link Prefix DTS        237 kB         179.54 %

Both JSON outputs were minified.
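To see where the size differences come from, the sketch below builds the three DTS-style shapes for a synthetic list of line references and measures their minified JSON size. The base URL, the `ref:` prefix convention and the reference counts (24 books of 600 lines) are hypothetical, so the printed numbers will not match the table, which was measured on the real Iliad data.

```python
import json

BASE = "http://example.org/api/dts/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/passage/"

# Synthetic references: 24 books x 600 lines (the real Iliad differs).
refs = [f"{book}.{line}" for book in range(1, 25) for line in range(1, 601)]

no_link = {"references": refs}
prefixed = {"@context": {"ref": BASE}, "references": [f"ref:{r}" for r in refs]}
full_link = {"references": [BASE + r for r in refs]}


def minified_size(doc: dict) -> int:
    """Size in bytes of the minified JSON encoding."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


base_size = minified_size(no_link)
for name, doc in [("No link", no_link), ("Prefix", prefixed), ("Full link", full_link)]:
    size = minified_size(doc)
    print(f"{name:10} {size / 1000:9.1f} kB {100 * size / base_size:8.2f} %")
```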

Benchmark of parsing to get the full URI (comparison between prefixed and unprefixed references). A regular expression is necessary to handle the prefix, because the prefix is unknown before the response reaches the client and could easily change from one query to the next depending on the implementation.

[Screenshot: jsperf results for "check time for DTS link", 2017-04-19 13:20:32]

98% slower is the important score.
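The two client-side strategies being compared can be sketched as follows. With no prefix, the client concatenates a base URL it already knows; with a prefix, it must first match the prefix it learned from the response. Both produce the same URLs and only the work per reference differs. The base URL, the `ref:` prefix syntax and the helper names are assumptions; `timeit` merely shows how such a comparison could be run in Python, and its numbers are not comparable to the jsperf figures.

```python
import re
import timeit

BASE = "http://example.org/dts/passage/"
refs = [f"1.{n}" for n in range(1, 612)]
prefixed_refs = [f"ref:{r}" for r in refs]
context = {"ref": BASE}


def urls_by_concatenation(refs):
    # No prefix: the route is known in advance, so just concatenate.
    return [BASE + r for r in refs]


PREFIX_RE = re.compile(r"^([A-Za-z][\w-]*):(.*)$")


def urls_by_prefix(prefixed_refs, context):
    # Prefixed: the prefix is only known from the response, so match it first.
    out = []
    for value in prefixed_refs:
        m = PREFIX_RE.match(value)
        out.append(context[m.group(1)] + m.group(2) if m else value)
    return out


# Same result either way; only the cost differs.
assert urls_by_concatenation(refs) == urls_by_prefix(prefixed_refs, context)

print(timeit.timeit(lambda: urls_by_concatenation(refs), number=200))
print(timeit.timeit(lambda: urls_by_prefix(prefixed_refs, context), number=200))
```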

3. Similar routes

What really got me into CTS is the fact that the routes are fixed. Not giving links also forces us to have fixed URIs. What's great about that? Wherever I go, CTS should work the same way: knowing an identifier (say urn:cts:latinLit:phi1294.phi002.perseus-lat2), I can query it directly whether it's on Perseus or Perseids, because I know I have to call GetValidReff or GetPassage. Enforcing this kind of structure makes it easy to swap one endpoint for another without expecting many differences, and without having to re-parse and go back through the /collections route.

I hope this proved why full link is MUCH heavier.

hcayless commented 7 years ago

Some thoughts:

This isn't an issue of the collections API though, is it? This looks like what you'd get if you called whatever will replace GetValidReff. Is there the same level of problem if we give a URI for each member of a collection?

Your example is comparing the overhead of applying a regular expression to a string to doing string concatenation. And, yeah, it's more expensive. I'd note that it still may be acceptably fast.

Even so, could the URIs in your reference list not be relative instead of absolute? If the API request URI is http://ctsstage.dh.uni-leipzig.de/api/dts/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/references, why not have passage/1.1 etc. in the reference list?
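Relative references like these are resolved against the request URI using the standard rules of RFC 3986 (section 5), which Python exposes as `urllib.parse.urljoin`. The sketch uses the request URI from the paragraph above; note that a leading slash makes the reference relative to the host root rather than to the current path, so `passage/1.1` and `/passage/1.1` resolve to different URLs.

```python
from urllib.parse import urljoin

# Request URI from the example above.
request_uri = ("http://ctsstage.dh.uni-leipzig.de/api/dts/texts/"
               "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/references")

# Relative references resolve against the request URI (RFC 3986, section 5).
print(urljoin(request_uri, "passage/1.1"))   # replaces the last path segment
print(urljoin(request_uri, "/passage/1.1"))  # leading slash: relative to the host root
print(urljoin(request_uri, "1.1"))           # a bare reference resolves too, to a different URL
```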

I do wonder if there's a sensible way to deliver a passage URI pattern that would tell the client not just what the big list of possible references is, as GetValidReff does, but what passage query URIs look like.
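RFC 6570 URI templates are one existing way to deliver such a pattern. A form-style query expression like `{?id,ref,start,end}` lets a single template cover both single references and ranges; variables the client leaves undefined are simply omitted. The template and parameter names below are hypothetical, and only the `?` operator is implemented.

```python
import re
from urllib.parse import quote


def expand_form_query(template: str, variables: dict) -> str:
    """Expand RFC 6570 form-style query expressions such as {?ref,start,end}.

    Only the '?' operator is handled; undefined variables are skipped and
    values are percent-encoded outside the unreserved set.
    """
    def replace(match):
        pairs = [
            f"{name}={quote(str(variables[name]), safe='')}"
            for name in match.group(1).split(",")
            if variables.get(name) is not None
        ]
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([A-Za-z0-9_,]+)\}", replace, template)


template = "https://example.org/dts/passage/{?id,ref,start,end}"
text_id = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"
print(expand_form_query(template, {"id": text_id, "ref": "1.1"}))
print(expand_form_query(template, {"id": text_id, "start": "1.1", "end": "1.10"}))
```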

PonteIneptique commented 7 years ago

This isn't an issue of the collections API though, is it? This looks like what you'd get if you called whatever will replace GetValidReff. Is there the same level of problem if we give a URI for each member of a collection?

On this point, I would refer back to my point 1: Coherence. If we move to links, we move to links everywhere; it does not make sense otherwise.

Your example is comparing the overhead of applying a regular expression to a string to doing string concatenation. And, yeah, it's more expensive. I'd note that it still may be acceptably fast.

Of course, but how else are you going to do the prefix replacement? This has to be taken into account. Being 300 times slower is not an acceptable difference to me, though, even if it "still may be acceptably fast".

Even so, could the URIs in your reference list not be relative instead of absolute? If the API request URI is http://ctsstage.dh.uni-leipzig.de/api/dts/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/references, why not have passage/1.1 etc. in the reference list?

If you have only "/passage/1.1", it technically says as much as "1.1", except that it takes more space, because it is not per se a URI and, as such, you cannot use it "as a Web page", as Jonathan said. As a string, nothing distinguishes "/passage/1.1" from "1.1", so it still forces the developer to check the documentation. Why, then, should they bear the burden of more bytes per reference?

I do wonder if there's a sensible way to deliver a passage URI pattern that would tell the client not just what the big list of possible references is, as GetValidReff does, but what passage query URIs look like.

I actually thought about that too: what if I want to know which other references there are in 1.1? Do I need to put two links? :) And double the size of the answer? If we do not use links, why not simply enforce a route scheme that would make everything MUCH simpler and lighter for end users to program with?

jonathanrobie commented 7 years ago

I don't think anyone here would design the API the way that you did in your example.

The API is distinct from the representation of query results. The API gives you the basic functionality associated with a resource, and will not have thousands of links for a resource.

For your example, I would have one link, with a corresponding link relation, to allow me to query a collection or a document. I would also have a link, with a corresponding link relation, to allow me to retrieve an item from the result set. I would be inclined to let a collection define its own identifiers, and not to require a full urn for that. For general queries, the result set will often note only that it is a result set and probably provide information about the query used to generate it. There is no general way to create appropriate link relations for any possible query, so I would not try to do that - whoever retrieves the result set should have some idea what their query means.

You could, of course, provide URNs or URLs for each query result. The resulting overhead is an issue - as you have demonstrated.

Jonathan


PonteIneptique commented 7 years ago

Could you propose an example of what you're describing? Otherwise it doesn't clear the fog much...

jonathanrobie commented 7 years ago

I think we now know what Bridget meant in her point #5. I think that's what Thibault was talking about in his example.

I want to address another issue Thibault raised:

What really got me into CTS is the fact the routes are forced. Not giving links forces also to have forced URIs. What's great about it ? Wherever I go, CTS should work the same way, I can try to query directly by knowing an identifier ( say urn:cts:latinLit:phi1294.phi002.perseus-lat2) if it's on Perseus or Perseids, because I know I have to do GetValidReff or GetPassage. Enforcing this kind of structure makes sure that it's easily to swap one endpoint by another without expecting much differences, without having to reparse and go back from /collections route.

If you have an identifier for something, it should be brain-dead easy to retrieve it without navigating. That's important. It is also orthogonal to the choice of REST versus navigational HTTP API. There must be a simple way to look up something by identifier.

And URNs do not force me to put anything in a particular location, only URLs do. So the ability to look something up by URN is also orthogonal to this issue.

URLs are really the issue. There are good reasons not to prescribe URLs for applications:

In addition, it's really useful to be able to get a resource, look at the link relations, and know exactly what you can do with that resource. In HTTP APIs, I have to go read some documentation - and hope that the documentation matches the version of the API I am using at the moment. I don't think that is a good long-term architecture.

If I write a client that depends on link relations, it continues to work as the API is versioned, no matter which URLs are used at any given time.

Jonathan

jonathanrobie commented 7 years ago

Help me know what the user requirements are for this part of the interface

  1. Ask what references are available for a collection
  2. Ask for all instances of these references in a query (would a user really do that?)
  3. Dereference one of them

Is that about right? Also, is it better to support intensional descriptions of a reference system, or arbitrary descriptions that can be either intensional or extensional, instead of listing all possible references as the only possible option?

Jonathan


PonteIneptique commented 7 years ago

My use cases are :

  1. Ask what references are available for a collection
  2. Ask for all instances of these references in a query (would a user really do that?)

    Yes they would

That's all that is needed: a list of references.

PonteIneptique commented 7 years ago

@hcayless I reworked your edit to make it run: https://jsperf.com/check-time-for-dts-link/9

[Screenshot: jsperf results for "check time for DTS link", 2017-04-24 16:03:41]

It's still a 1/8 ratio. As a probable future user of the API, I find this unacceptable.

I would really like us to take the path of an API that is light to produce and to use. I am looking forward to another proposal from @jonathanrobie.

As for the routes, we have to remember that if we had solid, shared routes, reuse and navigation from one of our APIs to another would be much, much easier on the client side. In the end, I want people (not the providers) to use the API as well...

Edit:

For the sake of the argument, here is a new screenshot of the test, taken in the same session with the same Firefox version. [Screenshot: jsperf results for "check time for DTS link", 2017-04-24 16:08:56]

hcayless commented 7 years ago

I just wanted to see what the actual overhead of the Regex was, but then it wouldn't let me publish the fixed version, so I gave up :-). I still think this only proves that doing more work is more expensive, not that using link relations is bad.

balmas commented 7 years ago

My preference is for a layered API approach - allowing people to build a purely RESTful API on top of the Swagger API if that's what they want to do, but using Swagger/OpenAPI as the base for defining and describing the functionality. For better or worse, that's where we are now. We made an attempt to start with the pure RESTful approach and it stalled so I went the Swagger route and that's what we have.

I disagree with the point about not having predefined routes. The link relations approach is nice from a theoretical perspective, but in practice, I don't really believe that predefined routes are difficult to implement, support or write clients to.

To borrow from the IIIF terminology, we might think of what we have right now in the Collections API swagger spec as a collection 'manifest'. It describes the collection and its members, and allows for member items to describe their relationship to the parent collection using vocabulary that is specific to the publisher of the collection.

The only navigation that is really part of it at the moment is getting from a list of all collections to retrieving one collection and I think that's fairly non-controversial.

Where we are getting tripped up is in describing how to navigate into a collection for the passages and references API.

I think @PonteIneptique has demonstrated that using full URIs for the equivalent of the CTS GetValidReff command is probably not viable. I think more benchmarking is needed to determine whether the relative path or pattern approach is reasonable. One issue I have with the benchmark tests is that, for the use case that retrieves all of the references at once, you probably don't also need to do pattern replacement on all of those URIs: i.e. if you're doing it to compute grouping, you only need to complete the pattern once you've done the grouping, so it applies to a much smaller number of cases.
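The "complete the pattern only after grouping" point can be sketched like this: group the bare references first, then build one URL per group. The group size of 50, the passage route with `start`/`end` range parameters and the `id=iliad` value are hypothetical choices for illustration.

```python
from typing import List, Tuple


def group_references(refs: List[str], size: int) -> List[Tuple[str, str]]:
    """Split an ordered list of references into (first, last) ranges of `size` items."""
    return [(refs[i], refs[min(i + size, len(refs)) - 1]) for i in range(0, len(refs), size)]


# Hypothetical passage route taking a range; not an agreed DTS route.
PASSAGE = "https://example.org/dts/passage?id={id}&start={start}&end={end}"

refs = [f"1.{n}" for n in range(1, 612)]   # bare references, no URLs
groups = group_references(refs, 50)        # grouping works on plain strings
urls = [PASSAGE.format(id="iliad", start=s, end=e) for s, e in groups]  # 13 URLs, not 611

print(len(refs), len(groups))
print(urls[0])
print(urls[-1])
```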

My preference would be to move forward by looking at how to use OpenAPI 3.0 to describe navigation into the collection, taking advantage of the pattern matching it offers. I will volunteer to make a proposal using that in the coming days, probably not by this week's standup, but definitely by the next one (i.e. by May 3rd).

I do think that requiring a pure RESTful approach would be the point at which I jump off the effort. Or at least, I would step back and let someone else take the lead on defining what that would look like both for the collections API and the passages/references APIs.

PonteIneptique commented 7 years ago

@hcayless As a potential heavy user of such an API, a 1/8 or 1/40 ratio is a pretty big deal to me. Call me crazy :') Though I do agree with you that the site is awful when it comes to editing :+1:

Until @jonathanrobie shows his proposal, I won't comment much more. I do agree with @balmas on some points. I still think that full or relative URIs on collections are going to be just as expensive, albeit with less data to parse (given the way @hcayless's implementation and mine compare, there will ALWAYS be a 1/8 ratio, a ratio that grows on small amounts of data because of dict access, though that part could be ignored given how little it matters).

I am all for one thing : a light to parse, light to produce/transmit API. If it becomes much heavier over philosophical points, I am not sure I'll be interested as a user (given that I won't be a provider anymore :) ).

jonathanrobie commented 7 years ago

I would probably support all queries against a given resource using query parameters. The query string and the format of the query result would depend a lot on the semantics of the query.

I would probably reserve ?q= for full-text queries, and support keyword/value pairs for queries on properties of the metadata. I might well choose to use ?search= as a prefix for such searches. I don't know what property identifies a reference; let's call it ref. I would support wildcards.

If you support keyword/pair without search=, you would use the = sign as a delimiter. So ref=* would do the trick. This means that using = in the keyword/pair sequence is a little weird:

http://url.to.resource/?search=ref=*,unit=chapter

But it works. One way around that is to use functional notation:

http://url.to.resource/?search=ref(*),unit(chapter)

In either case, the API would provide a URI for searches, with a link relation like 'search', and another for full-text queries.
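Both notations survive standard query-string handling, as a quick check with Python's `urllib.parse` shows. Parsing splits each parameter on its first `=` only, so `search=ref=*,unit=chapter` yields the whole `ref=*,unit=chapter` as the value; when a client builds the URL with `urlencode`, the inner `=`, `*` and `,` are percent-encoded. The `parse_search` helper and both notations follow the examples above and are a sketch, not an agreed syntax.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit


def parse_search(value: str) -> dict:
    """Parse 'ref=*,unit=chapter' or 'ref(*),unit(chapter)' into a dict."""
    functional = re.findall(r"(\w+)\(([^)]*)\)", value)
    if functional:
        return dict(functional)
    return dict(pair.split("=", 1) for pair in value.split(",") if "=" in pair)


for url in ["http://url.to.resource/?search=ref=*,unit=chapter",
            "http://url.to.resource/?search=ref(*),unit(chapter)"]:
    search = parse_qs(urlsplit(url).query)["search"][0]
    print(search, "->", parse_search(search))

# What a client would send if it builds the query string with urlencode.
print(urlencode({"search": "ref=*,unit=chapter"}))
```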

In your result set, you could use any of the formats you showed, it is the result set of a query, not an API. The API will certainly need a way to resolve a resource by reference, which would use a URI template to allow the user to provide the resource.

There isn't a difference in performance, because what goes over the wire can be the same. The difference is in the discoverability of the API.

Jonathan

Suppose {URL} points to a resource. Then {URL}?search=ref


jonathanrobie commented 7 years ago

On Mon, Apr 24, 2017 at 10:11 AM, Bridget Almas wrote:

My preference would be to move forward by looking at how to use OpenAPI 3.0 to describe navigation into the collection, taking advantage of the pattern matching it offers. I will volunteer to make a proposal using that in the coming days, probably not by this week's standup, but definitely by the next one (i.e. by May 3rd).

I do think that requiring a pure RESTful approach would be the point at which I jump off the effort. Or at least, I would step back and let someone else take the lead on defining what that would look like both for the collections API and the passages/references APIs.

It's easy enough to define a purely RESTful API on top of a Swagger API, that might be the best approach. Any approach that makes Bridget want to drop out is a non-starter.

But we need to understand what the contract is between the client and the server. If we layer it, can a client rely on the RESTful API, or will some servers not bother to implement it? Can a server choose to implement the RESTful API but not guarantee to support hard-coded endpoints in order to avoid versioning headaches down the road?

PonteIneptique commented 7 years ago

Well, I definitely think this proposal goes beyond what I'd like to support, because it completely removes access via unified routes and routes that represent objects, i.e. Collections have references, Collections have readable passages.

I think RESTful and this kind of model are out of scope for me. We are moving really far away from a unified API that would be easy to communicate with and standard in its answers. In the end, CTS fits my needs much better.

If the result of this effort is that everyone implements their own routes, their own prefixing (or non-prefixing) system and their own full responses, I won't support it much further, because I won't take the time to write such a client and maintain it over time.

balmas commented 7 years ago

First, I don't think my participation should be the touchstone by which any decisions are made. The needs of the community of users and developers should be the main issue here. I'm not sure I qualify as either user or developer of DTS right now.

We started this effort because of these primary problems with the CTS model:

1) it didn't allow for text collections which couldn't adhere to the rigid CTS textgroup/work/edition model

2) the XML overhead of the request/responses was too high, particularly for the GetValidReff and GetCapabilities calls

3) the routes were not RESTful

I think we didn't all have the same idea about what the last point meant; it does seem clear now that for some of us the priority was more web-friendly routes, whereas for others it's the full HATEOAS approach that is critical.

It could be that there is no meeting of the minds to be had. I am hopeful that OpenAPI 3.0 might offer us a middle ground. Most compromises leave everyone at least a little unhappy though.

hcayless commented 7 years ago

So thought experiment: if my collections API, when it reaches an edition (i.e. something you can grab passages from, could be a work with a default edition), has a) a link relation that says "go here for valid references on this work", and b) a link relation that says "query this link with a reference to retrieve a passage", and a) gives you a list of references that you can plug into b), have we suddenly become horrible?

I don't think the passage API can be super RESTful in a way, because it's really about querying, just with a contextually-constrained "language" (e.g. you can have ranges, which aren't something you'd get from your big list of references, and what you can query depends on the structure of the work). But I do think the collections API is very amenable to the RESTful treatment and I don't want to give that up.

jonathanrobie commented 7 years ago

I think Hugh has just said what I'm trying to say.

There's a part of the passage API that will be RESTful by providing next / prev links and such, but a lot of the action will be done by querying. I'm wondering if the real problem is people need to see a concrete API before we all know we are talking about the same thing. Perhaps it would make more sense to write a RESTful API on top of the OpenAPI spec, show how it would be used for a RESTful client, and then discuss?

Or perhaps it would be even more helpful to look at existing REST specifications designed to solve this problem? After all, if we want to be RESTful, the location URLs do not drive the design. If we choose a media type that supports collections natively and has a set of well-known link relations for navigating collections, we can reduce our work significantly. We then need to add link relations for additional functionality we define that is specific to our work.

RESTful collections are quite common these days, in a lot of applications, and we can get a lot for free. If we are using json-ld as our media type, we could use Hydra as our guide:

http://www.hydra-cg.com/spec/latest/core/

If we want to simplify that, we could use JSON:API as our guide:

http://jsonapi.org/format/

Or if you want an older approach that lacks a few features, here is HAL:

https://apigility.org/documentation/api-primer/halprimer

Each of these has native support for collections, and they are being used in serious applications. If we use one of these as our media type, we reduce the number of decisions we need to make and the complexity of making those decisions.
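As an illustration of what a collection-aware media type provides, here is a HAL-style page of collection members and a client loop that follows `next` links until there are none. The `_links`/`_embedded` structure and `href` link objects come from the HAL format; the `members` key, the URLs and the in-memory dictionary standing in for HTTP requests are hypothetical.

```python
from typing import Dict, Iterator

# In-memory stand-in for HTTP GET; each value is a HAL document (hypothetical URLs).
PAGES: Dict[str, dict] = {
    "https://example.org/collections?page=1": {
        "_links": {
            "self": {"href": "https://example.org/collections?page=1"},
            "next": {"href": "https://example.org/collections?page=2"},
        },
        "_embedded": {"members": [{"id": "text-1"}, {"id": "text-2"}]},
    },
    "https://example.org/collections?page=2": {
        "_links": {"self": {"href": "https://example.org/collections?page=2"}},
        "_embedded": {"members": [{"id": "text-3"}]},
    },
}


def fetch(url: str) -> dict:
    """Pretend HTTP GET returning a parsed HAL document."""
    return PAGES[url]


def iter_members(start_url: str) -> Iterator[dict]:
    """Yield every member, following the 'next' link relation across pages.

    The client never constructs a URL; it only reads link relations.
    """
    url = start_url
    while url:
        page = fetch(url)
        yield from page.get("_embedded", {}).get("members", [])
        url = page.get("_links", {}).get("next", {}).get("href")


print([m["id"] for m in iter_members("https://example.org/collections?page=1")])
```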

Jonathan


jonathanrobie commented 7 years ago

If I had to choose between Bridget dropping out and me dropping out, I would rather drop out. My goal was to be helpful, not to get in the way or insist on my own way.

I will provide an API for the things I am working on now; it will support collections of texts, allowing people to browse and query them. It does not have to be the same API as DTS. My API will be RESTful: my time at EMC has convinced me that this is the better approach when servers and clients each have to evolve over time, not necessarily at the same time. I think that's what we have here, and it is certainly what I will have on my server.

For my purposes, starting with an existing hypermedia media type like Hydra or JSON:API would work great. I think it would also meet the three requirements Bridget gives. If the group decides not to go that way, I would ask that we make a clear decision. I'm not sure if I would drop out or not - I still might wind up being a user of DTS, but probably would not implement it in that case.

Jonathan

On Mon, Apr 24, 2017 at 1:08 PM, Bridget Almas notifications@github.com wrote:

First, I don't think my participation should be the touchstone by which any decisions are made. The needs of the community of users and developers should be the main issue here. I'm not sure I qualify as either user or developer of DTS right now.

We started this effort because of these primary problems with the CTS model:

  1. it didn't allow for text collections which couldn't adhere to the rigid CTS textgroup/work/edition model

  2. the XML overhead of the request/responses was too high, particularly for the GetValidReff and GetCapabilities calls

  3. the routes were not RESTful

I think we didn't all have the same idea about what the last point meant: it does seem clear now that for some of us the priority was more web-friendly routes, whereas for others the full HATEOAS approach is critical.

It could be that there is no meeting of the minds to be had. I am hopeful that OpenAPI 3.0 might offer us a middle ground. Most compromises leave everyone at least a little unhappy though.


balmas commented 7 years ago

The suggestion by @hcayless seems reasonable to me and that is essentially what I was going to try to represent with OpenAPI 3.0.

jonathanrobie commented 7 years ago

I'm still a little confused about what people are doing when they ask for all valid references.

If I wanted all references in the New Testament, I would ask for a list of books. For each book, I would ask for the number of the last chapter. For each chapter, I would ask for the number of the last verse. That's because the hierarchy in the New Testament is book, chapter, verse.
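The strategy just described can be sketched as below, assuming a hypothetical server that reports, for each book, the last verse number of each chapter. The book names and counts are invented toy data, not real New Testament figures. Note the built-in assumption: every chapter and verse is numbered 1..N with no gaps.

```python
from typing import Dict, List

# Hypothetical toy data: for each book, the last verse number of each chapter
# (index 0 = chapter 1). These counts are made up for illustration only.
last_verse_per_chapter: Dict[str, List[int]] = {
    "BookA": [3, 2],
    "BookB": [2],
}


def enumerate_references(structure: Dict[str, List[int]]) -> List[str]:
    """Expand book/chapter/verse counts into every 'Book chapter:verse' reference.

    Assumes chapters and verses are numbered 1..N with no gaps.
    """
    refs = []
    for book, chapters in structure.items():
        for chapter, last_verse in enumerate(chapters, start=1):
            for verse in range(1, last_verse + 1):
                refs.append(f"{book} {chapter}:{verse}")
    return refs


print(enumerate_references(last_verse_per_chapter))
# ['BookA 1:1', 'BookA 1:2', 'BookA 1:3', 'BookA 2:1', 'BookA 2:2', 'BookB 1:1', 'BookB 1:2']
```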

Does that same strategy work reasonably well for most reference systems? I suppose it might not for Library of Congress numbers - or would it? Are there other systems where this kind of strategy does not work?

I'm mostly saying that it's helpful to really understand the use case before we decide what is most efficient. I imagine some of you understand this a lot better than I do at this point.

Jonathan


hcayless commented 7 years ago

I think one motivator for this is that, particularly in the case of specific editions, the citation scheme may not be totally regular. An edition may change the order of lines, or mark interpolated or repositioned lines as 1a, 1b, etc., or start at line 10 (or do the same with larger units). So it's not necessarily enough to know the citation scheme and end point—you may actually need a list of all the referenceable units in order. This is a pretty common circumstance. Along with this issue (and because of it), if you want to align two editions, you'll probably need to proceed with full lists of references for both.
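To illustrate why full lists matter, here is a small sketch comparing the reference lists of two hypothetical editions, as they might come back from a references endpoint. Edition B adds interpolated lines 1a and 1b and swaps lines 3 and 4; neither change could be recovered from the citation scheme and an end point alone. The function name and data are illustrative only.

```python
from typing import Dict, List


def align_references(edition_a: List[str], edition_b: List[str]) -> Dict[str, object]:
    """Compare two ordered reference lists from two editions of the same work."""
    in_a, in_b = set(edition_a), set(edition_b)
    shared_a_order = [r for r in edition_a if r in in_b]
    shared_b_order = [r for r in edition_b if r in in_a]
    return {
        "shared": shared_a_order,                            # in both, edition A's order
        "only_a": [r for r in edition_a if r not in in_b],  # e.g. lines B omits
        "only_b": [r for r in edition_b if r not in in_a],  # e.g. interpolations like 1a, 1b
        "same_order": shared_a_order == shared_b_order,     # False if lines were repositioned
    }


# Hypothetical editions of the same poem.
edition_a = ["1", "2", "3", "4", "5"]
edition_b = ["1", "1a", "1b", "2", "4", "3", "5"]

result = align_references(edition_a, edition_b)
print(result["only_b"])      # ['1a', '1b']
print(result["same_order"])  # False
```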

PonteIneptique commented 7 years ago

Okay, so given the current situation, would people be happy with the following compromise:

  1. The DTS API has standard routes/URIs. Where the DTS API is mounted depends on the implementer (e.g. /dts, /api/dts, /api/dts/1.0, /api, /text-api), but the routes below that point are fixed (e.g. /dts would have, if we agree on these routes, /dts/collections, /dts/references, /dts/passages).
  2. The DTS API has full links (which can use prefixes in the JSON-LD context) on the /collections API but not on /references and /passages. (To me, this would mean losing coherence, but if that's what it takes to get 1, I'll go with it.)
  3. The DTS API returns short, light responses without URIs for references. (Somewhat a repetition of 2, though.)
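Under this proposal, a client needs only the implementer's base URL; everything else is fixed by the specification. A minimal sketch, using the route names floated in this thread (not a settled spec) and a hypothetical base URL:

```python
from typing import Dict

# Route names as proposed in this thread; only the base URL varies by implementer.
FIXED_ROUTES = ("collections", "references", "passages")


def dts_endpoints(base: str) -> Dict[str, str]:
    """Derive every DTS endpoint from an implementer-chosen base URL."""
    base = base.rstrip("/")  # tolerate a trailing slash on the configured base
    return {name: f"{base}/{name}" for name in FIXED_ROUTES}


print(dts_endpoints("https://example.org/api/dts/")["passages"])
# https://example.org/api/dts/passages
```

The client is trivial, but every implementation must be able to serve those exact paths under its base.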

For me, 3. and 1. are the most important. I can give up the full links on the /collections API if that keeps those two points on the table.

PonteIneptique commented 7 years ago

(On another note, @jonathanrobie, as @hcayless said, what you envision would only work for references that are numeric and incremental. A lot of systems have alphanumeric citation schemes. In Perseus, we have some "pr" poems, we have inverted numbers [a lot in drama], etc. :) )
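A quick way to see the difference: a list of references can be replaced by counting only when it is a contiguous run of plain numbers. The sketch below (the helper name is hypothetical) returns `(first, last)` in that case and `None` otherwise, using inputs modeled on the cases mentioned here: a "pr" poem, an interpolated line, and inverted numbering.

```python
from typing import List, Optional, Tuple


def as_numeric_range(refs: List[str]) -> Optional[Tuple[int, int]]:
    """Return (first, last) if refs are exactly first, first+1, ..., last.

    Return None otherwise: a client then cannot rebuild the list by counting,
    so the server has to send every reference.
    """
    if not refs or not all(r.isdecimal() for r in refs):
        return None  # empty, or something like "pr" or "1a" is present
    numbers = [int(r) for r in refs]
    expected = list(range(numbers[0], numbers[0] + len(numbers)))
    return (numbers[0], numbers[-1]) if numbers == expected else None


print(as_numeric_range(["10", "11", "12"]))  # (10, 12): starting at 10 is fine
print(as_numeric_range(["pr", "1", "2"]))    # None: alphanumeric reference
print(as_numeric_range(["1", "1a", "2"]))    # None: interpolated line
print(as_numeric_range(["3", "2", "1"]))     # None: inverted numbering
```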

hcayless commented 7 years ago

@PonteIneptique Re your point 1, I can think of a couple of objections. First, I think we're already starting to see that the three APIs may work in different ways: collections is about browsing, references is about discovering the internal structure of a document, and passages is about identifier (and reference) resolution. Do we know they'll want similar kinds of endpoints? I'm not so sure... This is also why I think we wouldn't necessarily be inconsistent if we adopted this way of doing things.

Second, your fixed routes rule out certain kinds of implementation. When I worked at the UNC Library, the systems team mostly wouldn't let us have new rewrite rules—so my endpoint would have had to be collections.php or something like that (I will happily stipulate that this was mad, but nonetheless it was an operational constraint). I don't see why that sort of implementation style should be illegal. An implementation might equally well defer passage lookup to a different service, so maybe it's on a different port or virtual host than the collections service. Again, I don't see why that's bad. Your suggested routes look perfectly fine, and I might expect a reference implementation to look like that, but I'm not sure why they need to be mandatory.
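The alternative described here could be supported without fixed paths by having each implementation publish a single entry-point document that names its endpoints. The document format and URLs below are hypothetical; the point is that `collections.php` on one host and a passage service on another port are equally valid.

```python
import json

# Hypothetical entry-point document an implementation might serve at one
# well-known URL. Endpoints can live on any host, port, or script.
ENTRY_POINT = json.loads("""
{
  "collections": "https://library.example.edu/dts/collections.php",
  "references": "https://texts.example.edu:8443/refs",
  "passages": "https://texts.example.edu:8443/passage"
}
""")


def endpoint(entry_point: dict, name: str) -> str:
    """Look up an advertised endpoint by name instead of assuming a fixed path."""
    try:
        return entry_point[name]
    except KeyError:
        raise LookupError(f"server does not advertise a {name!r} endpoint") from None


print(endpoint(ENTRY_POINT, "collections"))
# https://library.example.edu/dts/collections.php
```

A client then needs to know one URL per service, which is the same cost as knowing the base URL under a fixed-route scheme.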

PonteIneptique commented 7 years ago

@hcayless My point is that a user wants to work with a standard API without having to figure out where the routes for passages or references are. If people do not want passages and references, then I do not know why they are here, because DTS was driven by the need to adapt CTS to more kinds of structure, as stated by @balmas

When I know someone has a CTS API, I go there and I know which requests to make; I do not need to read up on the exact changes that person made to the original standard. Technically, IIIF does the same by saying "here is how a transformation route should be implemented".

To me, this compromise is the minimal base I can accept as a user. I think that if we do not meet this really minimal base (I gave up a large chunk for this consensus), I'll simply leave the project.

Addendum: Do not take that as blackmail or the like. It would just mean that my goals as a library provider and CMS provider would no longer be aligned with those of the project, and so it would make no sense at all for me to keep spending time on this :)

hcayless commented 7 years ago

If we were going to be all drawing lines in the sand about this, it's a shame it took this long to get there. For my part, CTS is pretty useless for the sorts of material I work with, so I suppose I can go back to ignoring it.

PonteIneptique commented 7 years ago

@hcayless I agree. I just would not have thought that having structured, standard URIs would be so much trouble for some of us.

I'd just add that removing myself from this does not mean it can't continue. It seems that at least 3 of you agree with each other. Dunno about Bridget.

I also think it is a shame that we lost so many people along the way, because their voices would have been worth weighing in.

balmas commented 7 years ago

I would really hate to see all the effort we've put in so far go to waste over philosophical differences.

balmas commented 7 years ago

I really would like a chance to see if we can use OpenAPI 3.0 to get mostly where we need to. The route question is one that I would like a little time to think about more.

jonathanrobie commented 7 years ago

Let's go back to the requirements. Bridget listed them as follows:

We started this effort because of these primary problems with the CTS model:

  1. it didn't allow for text collections which couldn't adhere to the rigid CTS textgroup/work/edition model

  2. the XML overhead of the request/responses was too high, particularly for the GetValidReff and GetCapabilities calls

  3. the routes were not RESTful

From my perspective, if we replace the rigid CTS model with rigid URLs of our own, that means we really haven't improved over #1, because people want to organize collections in a wide variety of ways, and implementation constraints may affect the URIs a particular server wants to offer.

From my perspective, if the real API is about knowing the structure of the URIs because you read the documentation somewhere and going there directly instead of finding the URIs via discovery, that's rejecting requirement #3.

Meeting only requirement #2 is an improvement, I guess, but not enough of one to make me want to implement the API. I think we are basically asking if we believe in the original requirements or not. I still do.

PonteIneptique commented 7 years ago
  1. I do not see how rigid URLs would be a problem for organizing collections as you like. I'd be happy to have a user story proving this point.
  2. It might be rejecting 3, as you understand RESTful, but it was also a GREAT strength of CTS. Today, with tools like MyCapytain, I don't have to browse through a lot of pages to get what I want. And that makes sense. I don't know a lot of people who browse APIs like web pages. Maybe I don't know the right ones :)

Most probably my last comment, because I am fed up with this and with being told that I cannot possibly understand things. I want to underline how constructive some of the comments made here were: "Let's go back to the requirements.", "I don't think anyone here would design the API the way that you did in your example.". I cannot possibly have a nice discussion when I am treated either as an idiot or as a newcomer.

I want to remind everyone that we came to make CTS lighter and more compatible with all of our collections. There were between 10 and 15 of us. Today, only 5 of us are speaking, and only 4 are really discussing things in this thread. The overly technical turn, with its English idioms, has excluded a lot of people. My experience (only mine) is that strict standards are much easier to adopt, because most of our field (DH) does not have highly advanced engineers, and clear, imposed standards work well (as CTS did, even though CTS was sometimes implemented wrongly because the meaning of some items was unclear).

I spent a lot of time on this DTS project, including writing an export for MyCapytain and writing benchmarks, but it feels like that is not worth much. I have also spent a lot of time implementing clients for projects, and, again in my experience, I do not see how DTS, by going down this road, will be easy to implement from the client's perspective. I have tried to argue this point, but it seems to have been dismissed. I really feel that only the provider's point of view is being taken into account here, while the point of having standard APIs is also for them to be reused. We do not have the firepower of the W3C or IIIF, nor should we think we do. But even with that firepower, transformation in IIIF is a standard route.

jonathanrobie commented 7 years ago

First off, let me be clear: if we don't have a common set of goals, I can drop out of the group and let the rest of you go ahead. RESTful APIs are not the only good APIs, but I'm not likely to implement an API that is not RESTful.

Use cases for structuring my own URIs: adding metadata to URLs to track users, measure performance, allow resources to be personalized for users, provide data like primary keys or other metadata useful for an implementation in a given environment to look things up. One of the most important use cases is versioning - as the API is versioned, the URIs are often changed, and this can be done without affecting the user if the API is RESTful. I may well have some users working with one version of a resource and others working with another version - perhaps some users have an old client that was written for a server three versions back, and the server has evolved since then, so it uses a different URI to represent the newer version of the resource.

A client can always cache URIs, and a well-designed server tries not to invalidate them.
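A minimal sketch of both points, under stated assumptions: two hypothetical server versions mint different URIs for the same passage, the client discovers the URI through a hypothetical `dts:passage` link instead of constructing it, and it caches what it has discovered. The server functions stand in for HTTP requests, and the cache never expires entries, relying on the server not invalidating URIs; a real client would honor HTTP caching headers.

```python
from typing import Callable, Dict, Tuple


# Two hypothetical server versions that mint different URIs for the same resource.
def server_v1(resource_id: str) -> dict:
    return {"_links": {"dts:passage": {"href": f"https://example.org/v1/passage?id={resource_id}"}}}


def server_v2(resource_id: str) -> dict:
    return {"_links": {"dts:passage": {"href": f"https://example.org/v2/texts/{resource_id}/passage"}}}


class LinkCache:
    """Remember URIs the client has discovered, keyed by (resource id, relation)."""

    def __init__(self, fetch: Callable[[str], dict]):
        self.fetch = fetch  # stands in for an HTTP GET of the resource description
        self.cache: Dict[Tuple[str, str], str] = {}
        self.fetches = 0

    def resolve(self, resource_id: str, rel: str) -> str:
        key = (resource_id, rel)
        if key not in self.cache:
            self.fetches += 1
            self.cache[key] = self.fetch(resource_id)["_links"][rel]["href"]
        return self.cache[key]


# The same client code works against either version.
for server in (server_v1, server_v2):
    client = LinkCache(server)
    print(client.resolve("ed1", "dts:passage"))
    client.resolve("ed1", "dts:passage")  # served from the cache
    print("fetches:", client.fetches)  # fetches: 1
```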

On Wed, Apr 26, 2017 at 4:57 AM, Thibault Clérice notifications@github.com wrote:

Most probably my last comment, because I am fed up with this and with being told that I cannot possibly understand things. I want to underline how constructive some of the comments made here were: "Let's go back to the requirements.", "I don't think anyone here would design the API the way that you did in your example.". I cannot possibly have a nice discussion when I am treated either as an idiot or as a newcomer.

I really did not mean either of these comments that way. I respect the work you are doing, and you are not a newcomer at all. I'm newer to the project than you are, and you are doing most of the work. I thought we had agreed on the requirements; I don't like changing them this late in the process, and I don't yet see a compelling reason to change the requirements. But that's my opinion, and if I'm in the way, getting out of the way might be constructive. When I point to the requirements, I'm not treating you like a newcomer, I'm treating you as a party to an agreement I thought we had made.

When I clarify that I would never implement the API the way you did in your example, that's because your example claimed to prove that RESTful APIs are bad because they would look like that example. That example does not look like an API I would advocate or design. It does not prove that a RESTful design will have the properties of your example.

I want to remind everyone that we came to make CTS lighter and more compatible with all of our collections. There were between 10 and 15 of us. Today, only 5 of us are speaking, and only 4 are really discussing things in this thread. The overly technical turn, with its English idioms, has excluded a lot of people. My experience (only mine) is that strict standards are much easier to adopt, because most of our field (DH) does not have highly advanced engineers, and clear, imposed standards work well (as CTS did, even though CTS was sometimes implemented wrongly because the meaning of some items was unclear).

Are you suggesting it would be best to have some people design APIs to be evaluated rather than discussing these things in terms of principles? That might well make sense. I agree that a standard that is clearly and precisely specified is important.

I spent a lot of time on this DTS project, including writing an export for MyCapytain and writing benchmarks, but it feels like that is not worth much. I have also spent a lot of time implementing clients for projects, and, again in my experience, I do not see how DTS, by going down this road, will be easy to implement from the client's perspective. I have tried to argue this point, but it seems to have been dismissed. I really feel that only the provider's point of view is being taken into account here, while the point of having standard APIs is also for them to be reused. We do not have the firepower of the W3C or IIIF, nor should we think we do. But even with that firepower, transformation in IIIF is a standard route.

Again, I can get out of the way if the group feels that would be helpful. I may well have a different design sense than you do. But I think the real issue is not that we have problems with the parts of the API that have already been designed, but that the rest needs to be designed.

If I get out of the way, there are no hard feelings, and I would be happy to work with you or anyone else on this group on some other project. It would just be a recognition that we want different things in our APIs.

Jonathan

PonteIneptique commented 6 years ago

This debate was settled a long time ago.