ConsumerDataStandardsAustralia / standards

Work space for data standards development in Australia under the Consumer Data Right regime

Decision Proposal 008 - Use Of Pluralisation #8

Closed JamesMBligh closed 5 years ago

JamesMBligh commented 6 years ago

This decision proposal outlines options for the handling of collections and records in API URIs. Insight from the community is being sought due to the trade offs between the identified options.

Feedback is now open for this proposal. Feedback is planned to be closed on the 14th September. Decision Proposal 008 - Use Of Pluralisation.pdf

deboraelkin2 commented 6 years ago

I think Option 1 is preferable. It follows the JSONApi.org recommendations and is easily understood by developers. Options 2 and 3 introduce distinctions in resource paths with the purpose of separating requests of attributes for individual records and collections of records. This can actually be achieved with standard model (Option 1):

Resource Path                         Description
GET …\accounts                        Returns an array of accounts
GET …\accounts\{id}                   Returns the detail of a specific account
GET …\accounts\transactions           Returns the transactions of multiple accounts
GET …\accounts\netposition            Returns the net position of multiple accounts
GET …\accounts\{id}\transactions      Returns the transactions of a specific account

To address issue 1, only the most commonly used attributes could be represented as a specific subpath (eg: transactions, balance), all other requested attributes could be specified using a query parameter. For example:

Resource Path                         Description
GET …\accounts?q=netposition          Returns the net position of multiple accounts
GET …\accounts\{id}?q=netposition     Returns the net position of a specific account

onereddogmedia commented 6 years ago

I also prefer Option 1, and it is the option we have taken in our own API design. Option 2 adds an extra redundant path to the URI. Option 3 can be confusing.

da-banking commented 6 years ago

Option 1 is preferable, for reasons already cited by @deboraelkin2.

Issue 3: It’s unclear whether the API specs will require the complex filtering stated.

Using a POST to perform a query would be undesirable in our view. HTTP specs do not allow the response to a POST to be cached as the operation is not considered idempotent. Using a GET along with an ETag would allow the server to respond with 304 Not Modified and signal the client to use a cached copy.

This kind of opt-in caching consideration will be important if we are to synchronise entities between data providers and data consumers, while constraining expensive resources at both ends. If data consumers try to load a complete transaction history 4x per day (and they will want the most current data), even if nothing has changed, this would represent dramatically higher data processing requirements on data providers than the current PSU behaviour on digital and mobile channels.

We would prefer the use of GET for all idempotent queries. If complex filtering is indeed required, then a more complex query string should be used. There are many API examples like this, e.g. AWS CloudSearch queries: https://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-compound-queries.html

@JamesMBligh - on the subject of collections, will there be any discussion on paging? Some of these collections could be very large, and data consumers not wishing to store and process the whole amount would benefit from requesting the data a page at a time. It would be good to avoid data providers serving large payloads that are mostly discarded.

onereddogmedia commented 6 years ago

I agree with @da-banking. A POST method should be used to change (write) the state of a resource. Creating complex queries via some JSON payload causes a conflict with the resource. Has anyone considered doing that via another method, e.g. OPTIONS? Might be a bit exotic, just an off-the-cuff thought.

I like the idea of paging via a query parameter: ?page=1&pagesize=20
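A minimal sketch of how a provider might interpret such parameters. The function name `paginate`, the 1-based page index, and the default page size of 20 are assumptions for illustration; the thread does not define paging behaviour.

```python
from urllib.parse import urlsplit, parse_qs


def paginate(items, url, default_pagesize=20):
    """Return the slice of `items` selected by ?page=N&pagesize=M in `url`.

    Assumes pages are 1-based; this is an illustrative choice, not a standard.
    """
    query = parse_qs(urlsplit(url).query)
    page = int(query.get("page", ["1"])[0])
    pagesize = int(query.get("pagesize", [str(default_pagesize)])[0])
    if page < 1 or pagesize < 1:
        raise ValueError("page and pagesize must be positive integers")
    start = (page - 1) * pagesize
    return items[start:start + pagesize]
```

For example, with 45 items, `paginate(items, "/accounts?page=3&pagesize=20")` would return only the last five, so the consumer never receives the rest of the collection.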

JamesMBligh commented 6 years ago

Some comments on the feedback so far...

Plural Options: @deboraelkin2, your proposed approach has the flaw that a static string is at the same level in the URI as an arbitrarily variable string (i.e. {id} is at the same level as transactions and netposition). This is a problem as it is technically possible for an ID to have the same value as one of the static strings. Many platforms would be able to deal with this through prioritisation of routes, but not all platforms would be able to do this. It also introduces problems with the application of security profiles that are often applied to URI patterns. I do not believe this approach is appropriate.
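The collision can be illustrated with a small sketch. The route table and the name `resolve_naive` are hypothetical; the point is only that a first-match router with an unconstrained {id} segment shadows a static segment at the same level.

```python
import re

# Hypothetical route table; patterns are tried in declaration order.
NAIVE_ROUTES = [
    (re.compile(r"^/accounts/(?P<id>[^/]+)$"), "account_detail"),
    (re.compile(r"^/accounts/transactions$"), "bulk_transactions"),
]


def resolve_naive(path):
    """Return the name of the first route whose pattern matches `path`."""
    for pattern, name in NAIVE_ROUTES:
        if pattern.match(path):
            return name
    return None
```

Here `resolve_naive("/accounts/transactions")` resolves to `account_detail`, because "transactions" is also a syntactically valid ID. Platforms that prioritise static routes avoid this, which is the distinction drawn above.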

Query Parameters: The use of query parameters to specify what is really a resource (i.e. ?q=netposition) breaks the resource orientation principle we defined in decision 1. Mixing resource identification between the URI and query parameters could easily result in a highly complex API set. I am not convinced this is a preferable solution to the other options put forward.

Complex Queries: I believe there are a number of scenarios where complex queries will be warranted. This is particularly true when considering that business customers are in scope. Presenting visualizations for business customers that could potentially have dozens (or even hundreds) of accounts for different purposes results in a need for complexity that cannot be accommodated in the URI due to max length constraints. I have encountered this before in real world situations.

Another example is balance refresh. Balance is a highly volatile attribute, so caching is less relevant, but there is often a need to request new balances for specific accounts after a transaction has occurred. The ability to POST a list of IDs to retrieve balances for is a real use case.

The cache and performance considerations are acknowledged. I would note, however, that the HTTP protocol is not the only way caching can be performed. Providers can, and should, be caching the results of complex queries and even pre-fetching results in some cases. The performance concerns can potentially be dealt with under NFRs. If a complex POST query is warranted, it is possible to recommend specific NFRs for that end point if there are concerns around misuse by consumers.
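As a rough sketch of the balance refresh case, a POST handler might accept a JSON list of account IDs and return the matching balances. The field names `accountIds`, `accountId`, `balance` and `balances`, and the function name, are assumptions for illustration and are not taken from the proposal.

```python
import json


def handle_balance_query(raw_body, balances):
    """Return (status, payload) for a hypothetical POST of account IDs.

    `balances` maps account ID to current balance; unknown IDs are skipped.
    """
    try:
        request = json.loads(raw_body)
    except ValueError:
        return 400, {"error": "body must be valid JSON"}
    ids = request.get("accountIds") if isinstance(request, dict) else None
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        return 400, {"error": "accountIds must be a list of strings"}
    found = [{"accountId": i, "balance": balances[i]} for i in ids if i in balances]
    return 200, {"balances": found}
```

Because the ID list travels in the body rather than the URI, its size is not bounded by URI length limits, which is the constraint described above.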

At this stage, even if we can avoid complex queries via POST (which would be ideal) I am disinclined to implement a standard that precludes them as an option.

On a more personal and subjective note, having looked at a number of implementations where complex queries are embedded in the Query (including the cloud search implementation referenced) I’m not sure how an approach such as this would be considered preferable to a JSON document.

Paging: Yes, paging behaviour will be specified. I was planning to accommodate this in the payload proposal. As it is getting a lot of attention I may break it out into a separate decision. It is probably a complex enough topic to deserve a dedicated thread.

Developer Experience: Dev experience is really important and this design point does have implications in this area. I understand devs would be more familiar with the JSONAPI recommendation, but I have worked on APIs that have used alternative pluralisation models and developers have not found it to be a concern. I would be interested to hear specific areas where this could be a problem for this decision, however.

-JB-

da-banking commented 6 years ago

@JamesMBligh - the way that the open banking standards are defined will have real economic consequences for data providers and data consumers. The subject of caching is more nuanced than you've stated.

> Providers can, and should, be caching the results of complex queries and even pre-fetching results in some cases.

No doubt data providers will implement server-side caches if that is more cost-effective than live querying each time. This is a good approach to reduce the resources required on a core banking system to serve requests. However a server-side cache will not address the bandwidth requirements to return the data to data consumers.

A client-side cache can address this, and helps the overall solution scale in performance, and cost.

> At this stage, even if we can avoid complex queries via POST (which would be ideal) I am disinclined to implement a standard that precludes them as an option.

Your statement suggests that the intention is to use a POST in an exceptional use case. We would agree on the need to be pragmatic. In each case, there should be very compelling reasons to select a POST over a GET for querying data, but there do seem to be conceivable reasons why a POST might be the better choice.

However, if the suggestion is that in general a POST will be used to query collection resources, we believe that the loss of client-side caching on the most bandwidth intensive queries (collections) is a very high price to pay for this facility, especially without a compelling use case defined.

For example, with client-side caching, a business-banking client might make the general /accounts call, passing the ETag it previously received for the list of hundreds of accounts in the If-None-Match header:

GET /accounts HTTP/1.1
Accept: application/json
If-None-Match: "23e46ca5-a099-4c16-aa27-b80827f64136"

And the server could validate that none of the resources had changed and respond with 304 Not Modified, and transfer no payload to the client, saving bandwidth. The client could then filter the cached accounts list it had stored on the client-side any way it wanted.
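A minimal sketch of the server side of this exchange. In standard HTTP the client returns its cached validator in an If-None-Match request header. The function names and the hash-based ETag scheme below are assumptions for illustration, not something the thread prescribes.

```python
import hashlib
import json
from typing import Optional


def make_etag(payload: bytes) -> str:
    """Derive a quoted ETag from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def handle_get_accounts(accounts: list, if_none_match: Optional[str]):
    """Return (status, headers, body) for a hypothetical GET /accounts."""
    body = json.dumps(accounts, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's cached copy is still current: send no payload.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body
```

A first call with no validator returns 200 and an ETag. Repeating the call with that ETag returns 304 with an empty body until the account list changes. Note that this naive version still builds the full body to compute the ETag, so it saves bandwidth but not server-side processing.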

This kind of approach would allow the data standards to only focus on supporting the most common simple filtering on the server-side, which would address another concern: complex server-side queries require dedicated indexes to perform well. Costs to data providers could be substantial if there are many distinct ways to filter queries on the server-side - more indexes trade-off query performance against write performance.

We are aware that the discussion of caching is somewhat off topic for this thread. And happy to have a more focused discussion on a new thread.

JamesMBligh commented 6 years ago

@da-banking, I think we are aligned. Precluding the scenario where a POST could be used for querying would be constraining at this stage but this pattern should only be used where absolutely necessary. GETs should be primarily used for queries. Support for client-side caching (and compression) should be facilitated with implementation being at provider discretion.

-JB-

ajohanssonwalder commented 6 years ago

I agree with Option 1, with caveats. I agree that there may be scenarios where a POST for query purposes makes sense (as in your example of requesting an aggregate result over a large list of accounts), perhaps as an exception to the rule. However, this piece probably isn't part of this decision.

The issue of having static 'reserved' strings under a resource collection, such as GET /accounts/balance, doesn't feel like a large concern for service providers, so long as the peer /accounts/{id} variable is declared with a known semantic format, e.g. a UUID regex such as [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} or similar. This would, however, require the format to be defined as part of the standard.

Regarding the outlined issues, there could be:

Allowing for resource collections to have collective subresources, while maintaining singletons within the collection, seems feasible provided they are noted as reserved words with no possibility to collide with a resource id.

Option 2 - introducing an id keyword feels a little unnecessary, and could still have the same issue as described in previous comments.

I'm interested to know of any products or services that would not be able to distinguish between static (/accounts/balance) and formatted wildcard (/accounts/{id:format}) routes, if they do in fact present an issue, from misconfiguration of path match priorities. Exact matches should obviously take priority.
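A sketch of the routing behaviour described here, assuming a UUID-formatted identifier and exact-match priority for reserved words. The route names and `resolve_strict` are hypothetical.

```python
import re

# Lowercase UUID format; assumed here, the thread leaves the ID format open.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Reserved static subresources are checked before any variable segment.
STATIC_ROUTES = {
    "/accounts/balance": "bulk_balance",
    "/accounts/transactions": "bulk_transactions",
}
ACCOUNT_DETAIL = re.compile(rf"^/accounts/(?P<id>{UUID})$")


def resolve_strict(path):
    """Return (route_name, account_id_or_None), or None if nothing matches."""
    if path in STATIC_ROUTES:
        return STATIC_ROUTES[path], None
    match = ACCOUNT_DETAIL.match(path)
    if match:
        return "account_detail", match.group("id")
    return None
```

With the identifier constrained to a format that no reserved word can satisfy, the two route families cannot overlap, so the priority check becomes a safeguard rather than a requirement.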

If the format of identifiers isn't intended to be a part of the standards (bobs-named-account, or uuid etc), or route collision is still in fact an issue, then Option 3 largely seems ok.

NationalAustraliaBank commented 6 years ago

Option 1 is our preferred option as it keeps the URI consistent between a single resource and the collection it belongs to.

Option 2 and Option 3 seem inconsistent and awkward from a consumer standpoint. Perhaps there is a flavour of Option 1 which is more intuitive?

In the example that was discussed "netPosition" could either be a property within a single account object, or "netPosition" could be a calculated field across the collection of accounts using some agreed calculation.

In the case that it's a calculated field across a collection, this could be treated as a new resource; examples are called out in the table below. Each resource object would then have related links out to their respective detailed objects.

Resource Path                         Description
GET …\accounts                        Returns a collection of account objects [OK]
GET …\accounts\{id}                   Returns a single object relating to a specific account [OK]
GET …\accounts\transactions           Returns a collection of transaction objects of multiple accounts [OK] (ignoring pagination and sorting concerns in this post)
GET …\accounts\netpositions           Returns a collection of net position objects across multiple accounts; this is treated as a new resource, so we can keep following the same consistent logic.
GET …\accounts\{id}\netposition       Returns the net position object of a single account; this is treated as a location within a particular resource, so does not need to be a plural.
GET …\accounts\{id}\transactions      Returns the transaction objects of a specific account

Using a POST to perform queries is not preferred as a default option, but rather as an alternative in the event that a real use case presents itself. If this were to eventuate, the resource URI would require a static string explicitly calling out that it does not behave the same way as the other resources.

JamesMBligh commented 6 years ago

The feedback is fairly consistent that option 1 is the approach to take so that is where we will likely land. I confess that I am concerned that, as we proceed, we may encounter issues as the API set encompasses other industries and more complex scenarios but, as we are doing end point versioning, we can pivot if issues are encountered.

To be specific, the implications of this decision will be:

I’ll be closing the proposal tonight, so there is an opportunity for some final feedback if anyone has any.

-JB-

bazzat commented 6 years ago

The consensus of the ABA Online Banking Technical Working Group is that we support Option 1.

TKCOBA commented 6 years ago

COBA supports Option 1, particularly as it would appear most consistent with the API Principles (particularly Principle 2: APIs use open standards).

JamesMBligh commented 6 years ago

I have now closed consultation on this decision. A recommendation incorporating feedback will be made to the Chair in due course.

-JB-

JamesMBligh commented 5 years ago

The finalised decision for this topic has been endorsed. Please refer to the attached document. Decision 008 - Use Of Pluralisation.pdf

-JB-