ConsumerDataStandardsAustralia / standards

Work space for data standards development in Australia under the Consumer Data Right regime

Decision Proposal 028 - Transaction Payloads #28

Closed JamesMBligh closed 5 years ago

JamesMBligh commented 6 years ago

This decision proposal outlines a recommendation for the payloads for transactions as per the end points defined in decision proposal 015.

Feedback is now open for this proposal. Feedback is planned to be closed on the 26th October. Decision Proposal 028 - Transaction Payloads.pdf

Please note the specific concerns. If there is significant early feedback I will reissue with amendments prior to the closure date.

da-banking commented 5 years ago

transactionId Making the transactionId mandatory requires the data provider to maintain a permanent ID mapping per transaction, per consumer, per data recipient. This is a considerable overhead, especially when the 7 year retention requirement is considered.

It is also impossible for the data recipient to determine if a transaction has additional detail available, triggering a large number of needless calls to GET /banking/accounts/{accountId}/transactions/{transactionId} to determine if there is additional detail.

NPP adoption is not mandatory and appears to have been enabled by about half of ADIs. Among organisations that have adopted NPP, the substantial majority of transactions today do not occur through NPP, but through the card switch networks (ATMs and EFTPOS) and batch systems.

Therefore the substantial majority of transactions will not have any additional data available via the GET /banking/accounts/{accountId}/transactions/{transactionId} path.

It would be a mistake to add an overhead of this size to something that is relevant to only a small fraction of transactions.

We recommend:

This avoids storing the mapping of internal transaction identifiers needlessly, and it avoids having a data recipient call GET /banking/accounts/{accountId}/transactions/{transactionId} for every transaction, just in case there are additional details.

Filtering The proposal seems to leave it up to the data provider to determine which transactions to include within the start-date and end-date range specified by the data recipient. Data providers may be in different time zones from each other, the data consumers, and the end users themselves.

We recommend that date filters like start-date and end-date should be specified as datetime UTC to make it clear to the data provider what point in time the data consumer would like to see data for.
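As a minimal sketch of the UTC recommendation above, a provider might normalise an incoming ISO 8601 datetime filter (with a timezone offset) to a single UTC instant before querying. The function name and the fallback policy for naive values are illustrative assumptions, not part of the proposal:

```python
from datetime import datetime, timezone

def normalise_filter(value: str) -> datetime:
    """Parse an ISO 8601 datetime filter and convert it to UTC so that the
    data provider and data consumer agree on the exact point in time."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumed policy for this sketch: treat naive values as UTC rather
        # than guessing the provider's local time zone.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

# 09:00 with a +10:00 offset is the same instant as 23:00 UTC the previous day.
start = normalise_filter("2018-10-01T09:00:00+10:00")
```

This removes the ambiguity of a bare date, which each provider would otherwise interpret in its own local time zone.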

The inclusion of a text filter defined as "where this string value is found as a substring of either the reference or description fields" is an implementation concern. We could efficiently find transactions where the reference or description starts with the text value; however, a contains query cannot use a b-tree index. If retained in its current form, this requirement will be costly to implement. Additionally, it should be made explicit that the consumer would probably expect this to be a case-insensitive query.
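The indexing point can be illustrated with a toy sorted index standing in for a b-tree (the data and helper names are illustrative):

```python
import bisect

# Toy model of a b-tree index: a sorted list of (description, row_id) pairs.
index = sorted([("coffee shop", 1), ("grocery store", 2),
                ("grocery delivery", 3), ("gym membership", 4)])

def prefix_search(prefix):
    """A prefix query maps to one contiguous range of the sorted index,
    so it can be answered with two binary searches (O(log n))."""
    lo = bisect.bisect_left(index, (prefix,))
    hi = bisect.bisect_left(index, (prefix + "\uffff",))
    return [row for _, row in index[lo:hi]]

def contains_search(needle):
    """A 'contains' query has no contiguous range in the index, so every
    entry must be scanned (O(n)) -- the cost raised above."""
    return [row for desc, row in index if needle in desc]

prefix_search("grocery")   # index range scan
contains_search("store")   # full scan
```

The same asymmetry holds in SQL: `LIKE 'grocery%'` can use an index range scan, while `LIKE '%store%'` generally forces a full scan.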

Ordering The standards should specify the order that the transaction data must be returned in. These are large data sets, and sorting is expensive, so we do not think that sorting should be a query parameter with multiple options that data providers must support.

Since the transaction data is created in order of postDateTime, the start-date and end-date parameters are for posting dates, and many data consumers will be using the APIs to retrieve a delta of transactions that have occurred since the last time they checked, we believe that the order should be postDateTime DESC.

Specific Areas Of Concern

Has the NPP extended information been modelled correctly?

The use of the term extendedData when this applies only to NPP messages seems to be leaving the way open for this object to be used for other purposes as yet undefined. Additionally, NPP only supports the x2p1 overlay at present, with other overlays in the works. It is unclear what interesting data might be made available from other overlays.

It might be better to introduce a transaction$type which can have the enum value npp-x2p1 currently, and then have an npp object that has the from, to and extendedDescription properties.

This could then be extended to include npp-x2p2, npp-x2p3, eftpos, bpay, etc.
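The discriminated shape suggested above might look like the following sketch (all field names and values are illustrative, not from the draft standard):

```python
# Hypothetical payload shape: a transaction$type discriminator plus a
# type-specific sub-object, instead of a generic extendedData blob.
npp_transaction = {
    "transactionId": "T0001",
    "amount": "-25.00",
    "transaction$type": "npp-x2p1",   # discriminator; could later take values
                                      # like "npp-x2p2", "eftpos" or "bpay"
    "npp": {
        "from": "Payer Name",
        "to": "Payee Name",
        "extendedDescription": "Thanks for lunch",
    },
}

def extended_detail(txn):
    """Dispatch on the discriminator to locate the type-specific detail
    object, if any (e.g. 'npp-x2p1' -> the 'npp' sub-object)."""
    kind = txn.get("transaction$type")
    return txn.get(kind.split("-")[0]) if kind else None
```

A consumer can then branch on `transaction$type` without probing for fields that only exist for some schemes.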

In terms of feedback on the detail modelled, for NPP payments from an account:

For incoming NPP payments into an account:

For bulk transactions that are filtered and paged the payloads interleave transactions from multiple accounts. Is this the best way to represent this data?

We don't see any alternative way - if the transaction data was grouped under account objects, then paging would be broken, and it may as well be using the GET /banking/accounts/{accountId}/transactions endpoint at that point.

We think your concern is symptomatic of a different issue. The bulk methods (GET and POST) have limited utility, and add complexity in implementation for data providers and data consumers. The bulk methods reduce round-trips, but since most consumers have very few active accounts, these methods are advantageous only for a tiny minority of consumers that have a large number of accounts. If a consumer has 3 accounts with 100 transactions in each, it is broadly the same if they are being retrieved 25 at a time with 12 calls to a bulk endpoint, or 4 calls each to 3 account endpoints.

We recommend that the bulk methods be made optional for data providers as they are in the UK specifications.

We would prefer to implement GET /banking/accounts/{accountId}/transactions, along with ETag or Last-Modified HTTP headers so that data consumers can efficiently determine if there are new transactions available for a specific account.
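The ETag flow proposed above can be sketched as a client-side cache that revalidates with `If-None-Match` (header semantics per RFC 7232; the class and the fetch signature are assumptions for illustration):

```python
class TransactionCache:
    """Client-side cache keyed by account, revalidated via ETag."""

    def __init__(self, fetch):
        # fetch(account_id, etag) -> (status, etag, body); a stand-in for a
        # GET /banking/accounts/{accountId}/transactions call that sends
        # If-None-Match when a cached ETag exists.
        self.fetch = fetch
        self.store = {}  # account_id -> (etag, transactions)

    def get_transactions(self, account_id):
        cached = self.store.get(account_id)
        etag = cached[0] if cached else None
        status, new_etag, body = self.fetch(account_id, etag)
        if status == 304:            # Not Modified: no new transactions
            return cached[1]
        self.store[account_id] = (new_etag, body)
        return body
```

A 304 response costs the provider almost nothing, which is the efficiency argument being made for per-account endpoints.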

Is this the extent of transaction fields to include? What additional meta data or fields would be valuable and viable to include?

Creditor/Debtor account details We would expect the creditor account details to be provided in the case of a credit transaction, and the debtor account details to be provided in the case of a debit transaction.

The UK specs include this information.

Without it we can't distinguish a transfer between the consumer's own accounts from a transfer between the consumer and an external party.

jh-a commented 5 years ago

@da-banking the question of transaction IDs is a difficult one. The lack of mandatory transaction IDs introduces unnecessary challenges for Data Consumers. Without immutable and unique transaction IDs, Data Consumers are forced to request the maximum number of a user's transactions in order to guarantee the data they are using is correct; alternatively, the Data Consumer must build matching logic which may not align with the bank's own matching logic, and this could create inaccurate data and serious user detriment.

A compromise position might be for a bank, via a microservice at the API layer, to hash a series of fields which combined present a sufficient level of immutability.
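The hashing compromise could be sketched as follows. Which fields are "sufficiently immutable" is exactly the open question; the set below is purely illustrative:

```python
import hashlib

# Illustrative choice of fields assumed not to change after posting.
STABLE_FIELDS = ("accountId", "postingDateTime", "amount", "description")

def pseudo_transaction_id(txn: dict) -> str:
    """Derive a stable pseudo-identifier by hashing fields that should not
    change after a transaction is posted. Deterministic: the same inputs
    always yield the same ID, so both sides can compute it independently."""
    material = "|".join(str(txn.get(f, "")) for f in STABLE_FIELDS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Note the residual risk: two otherwise-identical transactions posted at the same instant would collide, which is why this is a compromise rather than a true immutable ID.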

da-banking commented 5 years ago

@jh-a - we are comfortable including a hash if it is helpful. However, a hash can be computed by the data consumer as well as the data provider. If the data provider computes it, then the payload transferred between the data provider and the data consumer is increased. Leaving this to the discretion of data consumers would allow them to generate a hash only if they got value from it, and to make an appropriate trade-off between key size and collision probability for their use case. So we would expect a hash to be added to data provider scope only if it needs to be computed from fields that are not in the payload.

The scope of Open Banking is for banks to make data available to consumers that the banks currently hold in a digital form. Adding immutable transaction IDs in all cases goes beyond that, will be expensive to implement, and will add performance overhead to operate.

We think the intention of the /banking/accounts/{accountId}/transactions/{transactionId} endpoint is to facilitate other future transaction detail (besides NPP) being included at that endpoint, and in that context the endpoint makes some sense. This is why we have proposed a conditional transactionId as a best attempt to support this.

We process millions of transactions daily. If each data consumer is retrieving transaction detail with a round-trip per transaction then this would generate significant overhead. These systems are designed to retrieve and display transaction history at the account or customer level - in line with the normal UX of a mobile/internet banking app, or a paper statement.

Our strong preference would be to eliminate the transactionId property altogether and drop the /banking/accounts/{accountId}/transactions/{transactionId} endpoint. Instead we would prefer to return the NPP or other detail in the transaction collection responses. We would also be open to the inclusion of an optional query string parameter include-detail=true|false on transaction collection endpoints, so that the data consumer could specify if the detail should be returned or not.

The UK specs only have transaction collection endpoints. They do not define a resource at the granularity of a single transaction.

jh-a commented 5 years ago

@da-banking thanks for the feedback. As to your points

I don't think it particularly matters where the hash is generated, provided the data provider can identify fields which, when hashed, demonstrate sufficient immutability to identify a transaction singularly - that is the challenge.

I don't agree with your point about equivalence between what is delivered to customers and open banking. An API is a machine interface, and thus must include characteristics which facilitate a machine interaction. Customers do not currently receive JSON payloads, a range of headers in each message, consent IDs - the list goes on, but all of these features are necessary to facilitate machine interactions.

As to the performance overhead, my suggestion is predicated on the assumption that the ID (created by a hash of sufficiently immutable fields) would only be generated at the API layer by a microservice, when those transactions were called. Hence any overhead would depend entirely on the usage of the transactions endpoints.

I broadly agree that a single resource to return transaction IDs is probably superfluous, and that returning transactions as a collection with an ID in the body of each instance is a preferable and usable approach.

As to the UK specs (of which I am a joint author) the final version for adoption is at https://openbanking.atlassian.net/wiki/spaces/DZ/pages/641795939/Transactions+v3.0. The UK is currently considering approaches to the adoption of a unique and immutable transaction ID for posted transactions, following broad recognition of the challenge that transaction mutation presents to data consumers and customers alike. You can review the decision and supporting information at https://openbanking.atlassian.net/wiki/spaces/WOR/pages/720830465/CR-021+Read+APIs+Including+unique+and+immutable+transaction+ID+to+transactions

da-banking commented 5 years ago

@jh-a we are somewhat at cross purposes.

We were not suggesting there was an equivalence between what is delivered to customers and open banking. Just that where there is a deviation, we're heading into broader architectural changes, trade-offs, and incurring higher costs. As a rule of thumb, endpoints/payloads that align with a current user interaction will be easier for banks to support than novel ones, and banks are more likely to have the data proposed.

We don't have a unique immutable ID for each transaction, adding one is do-able, but non-trivial. We understand how useful this would be. We could use it too :-)

We don't have an analogous process for retrieving individual transaction records by a unique immutable ID. Adding this introduces performance trade-offs to other important existing processes that banks perform, so we're not keen to implement that without a good reason, and we've suggested an alternative that side-steps this concern.

So to summarise, we can have a transactionId that is unique and immutable and is fulfilling the role of the hash you suggested for merging, but we do not want to support GET /banking/accounts/{accountId}/transactions/{transactionId} and would prefer this detail was included on the other transaction history collection endpoints.

jh-a commented 5 years ago

@da-banking great! we're on the same page!

anzbankau commented 5 years ago

S/RC/RD (181026-1)

WestpacOpenBanking commented 5 years ago

We note that description, reference and a number of the NPP fields may contain sensitive information including e.g. medical information, place of employment or contact details. These details may be the details of a party who did not give consent but was entered by the consenting customer. Careful consideration needs to be given to security scopes and the consent process.

bazzat commented 5 years ago

In regards to bulk transactions: there is unanimous objection among the Financial Institutions to the implementation of a bulk account and transaction endpoint. The processing impact of aggregating the collective portfolio of a customer's accounts and their subsequent transactions has the potential to cause large performance issues. Technically sourcing, composing and returning a payload of this size will far exceed the expected response requirements and greatly degrade the customer experience. Further, the presence of a 'bulk' method moves the onus onto the Financial Institution to perform reporting and Personal Financial Management functions in lieu of the Third Parties, who can create and manage these within their own customer base. The view of the ABA Working Group is for the bulk endpoint to be removed from the proposal and the July 1st 2019 delivery.

bazzat commented 5 years ago

The ABA Open Banking Technical Working Group notes and supports Westpac's comments above re runningBalance.

NationalAustraliaBank commented 5 years ago

NAB is broadly supportive of the intent of this proposal; the designs and structures are well considered. However, implementing all of the proposed end points as described will be a significant challenge. To compensate, the following amendments are proposed:

Privacy

Given the risks associated with sharing this data we explicitly object to the inclusion of reference and extendedDescription attributes.

Data

  1. the situation of having to specifically request the extended details for each transaction, when in fact that data may not exist at all.
  2. the scenario of a TPP calling the transaction detail endpoint recursively to display a page of transactions with enriched data.

End Points

In this scenario a corporate customer with 100+ accounts and thousands of transactions per account would still need to be returned; coupled with the other query parameters, this end point can easily become exceedingly complicated. Notwithstanding, it is already a challenge for a retail customer with only a few accounts.

We believe the "POST /banking/accounts/transactions" end point was intended to address the problem articulated above, in so far as it could limit the number of accounts requested; however, we believe it is still too complicated to include within the phase 1 scope. As an industry we may be better off implementing fewer end points well, learning from and refining them before attempting the advanced use cases.

There is still too much uncertainty surrounding how the scopes and entitlements would facilitate which accounts a user may see or not with respect to these aggregated end points.

In summary we support:

- GET /banking/accounts/{accountId}/transactions (with stated modifications)
- GET /banking/accounts/transactions (optional for data providers)

We recommend de-scoping:

- GET /banking/accounts/{accountId}/transactions/{transactionId}
- POST /banking/accounts/transactions (delay, and leverage learnings from the associated GET end point)

Filters

To that end, and to expedite the supply of the data, we recommend:

commbankoss commented 5 years ago

API design

CommBank feels it may be inefficient to have a mandatory call to the /banking/accounts/{accountId}/transactions/{transactionId} endpoint for fetching additional details of a transaction. A suggested alternative would be for the /banking/accounts/{accountId}/transactions endpoint to return both basic and additional details of transactions, with an input flag indicating whether more details are required. The additional details would be optional fields, provided by the data provider if they exist.

Pagination

The usage of page and page-size in the request, together with the mandatory totalRecords, totalPages, and links in the response might not be ideal because:

- It implicitly assumes that the transaction list does not change while it is being queried. If new transactions arrive between queries, they shift existing transactions onto later pages, which are then returned again when the user queries the next page.
- It forces a full transaction prefetch and count before returning any records. This is not efficient, especially when dealing with potentially up to 7 years of history.

An alternative approach is to leverage a page token or cursor. The request needs to specify the page-size, and an optional next-page-token argument. On the first request, the next-page-token will be empty. In the response, if there are more transactions (more pages), a next-page-token is returned. Include this next-page-token in the subsequent request to get the next results. Keep doing this until there is an empty next-page-token. This ensures the result contains the full transaction list, excluding any new transactions happening when the first query was executed.
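The cursor approach described above can be sketched as a client loop. Parameter names follow the comment (page-size, next-page-token); they are not the draft standard's names, and the fetch function is a stand-in for the HTTP call:

```python
def fetch_all(fetch_page, page_size=25):
    """Repeatedly request pages until the server returns an empty token.

    fetch_page(page_size, next_page_token) stands in for a GET on the
    transactions endpoint; it returns a dict with 'transactions' and an
    opaque 'next-page-token' ('' when there are no more pages).
    """
    transactions, token = [], ""
    while True:
        body = fetch_page(page_size=page_size, next_page_token=token)
        transactions.extend(body["transactions"])
        token = body.get("next-page-token", "")
        if not token:
            return transactions
```

Because the token encodes the server's position in the result set, pages stay stable even as new transactions arrive, and the server never needs a full count up front.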

Bulk Transactions

The proposed ability to serve an aggregated/bulk view of transactions (/banking/accounts/transactions) to an end consumer may prove computationally expensive and, as a result, technically difficult to implement. Commercial entities may have hundreds of different accounts, and there is no mechanism in downstream systems to provide a sorted, combined view for a specified customer. This data would likely need to be materialised just prior to the API gateway, introducing latency and potentially data quality issues. CommBank would suggest this endpoint be removed, with potential use cases solved by aggregation on the client/TPP side.

Data Fields

We are aligned with the view shared by @WestpacOpenBanking in regards to the runningBalance field. We note the earlier justification for including a running balance based on its presence on a statement document. CommBank does not recommend using statements as a benchmark for which fields are included in the payload, because statements are manufactured as a batch process over a defined time period and are a factual representation of a balanced journal.

CommBank also suggests there should be a structured way of indicating whether a transaction is a debit or credit on the account. Currently a decimal value is not determinative, and as such an additional property may need to be included.
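A minimal sketch of the point being made: a signed amount alone forces the consumer to infer direction, whereas an explicit indicator removes the guesswork. The field name creditDebitIndicator is borrowed from the UK specs for illustration; it is not in the draft standard:

```python
from decimal import Decimal

def classify_by_sign(amount: str) -> str:
    """Sign-only heuristic: fine for simple cases, but it cannot express a
    zero-amount entry or distinguish a reversal from an ordinary credit."""
    return "DEBIT" if Decimal(amount) < 0 else "CREDIT"

def classify(txn: dict) -> str:
    """Prefer an explicit indicator when the payload carries one; fall back
    to inferring direction from the sign of the amount."""
    return txn.get("creditDebitIndicator") or classify_by_sign(txn["amount"])
```

With an explicit field, the sign convention on amount can stay purely presentational and the two never need to disagree silently.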

Additionally we share similar views to @NationalAustraliaBank and @WestpacOpenBanking in regards to the potential data leakage of PII through the transaction reference field, and would caution that safeguards are put in place to ensure that consumer privacy protections can be enforced.

TKCOBA commented 5 years ago

With respect to the references to "Reference/String/Optional", which refer to a bank reference, we suggest that this be "conditional" or "mandatory", as each originating bank transaction would have a reference.

JamesMBligh commented 5 years ago

I'm about to close the feedback period. Comments on feedback to date that will steer the final decision (for the draft standards) is below:

Transaction ID Feedback around making this conditional is understood, and this change will be made. As the Transaction ID is useful for resolving transactions that alter during paging and for general de-duplication (so transaction calls do not need to be repeated), it should not be used to indicate whether more info is available. I will add an optional "isDetailAvailable" flag for this purpose.
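Client-side, the proposed optional "isDetailAvailable" flag could be consumed as follows, so that only transactions advertising extra detail trigger the per-transaction call (function names here are illustrative):

```python
def enrich(transactions, fetch_detail):
    """Fetch detail only where the payload says it exists.

    fetch_detail(transaction_id) stands in for
    GET /banking/accounts/{accountId}/transactions/{transactionId}.
    Transactions without available detail are passed through untouched,
    avoiding the needless round trips raised earlier in the thread.
    """
    out = []
    for txn in transactions:
        if txn.get("isDetailAvailable"):
            txn = {**txn, "detail": fetch_detail(txn["transactionId"])}
        out.append(txn)
    return out
```

Since most transactions are not NPP-originated, most entries would carry isDetailAvailable=false and generate no extra calls at all.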

Filter Dates Good point regarding timezones and which date is specified for filtering. I will align query parameter dates to match DateTimeString format. This will make it even more specific and will also allow for timezone specification as that is part of that format. I will also fix the “after this date” language for end-date. That was a copy/paste error.

Ordering Agree to specifying date in descending order. I will update accordingly.

Running Balance The argument that this field is of minimal use with an API that can be filtered and paged is valid. I will remove it for now.

CR/DR The suggested language "a negative value indicates a reduction of the available balance on this account" will be included. Before adding a CR/DR field I would like to understand specifics around situations where a positive or negative sign on the amount field is not sufficient.

Account Category The use of account category on the filter for bulk APIs is deliberate as it matches the account list filtering. The idea being that I select a list of accounts and then bulk retrieve the balances or transactions for that filter. The actual category field is not included in the payload as it can be retrieved via the accountId and presumably is already (or can be) cached by the client by calling the account list API. The purpose of the payload is for transaction data, not account data.

Mandatory Reference OK, the reference field will be made mandatory.

Positive Integer OK. I'll change the PositiveInteger common type to exclude zero and add NaturalNumber as a common type. This will result in type changes across a number of API end points.

Pagination Page immutability was called out in the pagination proposal as not being required. This is at the discretion of the client to manage, acknowledging that it will be a pain. The reality that transaction lists are changeable is simply a fact of life for this data set.

Bulk APIs The feedback that the bulk APIs are of limited use is, itself, of limited use. The feedback here is primarily provided by organisations aligned with the banks. Feedback on whether these APIs are of use is more appropriately provided by the data consumer community. I think the feedback also fails to acknowledge that, if these APIs are not supported and the assumption of limited use proves wrong, data consumers will end up calling the account-specific end points more often than required. That leaves performance as a key concern which, while entirely valid, is better managed via NFRs rather than by making the end points optional. Making end points optional will simply mean that they are unlikely to be implemented widely, meaning data consumers are unlikely to utilise them in client implementations. Until there is more broad feedback from non-bank stakeholders these end points will remain in scope and mandatory.

Transaction Detail There is conflicting feedback here. On the one hand there is concern around performance and the sensitivity of the additional data available via transaction detail (which is mainly NPP related). On the other hand there is a suggestion to fold this API into the transaction list end point.

The purpose for keeping a separate end point was three fold:

As such, for the draft proposal, the transaction detail end point will remain.

NPP Specifics NPP specific notes:

Thanks all.

-JB-

JamesMBligh commented 5 years ago

The finalised decision for this topic has been endorsed. Please refer to the attached document. Decision 028 - Transaction Payloads.pdf

-JB-