ConsumerDataStandardsAustralia / standards-maintenance

This repository houses the interactions, consultations and work management to support the maintenance of baselined components of the Consumer Data Right API Standards and Information Security profile.

Remove BankingTransactionDetail and incorporate extendedData into BankingTransaction #636

Open jimbasiq opened 3 months ago

jimbasiq commented 3 months ago

Description

The extendedData object and its attributes are so small compared to the BankingTransaction object and its attributes that there is little value in separating them, and the separation has the detrimental effect of forcing ADRs to make many more API calls to DHs.

Intention and Value of Change

Some use cases require a large number of transactions for each Consumer and also require an extendedData attribute, for instance the Payer or Payee. Rather than making a single Get Transactions For Account API call, the ADR is forced to make tens, hundreds or even thousands of API calls per consumer for the initial data retrieval, as calling the Get Transaction Detail API is required for each transaction. This adversely affects performance for both the ADR and DH.
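To make the call volume concrete, here is a minimal sketch of the arithmetic. The page size, the transaction count and the share of transactions with detail are illustrative assumptions, not figures from this issue.

```python
import math


def calls_for_initial_retrieval(transaction_count: int,
                                detail_fraction: float,
                                page_size: int = 1000) -> dict:
    """Estimate API calls for one account's initial retrieval.

    page_size and detail_fraction are illustrative assumptions.
    """
    # Paginated calls to Get Transactions For Account.
    list_calls = max(1, math.ceil(transaction_count / page_size))
    # One Get Transaction Detail call per transaction that has detail.
    detail_calls = round(transaction_count * detail_fraction)
    return {
        "list_calls": list_calls,
        "detail_calls": detail_calls,
        "total": list_calls + detail_calls,
    }


# 5,000 transactions, 30% with detail: 5 list calls plus 1,500 detail calls.
print(calls_for_initial_retrieval(5000, 0.3))
```

The list calls grow with the page count, while the detail calls grow with the number of transactions, which is why the detail calls dominate.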

Area Affected

BankingTransaction and BankingTransactionDetail objects.

Change Proposed

Remove BankingTransactionDetail and incorporate extendedData into BankingTransaction.

nils-work commented 3 months ago

Related issue for reference - #229 - Service field in the Get Transaction Details API

markskript commented 3 months ago

We support this change. The only concern we have is what I imagine was the driving force behind the original design decision - ADRs being forced to pull more data from the DHs than they require. It would be even better if we had a way of defining a projection when we call that endpoint so we could explicitly say what fields we want returned. That could potentially solve both issues of reducing the number of API calls and allowing ADRs to only obtain the data they need.

perlboy commented 3 months ago

What use case requires the deliberate retrieval of full transaction details for all transactions for a 2+ year period? Further why does this use case need to be done live versus introducing asynchronous data retrieval?

The use of Transaction Detail provides an effective upper bound within the context of the NFRs for Data Holders for datasets which aren't stored in the same context as List Transactions. In at least one core banking system the retrieval of NPP details involves 6 cross-table joins per transaction. This problem will inevitably get worse with the introduction of more NPP (and SWIFT) message types. I note that the solution available now is analogous to the solution that was available before, i.e. in a screen-scraping scenario IB would be interrogated per transaction.

Perhaps the short term solution is to specifically request payer and payee be added to list transactions?

To @markskript's response, Yes, a projection of required fields would be a nice solution, we've had some thoughts here but there's some cascading requirements into the definition of an arrangement (and the need for a protocol to describe such things). The alternative would be to introduce a "List Bulk Transaction Details" endpoint that would be asynchronous in nature to account for the very real probability that some metadata could be a significant burden on internal systems.

jimbasiq commented 3 months ago

@perlboy there are many use cases (such as PFMs) that have a strong argument for requiring fields such as payer and payee.

Live or async, the number of API calls required is still the same.

My guess is the design with Transaction and Transaction Detail was made before the attributes that needed to be in each were agreed. We have ended up with attributes you could argue are as important as their Direct Entry equivalents such as apcaNumber being relegated to being "Detail". You seem to be suggesting NPP was late to the party so they get treated as a second class citizen compared to DE?

For a core banking system where the retrieval of NPP details involves 6 cross-table joins per transaction I'd suggest a caching layer is required.

I like @markskript's idea but being practical about what could be implemented in the near future I think this would be too much of a change.

perlboy commented 3 months ago

@perlboy there are many use cases (such as PFMs) that have a strong argument for requiring fields such as payer and payee.

En masse, preloaded detail information? What use case and with what pattern? It isn't PFM, at least not in the mass preload situation, on demand loading, sure, but huge sweeps for years back, no way.

Live or async, the number of API calls required is still the same.

Except async wouldn't run into NFRs and if there was a supported async method it would avoid the update storm we've observed from a number of Recipients often involving all transactions as far back as possible, between 2-4am, every day.

My guess is the design with Transaction and Transaction Detail was made before the attributes that needed to be in each were agreed. We have ended up with attributes you could argue are as important as their Direct Entry equivalents such as apcaNumber being relegated to being "Detail".

Actually, it's the opposite: the preference given by the banks at the time was for all those mechanism-specific components to be placed in Transaction Detail rather than pollute the Transaction List. There was much opposition to their inclusion but it was pushed through based on "feedback" received "anonymously". The result is that List Transactions is not only a sparse map of data that doesn't follow the principles but data is being retrieved that may never be used.

You seem to be suggesting NPP was late to the party so they get treated as a second class citizen compared to DE?

You seem to be suggesting Transaction Detail is a second class citizen. The inclusion of NPP details in Detail was because Banks (rightly) suggested such data is not easily available to systems already doing internet banking at the time. With the introduction of additional message types and potentially other rails, transaction detail is a well formed data structure. List Transactions is the outlier here not the norm and there's a reasonable question to ask why the extraneous elements in List Transactions shouldn't be moved to Get Transaction Detail.

For a core banking system where the retrieval of NPP details involves 6 cross-table joins per transaction I'd suggest a caching layer is required.

If banks followed that suggestion they would be non-compliant due to the obligation to align with IB (which often connects to core directly) and also be at risk of data correction requests. "Data caching" in the traditional sense doesn't actually work in CDR for a variety of reasons not least of which is the top feed behaviour of transactions and the query parameters available. I note that caching Get Transaction Detail is much easier although it still comes with a de-pairwising issue and the potential for negative ttl issues with respect to Data Corrections.

The only system that works is one that is consent aware and capable of preloading with updates driven by eventing - I should know cause it is a product we sell and it is closer to an ODS not a cache with a commensurate price tag that isn't really in the bounds of smaller organisations.

I like @markskript's idea but being practical about what could be implemented in the near future I think this would be too much of a change.

Frollo, a predominant PFM tool, has existed since inception and utilises transaction details. They have optimised their data retrieval for eventual consistency. I suggest other recipients do the same - as they are obligated to do.

JamesMBligh commented 3 months ago

Some historic context may be useful for this discussion as some of the original drivers for the current design may no longer exist.

When these endpoints and payloads were defined in late 2018 NPP was still new having only gone live in February 2018. Usage of NPP, compared to other mechanisms like BECS and DE, was still relatively low. At the time there was also still an expectation that many overlay services would be created over the top of NPP. These considerations all played into the development of the standard.

As a result the payloads were designed to:

As the detail was less widely used we put it on a separate API so it would only be retrieved by ADRs that actually had a need for it and knew what to do with it. This separation was also a recognition of the fact that everyone had only just implemented NPP and the transaction databases were, in the main, not connected to the rest of the infrastructure in the bank so get detail every time could have had significant impact on the fledgeling NPP infrastructure.

It is also worth noting that only NPP transactions were guaranteed to have a persistent ID which is why transactionId is optional unless detail is present.

Now, six years later, things are very different. BECS and DE are likely to be fully deprecated in the next few years, bank systems more fully integrate with NPP, NPP usage is significantly higher, overlay services didn't really eventuate and the Osko payload is pretty stable, and everyone now understands NPP and wants the data.

In this context it is reasonable to propose that the two APIs be collapsed, but it would be important to hear from the major banks and possibly Cuscal, Indue and ASD (who service most of the smaller banks) as to what the impact would be on them.

If someone didn't want the detail it would be relatively easy to manage this through a query parameter (ie. include-detail=true).
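As a rough illustration of that opt-in, the sketch below builds a request URL with and without the flag. The `include-detail` parameter is only the hypothetical name floated in this comment and is not part of the current standard; the base URL and the endpoint path layout are likewise assumptions made for the example.

```python
from urllib.parse import quote, urlencode


def transactions_url(base_url: str, account_id: str, include_detail: bool) -> str:
    """Build a Get Transactions For Account URL (path layout assumed)."""
    path = f"{base_url}/banking/accounts/{quote(account_id, safe='')}/transactions"
    if not include_detail:
        # Default behaviour: the list payload without extended data.
        return path
    # "include-detail" is hypothetical; it does not exist in the standard today.
    return f"{path}?{urlencode({'include-detail': 'true'})}"


print(transactions_url("https://dh.example/cds-au/v1", "acc-1", True))
```

Keeping the flag off by default would mean ADRs that do not need the detail see no change in payload.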

If we were to pursue this then I would suggest that it should be done concurrently with the inclusion of a mechanism for the ability to request transaction data to be provided asynchronously via batch.

joshuanicholson commented 3 months ago

We support enhancements or changes to the current process for collecting Transaction Details.

We are okay with Jim's suggestion but are open to other ideas. We appreciate Stewart's comment on the complexities of a DH that pulls data together from multiple subsystems.

Stewart, while your comment about Frollo is accurate, it is hardly a fair example. Consider a (business) consumer with 100+ accounts and transaction volumes between 500 and 5000 per day (110k/month) and with a significant % of transactions with available "details". In many cases, this is fine as the math of looping through multiple sessions & the 100 calls per session will mean data is ultimately collected. But the traffic thresholds & NFRs do create an upper limit, and potentially the situation where the backlog of transaction details is never collected. It begs the question: Is forcing an ADR & DH to adopt this methodology of servicing so many sessions an appropriate solution?

The upper bound mentioned by Stewart becomes a hard limit that an ADR must be aware of. This means ADRs may need to inform large business consumers, "We will collect detailed information for all future transactions, but we are not doing it for more than xx days of historical data". Is this scenario a desired outcome of the CDR ecosystem?
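A minimal sketch of why a backlog might never be collected: if the daily call budget left over after fetching detail for new transactions is zero or negative, the historical backlog never shrinks. All numbers below are hypothetical and are not taken from the NFRs.

```python
import math
from typing import Optional


def days_to_clear_backlog(backlog_calls: int,
                          new_detail_calls_per_day: int,
                          daily_call_budget: int) -> Optional[int]:
    """Return the days needed to clear a detail backlog, or None if it never clears."""
    if backlog_calls <= 0:
        return 0
    # Budget left after keeping up with newly arriving transactions.
    spare = daily_call_budget - new_detail_calls_per_day
    if spare <= 0:
        return None  # New transactions consume the whole budget.
    return math.ceil(backlog_calls / spare)


print(days_to_clear_backlog(100_000, 1_500, 2_000))  # 200
print(days_to_clear_backlog(100_000, 2_000, 2_000))  # None
```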

The short-term suggestion of adding payee and payer is of some interest. However, other data elements in the detailed call are of interest, namely reference and extended description. It also must be noted that the requirement to use this extra (detail) call in some cases is primarily driven by a far from standardised implementation of the specifications from each DH. This means ADRs are picking up the task of piecing data together to meet the required business cases of consumers.

Alternative ideas (middle ground) we have discussed internally include a bulk transaction detail request rather than limiting it to one transaction per call.
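On the ADR side, a bulk request would amount to grouping transaction IDs into batches instead of issuing one call per ID. The sketch below shows only that grouping step; the batch size is a hypothetical value, and no bulk endpoint exists in the standard today.

```python
from typing import Iterator, List


def batch_ids(transaction_ids: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of transaction IDs (batch_size is hypothetical)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(transaction_ids), batch_size):
        yield transaction_ids[start:start + batch_size]


ids = ["t1", "t2", "t3", "t4", "t5"]
# Five single-transaction calls become three bulk calls with a batch size of 2.
print(list(batch_ids(ids, 2)))
```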

To be blunt, we have clients holding off on using CDR based on data quality and these limits, as their non-CDR data collection processes are free of this issue.

perlboy commented 3 months ago

Stuart, while your comment about Frollo is accurate, it is hardly a fair example. Consider a (business) consumer with 100+ accounts and transaction volumes between 500 and 5000 per day (110k/month) and with a significant % of transactions with available "details". In many cases, this is fine as the math of looping through multiple sessions & the 100 calls per session will mean data is ultimately collected.

All this is accepted but my focus was on Frollo on the basis they are a long term participant in the ecosystem and they are an example of the PFM use case mentioned that is live now. If the desire is to go into what is currently a theoretical usage pattern in the CDR sure but I think it comes to the same conclusion - eventually there will be an upper bound where the question of reasonable synchronous response times collides unsustainably with cost.

The upper bound mentioned by Stuart becomes a hard limit that an ADR must be aware of. This means ADRs may need to inform large business consumers, "We will collect detailed information for all future transactions, but we are not doing it for more than xx days of historical data". Is this scenario a desired outcome of the CDR ecosystem?

Or they say "We will collect detailed information for all future transactions but it may take some time to retrieve the detailed backlog". What I'm unclear on is, in this situation, what is the use case? Are we talking about ERPs? I'm struggling to see why this business consumer needs every detail about every transaction outside of date, amount and narrative (which is essentially all that BankLink provides I believe)?

To be blunt, we have clients holding off on using CDR based on data quality and these limits, as their non-CDR data collection processes are free of this issue.

I'm always dubious about this type of statement cause it's quite reductive. I accept it's a bit of a chicken and egg problem but without introducing demand it's challenging defending implementation costs. Nonetheless back on track, restating what I originally posted:

The alternative would be to introduce a "List Bulk Transaction Details" endpoint that would be asynchronous in nature to account for the very real probability that some metadata could be a significant burden on internal systems.

I'd suggest this endpoint, if available with Transaction Detail and List Transactions remaining, would mean:

  1. a reduction of the attributes in List Transactions could be considered a positive improvement
  2. a restructure of the attributes to be broken up into uType values so that Transaction Detail lends itself more readily to extensions

Further, to the suggestion of a query parameter instead of a new endpoint, this is probably a bit problematic in the context of the existing NFR structure because now it's either a quite cheap vs. very expensive pivot based on the parameter.

jimbasiq commented 3 months ago

@JamesMBligh Thanks for the historic context and I believe support for the proposal that the two APIs be collapsed. I agree, it would be great to hear the opinion of some other Data Holders or their service providers.

@joshuanicholson Thanks for your support, it is good to hear we are not the only ADR providing Data Recipient services who is encountering these limitations and has the issue of customers not wishing to move from existing data retrieval methods to CDR OB. A bulk transaction detail request could be a back up option but seems to me to be a workaround.

@perlboy Thank you for the lively debate. I can't see a strong enough reason not to make this change. Bringing the discussion back to facts:

I'll look forward to an online debate in a MI session. Unfortunately I can't make the next one so please don't have it without me :)

DougFromPayPal commented 2 months ago

@jimbasiq - I can add some perspective, at least as the person responsible for developing the global Open Banking data outbound APIs. I have to choose my words carefully so as not to convey the wrong message, while attempting to add value to the conversation. What concerns me the most is when I read comments that treat these changes as if they are easy because they are 'minor'. When it comes to a very large corporate data holder, there is rarely a 'minor' change to our APIs or functionality. So, when I read a change request like this one, I see that the functionality and fields already exist; however, it is not convenient or efficient for some ADRs. The CR is merely pushing new development efforts and costs onto the data holders just to relieve some inefficiencies in some ADRs without providing new features/functionality or value to the ecosystem. I apologize if I am missing the finer points of this CR, but as it stands now, PayPal does not support it.

markskript commented 2 months ago

it is not convenient or efficient for some ADRs

This is not the only perspective being considered here. We have feedback from DHs that some ADRs are making too many calls too frequently, and that the rate limits in the NFRs are set too high. This change would reduce the number of requests ADRs have to make to DHs, helping alleviate some of the pain we are seeing reported from DHs.

DougFromPayPal commented 2 months ago

it is not convenient or efficient for some ADRs

This is not the only perspective being considered here. We have feedback from DHs that some ADRs are making too many calls too frequently, and that the rate limits in the NFRs are set too high. This change would reduce the number of requests ADRs have to make to DHs, helping alleviate some of the pain we are seeing reported from DHs.

Thanks Mark - Luckily we are not seeing traffic problems for this particular issue (yet) and I appreciate the insight as well!

perlboy commented 2 months ago

This is not the only perspective being considered here. We have feedback from DHs that some ADRs are making too many calls too frequently, and that the rate limits in the NFRs are set too high. This change would reduce the number of requests ADRs have to make to DHs, helping alleviate some of the pain we are seeing reported from DHs.

To clarify, this complaint at least in part came from me based on private feedback and it related specifically to excessive List Transactions calls so increasing the attribute count actually makes a bad problem worse.

JamesMBligh commented 2 months ago

It's going to be hard to find consensus between the obvious utility of collapsed API and the equally obvious concerns about API load, especially via the MI change path.

I believe that the NFR Consultative Group has been discussing ways of dealing with high volume use cases for usage in energy and that an asynchronous model could be a path to addressing that.

If that results in a proposal then perhaps this problem can be solved via the same path. Specifically, I mean an asynchronous mechanism for transactions that could replace existing file based data sharing mechanisms that also includes extended data for each transaction that has it.

@joshuanicholson, would this address your client's concerns? They are most likely using file based mechanisms right now I imagine. If they're using screen scraping then they probably won't be getting the extended data from all banks anyway as it isn't always shared in internet banking.

jimbasiq commented 2 months ago

Hi All,

Thank you for the good discussion in the MI, I feel like we are working towards consensus on the best approach.

First of all, some numbers. In the last 30 days we had 3,664,132 transactions retrieved into our platform, either from initial arrangement data retrieval or from connection refreshes (where we only retrieve new transactions for an existing arrangement), that had an isDetailAvailable value of true on the BankingTransaction object, i.e. if we were retrieving the additional data we would have made 3,664,132 additional API calls.
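For other ADRs wanting to produce a comparable metric, a count like this can be derived from list payloads alone. The sketch below is an assumption about how one might tally it, not a description of how the figure above was produced; the sample records are made up.

```python
from typing import Dict, List


def additional_detail_calls(transactions: List[Dict]) -> int:
    """Count transactions that would each need a Get Transaction Detail call."""
    return sum(1 for txn in transactions if txn.get("isDetailAvailable") is True)


# Made-up sample records for illustration only.
sample = [
    {"transactionId": "t1", "isDetailAvailable": True},
    {"transactionId": "t2", "isDetailAvailable": False},
    {"isDetailAvailable": False},
    {"transactionId": "t4", "isDetailAvailable": True},
]
print(additional_detail_calls(sample))  # 2
```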

Secondly, as discussed the proposal is for ADRs to define which BankingTransactionDetail attributes are of high value. We can then ask Data Holders to flag any of these which will be high effort to include in BankingTransaction.

For Basiq, the highest value attributes in BankingTransactionDetail will be:

"payer": "string", "payee": "string"

Secondary value would be:

"x2p101Payload": {"extendedDescription": "string"}

Other ADRs, please add your attribute votes and any metrics you have 👇

nils-work commented 1 month ago

While discussions relating to NFRs are underway, the detail below simply aims to summarise the options described in previous comments, with some additional questions/considerations.

Options discussed:

  1. Current state: Make an additional request for each transaction that has detail associated (ref: 1).
    Or, to avoid making additional requests;
  2. Add payer and payee fields to the existing Get Transactions For Account endpoint (ref: 1)
  3. Add payer, payee and extendedDescription fields to the existing Get Transactions For Account endpoint (ref: 1)
  4. Add any additional fields (or exclude some existing fields), and enable them with a querystring on the existing Get Transactions For Account endpoint (ref: 1, 2)
  5. Define a new bulk detail async endpoint with any key fields specified (ref: 1, 2, 3)

Other questions/considerations:

josh3n commented 1 month ago

Warning: Long post

Sorry, this analysis has taken some time to assemble over a few weekends and late nights. I want you to know that all data included here is my transactional data from my accounts and my money. It should also be noted that this is a comment from an individual consumer with enough experience and knowledge of CDR & Banking to be dangerous but has lots of accounting and tax compliance experience. Sure, a few minor personal details are disclosed within, but this is done to improve CDR and consumer experience & ensure it is fit for purpose. And to say it upfront so there are no misconceptions, sorry, but the brutal truth is that many data holders are not complying; data quality is not fit for purpose. Sadly, traditional bank statements have a much higher data quality than CDR data (is that acceptable?). Many DHs provide excellent data; kudos to those few, and more importantly, so many DHs are so so very close; I hope it's not too much to uplift the quality to excellent.

This comment and the documentation supplied are not to propose a solution but rather to provide an extensive dive into actual examples and areas of improvement. I've also attempted to distinguish the critical and nice to have. The primary business cases I have in mind are my personal background in accounting, business administration, superannuation, financial planning, investment management and tax compliance. However, I wonder if other cases, like financial counselling, mortgage monitoring, credit assessment, etc., are vastly different regarding data quality.

The attached spreadsheet is focused on NPP transactions for two reasons: it's probably the most 'rich' transaction data source we have + most likely to have available transaction details. I have started analysis of other transaction types (card, bpay, direct dr/cr etc.) – but more on that with another discussion. One of the reasons I am sharing this is to show that ADRs are struggling to piece together data as the implementation of the specifications is not standardised. This means ADRs are building bespoke solutions for each DH to extract the required details. This also means an ADR solution becomes brittle: whenever a DH decides to make a change, solutions break and consumers are left without data. ADRs are given no advance notice of a change; we regularly start work on a Monday with failures after a DH 'upgraded' a system over the weekend.

Points about the spreadsheet

• This is a sample of 22 Banks – I am sure some DHs will be able to identify their transaction, but I am not disclosing any DH specifically
• There are two transactions provided per DH, 1 Deposit, 1 Payment
• All transactions are of the NPP, Osko, and Instant variety, depending on the terminology of each DH. Many of them were performed using a PayID
• There are columns for each key data field of the transaction and transaction detail call. JSON data elements of no real value are excluded. For example, Merchant Name is not really relevant for NPP, but don't get me started when it comes to card transactions!
• There is a column of my 'quality ranking' in an attempt to determine how good the standard description is
• There is a column 'F4P', i.e., if an ADR looked at all the supplied data (tran + tran detail), could it construct a transaction description that was fit for the purpose. This is a strange idea but it is essential as all the various pieces of data need to be jammed into a single field to describe a transaction so it can be coded, categorised and reported. Some of these accounting systems do not have the design or capability to break the data into individual elements (this is slowly changing). Also, consider so much bookkeeping, admin and accounting is performed off bank statements and/or CSV-like transactions exports (date, description & amount)
• Red cells are not good, and me not being happy with the supplied data
• Green cells are me being UBER happy
• Yellow cells are questionable data, not terrible, not always wrong, but just not correct or logical
• Notes are added to cells where data is just wrong, not credible, not compliant and most commonly not in line with the DH's digital experience

Transaction Description Utopia

This is my attempt to construct the perfect (NPP) transaction description from various bits of known data. Sure, this is idealism, and sure, some people would vary this, but it is in line with my business case and KPI target to measure quality.

Elements of description: [Tran Type] [to/from] [Name/Contact/PayID/Account] [DH reference/confirmation] [Extended Description] [EndtoEnd/Reference]

Examples of the above:

NPP Deposit from J R Nicholson Receipt D98765 Timber invoices Ref 24004625
Deposit from 012012-xxx123 D98765 Timber invoices Ref 24004625
NPP Payment to ABC Timbers P/L Receipt D98765 Timber Invoices Ref 24004625
NPP to accounts@abctimber.com D98765 Timber Invoices Ref 24004625
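The element order listed above can be sketched as a small helper that joins whichever pieces are present. The function and parameter names are hypothetical, and the "Ref" prefix is simply copied from the examples; this is one possible reading of the recipe, not a proposed standard.

```python
from typing import Optional


def build_description(tran_type: str,
                      direction: str,
                      counterparty: str,
                      dh_reference: Optional[str] = None,
                      extended_description: Optional[str] = None,
                      end_to_end_reference: Optional[str] = None) -> str:
    """Join the available elements in the order given in the recipe."""
    parts = [tran_type, direction, counterparty, dh_reference, extended_description]
    if end_to_end_reference:
        parts.append(f"Ref {end_to_end_reference}")
    # Skip elements that are missing or blank rather than emitting filler text.
    return " ".join(part.strip() for part in parts if part and part.strip())


print(build_description("NPP Deposit", "from", "J R Nicholson",
                        "Receipt D98765", "Timber invoices", "24004625"))
# NPP Deposit from J R Nicholson Receipt D98765 Timber invoices Ref 24004625
```

Missing elements are dropped rather than replaced with placeholder text, so a transaction with fewer known fields still yields a clean description.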

Based on the analysis, this is a list of issues found.

  1. Lack of implementation of the transaction detail call
  2. DH digital experience has details of transactions, but they are NOT included in the transaction & transaction detail call
  3. Use of 'payment' transaction type for incoming deposits to an account
  4. Use of the consumer's 'payee or contact name or payid' rather than a name from other systems (NPP, etc.) - it could be an idea to have two fields
  5. Poor mapping of data from DH to JSON data elements
  6. Improper masking of references
  7. Merging of various data elements into a single field
  8. No permanence to transaction description; for example, a consumer changing a payee name updates all transactions associated with that payee
  9. Lack of execution date + time for NPP transactions, mainly where the DH digital experience shows it
  10. Instant transactions like NPP remain in a 'PENDING' status for five days, even though funds are cleared and usable for the consumer
  11. The posted date of a transaction is for a future date/time
  12. Use of 'not provided' or lots of spaces when there is no data, rather than the standard of not including the data element, leaving it empty or using 'NULL' (as an ADR, we prefer null)
  13. Transaction Type mismatch between Transaction & Transaction detail calls
  14. Mixing up Execution & Posted date/time; that is, Execution being reported as Posted and vice versa
  15. Execution date/time reported as text in the description but not in the proper data field
  16. Different data experiences for NPP deposits & withdrawals from the same Bank

So this leads to my list of things to be done to resolve this (some of these are compliance enforcement issues)

  1. If data is available and transaction detail is not populated, please comply and populate it

  2. Mapping data to the correct place, reference should be in reference element, end to end = NPP reference, extended descriptions should not be a copy of the description, etc.

  3. DH should be matching their digital experience; this would solve many of the issues; so much data is available via mobile and browser experience but not via CDR – sorry, not acceptable, not compliant

  4. Construction of transaction descriptions: How are bank statement descriptions constructed? While not matching digital experience, why is there such a variance for so many DHs? Do we need to mandate a specific recipe that DH must follow for a transaction description?

  5. Ideally, based on the above, I'd like to consider some of the following ideas:
     a. Moving all data from detail to transaction; if we had to prioritise: Payee & payer, endtoendID, extended description
     b. Bulk transaction detail calls (e.g., multiple IDs in a single post). I am happy for allowances to be made around the performance of these calls, as I can only imagine how many joins DHs have in their databases. Could ADRs be limited to only using these calls outside peak hours?
     c. Removing any CPS limits for the transaction detail call; if an ADR needs to collect 18 months of data for an account with 200 transactions per day and 30% including detail, the volume will explode quickly.
     d. More ideas? I know a couple have already been proposed.

  6. Data masking must immediately stop, specifically system-generated reference numbers. It is okay to mask text the consumer has input if it matches obvious PII data, like the Luhn algorithm.

  7. If data can be populated as text in a description, then it should also populate the specific data element provided in the specifications (for example, reference, execution dates, end2endid, etc.).
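As one possible reading of the masking point in item 6, the sketch below masks only digit runs that look like card numbers and pass the Luhn check, leaving shorter system-generated references untouched. The 13 to 19 digit range and the keep-last-four masking style are assumptions made for the example, not rules from the standard.

```python
import re

# Assumption: treat runs of 13 to 19 digits as candidate card numbers.
CARD_LIKE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # Double every second digit from the right.
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def mask_card_like_numbers(text: str) -> str:
    """Mask Luhn-valid digit runs, keeping the last four digits."""
    def replace(match: re.Match) -> str:
        number = match.group(0)
        if luhn_valid(number):
            return "x" * (len(number) - 4) + number[-4:]
        return number  # Not card-like: leave references intact.
    return CARD_LIKE.sub(replace, text)


print(mask_card_like_numbers("Card 4111111111111111 Ref 24004625"))
# Card xxxxxxxxxxxx1111 Ref 24004625
```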

Nice to haves (not all related to NPP transaction – and maybe I need to create separate items for some of them)

  1. Transaction Types: The list we have via CDR seems limited, and many DHs need to pay attention or contort themselves into an overly simplified list of types. Would expanding the transaction types into a more granular and standardised list be an improvement? For example, BAI2/BTRS and ISO20022 are long-term accepted lists. If a DH has it, provide it, otherwise squeeze
  2. Card Number: for accounts like credit cards that have multiple cards, a new data element for the card number (masked)
  3. For rich transactions like NPP, breaking out data like PAYID, Registered Name vs the name the consumer uses to save as the Payee
  4. Some DHs are really good at this, but any confirmation number, such as a transaction reference, MUST be made available. There is no such thing as too many references in relation to integrated solutions; for example, references from merchants and payment solutions mean bank transactions can be matched to data feeds from those other systems. (Microsoft is doing some cool stuff with Copilot in this space)

CDR Data Quality - Transaction Detail - NPP.xlsx

jimbasiq commented 2 weeks ago

A very short post from me :) In order to retrieve extendedData when it is flagged as available and the data is in scope for a consent arrangement, 3,664,132 additional calls would have been made from Basiq to Data Holders in the past 30 days.