jimbasiq commented 3 months ago

Description

Some recent planned outages from data holders have been days in duration. This is damaging the success of the CDR: it is eroding trust and driving data recipients to alternative data sources such as screen scraping.

Intention and Value of Change

We need data recipients and consumers to be able to trust in the stability of the CDR framework. We need data holders to carefully consider the impact to their customers when planning outages to CDR services.

Area Affected

Availability Requirements

Change Proposed

The proposal is to make the Availability Requirements NFRs binding and applicable for planned and unplanned outages - https://consumerdatastandardsaustralia.github.io/standards/?diff#availability-requirements

The proposed change is to make the 99.5% uptime target a MUST for planned and unplanned outages so that there is a clear definition of what is expected of Data Holders.

Any planned outage that will exceed this parameter should be discussed with the ACCC as an exception.
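To make the 99.5% figure concrete, here is a rough sketch of the arithmetic. It assumes a 30-day measurement period (an illustrative choice, not something stated in this issue) and counts planned outages against the target, as proposed. The 14-day and 10-hour durations are the outage lengths reported by commenters in this thread.

```python
from datetime import timedelta

AVAILABILITY_TARGET = 0.995  # the 99.5% figure from the proposal


def downtime_budget(period: timedelta, target: float = AVAILABILITY_TARGET) -> timedelta:
    """Total outage time allowed in `period` while still meeting `target`."""
    return period * (1 - target)


def availability(period: timedelta, outage: timedelta) -> float:
    """Fraction of `period` during which the service was up."""
    return 1 - outage / period


# Assumption: a 30-day measurement period, chosen only for illustration.
month = timedelta(days=30)
budget = downtime_budget(month)
fourteen_day = availability(month, timedelta(days=14))
ten_hour = availability(month, timedelta(hours=10))

print(f"Downtime budget at 99.5%: {budget}")
print(f"Availability with a 14-day outage: {fourteen_day:.1%}")
print(f"Availability with a 10-hour outage: {ten_hour:.2%}")
```

Under these assumptions the budget works out to roughly three and a half hours per period, so either of the reported outages would breach the target on its own if planned outages were counted.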

JamesMBligh commented 2 months ago

I would like to voice support for this change. Since this issue was raised two weeks ago, an energy retailer has published a 14-day planned outage. Clearly the current language is being taken advantage of.

It would be better to remove the exclusion for planned outages from the definition of availability. Then, if there is a real, justifiable need for a prolonged outage, a holder can discuss the need with the ACCC ahead of time to avoid any regulatory action.

markskript commented 1 month ago

Skript would like to strongly support this change. In the past couple of months, one of the big four data holders has scheduled outages during business hours, severely affecting consumers who rely on fresh data from the CDR to run their day-to-day business operations.

To support this change: as of 25/9/24, DHs are publishing scheduled outages that clash with what we would deem normal operating hours (especially when taking West-coast consumers into account).

One of the Big4 - a Tuesday 8pm-10pm AEST scheduled outage
Another Big4 - a full 10 hour scheduled outage starting at 8pm on a Thursday

perlboy commented 1 month ago

Agree in principle here, although I wonder if it would be better to tie the CDR uptime to that of the primary digital channel. It seems reasonable to accept that if a bank's internet banking is down, CDR probably can be as well. 2-4 hour scheduled outages on internet banking are still relatively common, and if a Holder decides their customers will tolerate that, having CDR availability tied to it makes CDR no better or worse.

joshuanicholson commented 1 month ago

We want to add our support to this change.

We want to add some scope creep to this, or raise a new CR, around DHs reporting upcoming changes to their API endpoints/responses which they believe MAY introduce a breaking change for an ADR. Examples include (but are not limited to):

including transaction IDs in the transaction call where they were previously not disclosed
changing the format of how a date/time is reported
making the transaction detail call available where it was previously not available

Our worst fear as an ADR is for a change to be released that breaks our data collection or causes some form of data quality issue. While we appreciate the release of "fixes", any form of notice allows us to prepare and support consumers through the fixes.

CDR-API-Stream commented 3 weeks ago

In the MI meeting held on the 16th of October, several technical solutions were discussed. Data Holders indicated that some recently reported outages occurred because the data that would have been shared was not "accurate, up to date and complete" as required by Privacy Safeguard 11 (refer to section 56EN of the CCA 2010, Privacy safeguard 11, quality of CDR data).

Putting aside whether it is appropriate for a Data Holder to restrict or limit their CDR service in such situations, valuable discussion focused on mechanisms to better communicate the breadth of an outage to ADRs and to limit how much data is withheld in these scenarios.

The options discussed are not mutually exclusive and can each individually solve different problems but also come with different implementation considerations.

Options discussed:

Option 1: Provide more detailed structured outage information

The problem being solved with this option is to let Data Holders provide more structured, machine-readable outage information that appropriately conveys the nature of the outage being planned. This lets ADRs know more specifically what data or endpoints will be unavailable, communicate this to their customers, and adjust their data collection processes accordingly.

In this option, technical outage information would extend the Get Outages API to include:

a technical explanation,
the list of affected endpoints,
possibly, the list of affected data clusters,
possibly, the list of affected products or product categories, possibly using productIds and product category ENUMs,
a coded list of Data Quality and PS11 impacts

At present only a customer-facing outage notification is provided via the Get Outages endpoint for scheduled outages. It has been observed that not all Data Holders provide a consumer-friendly error message. Providing a technical explanation would assist ADRs and data consumers to better understand the technical impacts and take any relevant action in response to a scheduled outage.

It was noted that providing the list of data clusters or the full list of product categories may be technically challenging if the Data Holder has a large list of products, and some current outage management systems would need to change to accommodate this.
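As a rough illustration of how an ADR might consume Option 1, the sketch below models an outage record with the extra fields listed above. Everything beyond the time window and the customer-facing explanation is hypothetical: the field names (`technical_explanation`, `affected_endpoints`, `affected_data_clusters`, `affected_product_categories`, `data_quality_impacts`), the rule that an empty endpoint list means the whole service is affected, and the sample values are all assumptions made for illustration, not part of the standards.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Outage:
    # Roughly what a scheduled outage carries today (names simplified).
    outage_time: datetime
    duration: timedelta
    explanation: str  # customer-facing text
    # Hypothetical Option 1 extensions; none of these exist in the standards.
    technical_explanation: str = ""
    affected_endpoints: list[str] = field(default_factory=list)
    affected_data_clusters: list[str] = field(default_factory=list)
    affected_product_categories: list[str] = field(default_factory=list)
    data_quality_impacts: list[str] = field(default_factory=list)

    def covers(self, when: datetime) -> bool:
        """True if `when` falls inside the outage window."""
        return self.outage_time <= when < self.outage_time + self.duration


def should_skip(outage: Outage, endpoint: str, when: datetime) -> bool:
    """Decide whether an ADR should skip calling `endpoint` at `when`."""
    if not outage.covers(when):
        return False
    # Assumption: an empty endpoint list means the whole service is affected.
    return not outage.affected_endpoints or endpoint in outage.affected_endpoints


# Illustrative values only.
TRANSACTIONS = "GET /banking/accounts/{accountId}/transactions"
outage = Outage(
    outage_time=datetime(2024, 10, 3, 20, 0, tzinfo=timezone.utc),
    duration=timedelta(hours=10),
    explanation="Scheduled maintenance",
    technical_explanation="Transaction feeds are paused during maintenance",
    affected_endpoints=[TRANSACTIONS],
    data_quality_impacts=["TRANSACTIONS_DELAYED"],
)
during = datetime(2024, 10, 3, 22, 0, tzinfo=timezone.utc)
skip_transactions = should_skip(outage, TRANSACTIONS, during)
skip_accounts = should_skip(outage, "GET /banking/accounts", during)
print(skip_transactions, skip_accounts)
```

With this shape an ADR could keep collecting from unaffected endpoints during a partial outage instead of pausing all collection for that Data Holder.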

Option 2: Allow DHs to deny data sharing at the account level

The problem being solved with this option is to limit the breadth of the outage and maximise system uptime for more consumers and more accounts.

In this option, Data Holders can limit the outage to only specific accounts known to be affected by data latency issues.

The Data Holder would return a (Unavailable Banking Account / Unavailable Energy Account / Unavailable Service Point / Unavailable Resource) error for account resources that are impacted. All other accounts and their data would continue to be shared.
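A minimal sketch of Option 2 from the Data Holder side might look like the following. The function name `get_account_detail`, the in-memory `accounts` store and the `impacted` set are hypothetical. The error title comes from the option text; the error code URN, the use of the account ID as the error detail, and the 404 status reflect my understanding of the standards' error catalogue and should be verified against the published standards before being relied on.

```python
# Title taken from the option text above.
UNAVAILABLE_ACCOUNT_TITLE = "Unavailable Banking Account"
# Assumption: code URN and HTTP status as I understand the error catalogue.
UNAVAILABLE_ACCOUNT_CODE = (
    "urn:au-cds:error:cds-banking:Authorisation/UnavailableBankingAccount"
)
UNAVAILABLE_STATUS = 404


def get_account_detail(account_id: str, accounts: dict, impacted: set) -> tuple:
    """Return (status, body) for one account; only impacted accounts are refused."""
    if account_id in impacted:
        error = {
            "code": UNAVAILABLE_ACCOUNT_CODE,
            "title": UNAVAILABLE_ACCOUNT_TITLE,
            "detail": account_id,  # assumption: detail carries the account ID
        }
        return UNAVAILABLE_STATUS, {"errors": [error]}
    return 200, {"data": accounts[account_id]}


# Illustrative data: one account affected by a data latency issue, one healthy.
accounts = {
    "acc-1": {"accountId": "acc-1", "displayName": "Everyday"},
    "acc-2": {"accountId": "acc-2", "displayName": "Savings"},
}
impacted = {"acc-2"}

healthy = get_account_detail("acc-1", accounts, impacted)
refused = get_account_detail("acc-2", accounts, impacted)
print(healthy[0], refused[0])
```

The point of the sketch is that the refusal is scoped to the affected account, so the remaining accounts under the same consent keep sharing data.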

Option 3: Attach additional metadata for impacted transactions or accounts

The problem being solved with this option is to limit the breadth of the outage and maximise system uptime for more consumers and more accounts.

In this option, Data Holders would return additional metadata in the meta{} object for transactions and accounts where data latency is an issue. This meta could include a set of data quality flags (or ENUMs) denoting what issues are affecting the account or transaction data.
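One way Option 3 could look in practice is sketched below. The `dataQualityFlags` key and the flag values are hypothetical, invented here purely to show the shape; the discussion did not settle on names or ENUM values.

```python
from enum import Enum


class DataQualityFlag(str, Enum):
    # Hypothetical ENUM values; not defined anywhere in the standards.
    BALANCE_DELAYED = "BALANCE_DELAYED"
    TRANSACTIONS_DELAYED = "TRANSACTIONS_DELAYED"


def with_quality_flags(response: dict, flags: set) -> dict:
    """Data Holder side: copy `response`, adding flags to its meta object."""
    meta = dict(response.get("meta", {}))
    if flags:
        # Hypothetical key name inside meta{}.
        meta["dataQualityFlags"] = sorted(flag.value for flag in flags)
    return {**response, "meta": meta}


def has_quality_issue(response: dict) -> bool:
    """ADR side: True if the Data Holder flagged any data quality issue."""
    return bool(response.get("meta", {}).get("dataQualityFlags"))


# Illustrative payload only.
clean = {"data": {"transactions": []}, "links": {}, "meta": {}}
flagged = with_quality_flags(clean, {DataQualityFlag.TRANSACTIONS_DELAYED})

print(flagged["meta"])
print(has_quality_issue(clean), has_quality_issue(flagged))
```

Because the data is still returned, an ADR could keep serving the consumer while marking the affected figures as potentially stale, rather than treating the call as a failure.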

Option 4: Provide additional consumer notifications via the ADR, DH or both

The problem being solved with this option is to inform consumers of known impacts to data collection.

In this option, CX Standards would define when ADRs or Data Holders would notify consumers of outages that impact them. It was noted that banks commonly provide SMS notifications and messages provided as banners within their digital banking channels informing customers of scheduled maintenance and outages or data latency problems affecting their banking service.

This option may be improved by combining it with Option 2 or Option 3, either of which would allow an ADR to notify the consumer when they are in a Customer Present scenario. Otherwise, the ADR could determine whether notification

Further to these options, the Shared Signals Framework was also discussed as a possible improvement to provide better secure event notifications between Data Holders and ADRs, notifying them when there is an issue or when an issue such as an outage has been resolved. This could be used for unplanned outages as well as other use cases such as consent revocation and additional events that occur in the system.

markskript commented 3 weeks ago

Skript supports Option 1, in combination with an NFR change relating to the percentage uptime as requested in the original ticket description. For the other options, we would question whether the benefit justifies the effort.

ConsumerDataStandardsAustralia / standards-maintenance

Revise the Availability Requirements NFRs #660
