ADR Unavailability polling

anzbankau commented 4 years ago

ANZ is looking at the requirement around ADR unavailability and the requirement to Poll the ADR until they become available. We are looking for a couple of clarifications:

Do you have any guidance on polling frequency?
Is there a limit on how long we poll for? What if the ADR does not come back up? Can we stop polling after a period of time i.e. 48hours, 1 week etc ?

Reference to Standards: Expectation Data Holders will need to continually poll the ADR until recovery. Upon recovery, the authorisation withdrawal process will need to be executed as required

https://cdr-register.github.io/register/#accredited-data-recipient-unavailable

CDR-Register-Stream commented 4 years ago

@anzbankau thanks for your query.

We have raised this internally to get input. I appreciate the problem here. Without an end time, DHs would be retrying indefinitely.

I'll update this thread once I've collated this input.

CDR-Register-Stream commented 4 years ago

@anzbankau to begin with, this is a hard requirement. ACCC's position is as follows:

Expectation Data Holders will need to continually poll the ADR until recovery. Upon recovery, the authorisation withdrawal process will need to be executed as required

Proposed response:

Under rule 4.25(2)(b), data holders must notify data recipients of a withdrawal of authorisation in accordance with the data standards. The ACCC understands there will be updates to the data standards post-July to accommodate the processes for withdrawal of authorisation where an ADR’s revocation endpoint is unavailable. However, in the interim the ACCC considers data holders must make reasonable attempts to ensure that the withdrawal of authorisation notification is delivered. Having regard to the:

• need to maintain consumer confidence and trust in the CDR;
• CX inconsistency that may arise where the consumer has withdrawn an authorisation through the DH, that is not reflected on the ADR’s good or service;
• need to ensure ADRs are able to meet any deletion or de-identification obligations that may arise from the withdrawal notification;

the ACCC considers retrying failed notification attempts for a seven day period with an exponential back-off pattern is reasonable.

We are using 7 days as a starting point, assuming that this is a significant period of time for an ADR to be down and the likelihood of them recovering from an outage in 7 days is high. Guidance will also need to be provided on expectations for ADRs post 7 days.

Exponential back-off will help reduce the number of retries the DH has to perform. The parameters of this pattern are being collated and a draft will be discussed on this thread before being formalised.

CDR-Register-Stream commented 4 years ago

The following approach to exponential back-off is being tabled for discussion. This pattern would be relevant for both ADR->DH and DH->ADR calling of the revocation endpoint.

Any input on this process can be captured on this issue. The intent here is to ensure that the participant is making a reasonable attempt to contact its counterpart's revocation endpoint without a flood of retries:

Exponential back-off pseudocode for calling a participant's revocation endpoint

MAX_WAIT_TIME = 3600000 (1 hr)
retries = 0
Call the Revocation EndPoint
DO
    status = Get Status of API call
    IF status = SUCCESS
        retry = false
    ELSE
        Unable to get a response or not a valid response
        retry = true
        retries = retries + 1
        waitTime = MIN(2^retries * 100, MAX_WAIT_TIME) milliseconds
        wait for waitTime
        Call the Revocation EndPoint again
    END IF
WHILE (retry AND maximum retry period has not passed)

This approach will result in the following call times

Retry Attempt	Wait Time (in milliseconds)
1	200
2	400
3	800
4	1600
5	3200
6	6400
7	12800
8	25600
9	51200
10	102400
11	204800
12	409600
13	819200
14	1638400
15	3276800
16	3600000
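As a quick check, the wait times in the table can be reproduced with a few lines of Python. This is a minimal sketch, assuming the wait time is capped at MAX_WAIT_TIME as the final row of the table suggests; the function and constant names are illustrative and not part of any standard.

```python
MAX_WAIT_TIME_MS = 3_600_000  # 1 hour cap, as in the pseudocode
BASE_MS = 100                 # multiplier used in the pseudocode


def wait_time_ms(retry: int) -> int:
    """Wait before the given retry attempt (1-based), capped at one hour."""
    return min((2 ** retry) * BASE_MS, MAX_WAIT_TIME_MS)


# Wait times for retry attempts 1 to 16, in the same order as the table.
schedule = [wait_time_ms(n) for n in range(1, 17)]
```

From attempt 16 onwards the uncapped value (6553600 ms) exceeds the cap, so every later retry waits one hour.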

Maximum retry period

Participant	Time
DH	7 days
ADR	indefinitely

DHs will be expected to retry calling the notification endpoint until success or until 7 days have passed.

ADRs will have the ability to call the DH outage endpoint to identify when the DH will next be available. This can be incorporated into their retry approach. ADRs will also be expected to retry indefinitely, or until the Data Holder is no longer part of the CDR.
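The retry periods above could be combined with the back-off pattern roughly as follows. This is only a sketch under stated assumptions: `call_endpoint` is a hypothetical callable that returns True when the counterpart accepts the notification, and the `sleep` and `clock` parameters exist purely so the loop can be exercised without real waiting. It does not consult the DH outage endpoint.

```python
import time
from typing import Callable, Optional

MAX_WAIT_TIME_S = 3600.0              # 1 hour cap from the pseudocode
DH_MAX_RETRY_PERIOD_S = 7 * 86400.0   # 7 days for Data Holders


def notify_with_backoff(
    call_endpoint: Callable[[], bool],
    max_retry_period_s: Optional[float] = DH_MAX_RETRY_PERIOD_S,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call the endpoint until it succeeds or the retry period has passed.

    Pass max_retry_period_s=None to retry indefinitely (the ADR case).
    """
    started = clock()
    retries = 0
    while True:
        if call_endpoint():
            return True
        retries += 1
        # Exponent is clamped so the value stays small on very long runs;
        # 2^16 * 0.1 s already exceeds the one hour cap.
        wait_s = min(0.1 * 2 ** min(retries, 16), MAX_WAIT_TIME_S)
        if max_retry_period_s is not None:
            if clock() - started + wait_s > max_retry_period_s:
                return False  # the next attempt would fall outside the period
        sleep(wait_s)
```

A DH would call `notify_with_backoff(send)` with the default period, while an ADR would pass `max_retry_period_s=None`.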

NationalAustraliaBank commented 4 years ago

NAB understands the importance of data holders notifying data recipients of a withdrawal of authorisation as outlined above. However, we would like to raise the following points regarding the updated 7-day retry provision:

The 7-day retry requirement will require build effort on NAB’s part.
As build effort is required, NAB considers a fixed-interval retry mechanism to be a simpler option. We therefore request that this mechanism be included and that an acceptable retry interval be defined.
From November 2020, Data Holders and Data Recipients MUST implement an arrangement management endpoint that can be used to revoke an existing sharing arrangement.
NAB believes the risk of this issue is low, and we question the rationale for implementing this 7-day retry mechanism change now, given the expected changes to the CDS in November.
Should the ADR attempt to obtain consumer data after the consent is withdrawn, NAB will respond with the standard OIDC error, invalid_grant, as specified in RFC6749 (section 5.2 https://tools.ietf.org/html/rfc6749#section-5.2).

spikejump commented 4 years ago

Can @CDR-Register-Stream please clarify why the max retry period for DHs is 7 days but for ADRs it is indefinite? While retrying indefinitely conceptually makes sense, technically it is not ideal. What is the real outcome that needs to be addressed? Is it to ensure consent is revoked at the ADR?

Can @CDR-Register-Stream also confirm that regardless whether DH can successfully revoke at DR that DH will mark the consent as revoked? And vice versa for DR?

If the above is true, is it acceptable to the ACCC that the dashboards at both ends may be out of sync (for a long time, with ADRs possibly having revoked a consent while the bank says it is active)?

If the eventual outcome is that customer consent was revoked and no data sharing can occur, maybe there's no need to retry indefinitely, as long as sufficient attempts were carried out?

anzbankau commented 4 years ago

ANZ agrees with many of the points raised by @NationalAustraliaBank, such as the impacts to existing build and the relatively low risk of the current implementation. We would recommend that the specifics of how this issue is to be addressed are included in future revisions of the registry design, which would target the November release.

CDR-Register-Stream commented 4 years ago

@NationalAustraliaBank Thank you for your feedback. It would be beneficial to set a baseline here. Current documentation expects DHs to call ADRs until the notification is received. Providing the bounds of 7 days of retry and an exponential back-off mechanism is designed to help set expectations on what this implementation would look like. The build requirement would exist anyway; we are not trying to impose new requirements but to help with the interpretation of current requirements.

In terms of impact to build, I’m assuming that any long-term retry mechanism, where the parameters are in the order of days, is the problem? My understanding is that there are technical considerations here, as JWTs expire and therefore cannot be retried beyond their relatively short lifetime.
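One way to handle the JWT expiry concern is to build a fresh set of claims for every attempt rather than storing and replaying the original request. The sketch below is illustrative only: the claim names follow the registered JWT claims, but the 300 second lifetime, the function names and the choice of `iss`/`sub` values are assumptions, and signing is deliberately left out.

```python
import time
import uuid
from typing import Dict, Optional, Union

ASSERTION_LIFETIME_S = 300  # assumed lifetime; not specified in this thread


def build_assertion_claims(
    issuer_id: str, audience: str, now: Optional[float] = None
) -> Dict[str, Union[str, int]]:
    """Build new JWT claims for a single revocation attempt."""
    issued_at = int(time.time() if now is None else now)
    return {
        "iss": issuer_id,
        "sub": issuer_id,
        "aud": audience,
        "jti": str(uuid.uuid4()),  # unique per attempt
        "iat": issued_at,
        "exp": issued_at + ASSERTION_LIFETIME_S,
    }


def is_expired(claims: Dict[str, Union[str, int]], now: float) -> bool:
    """True once the claims can no longer be used for a retry."""
    return now >= int(claims["exp"])
```

With this shape, only the fact that a notification is outstanding needs to be persisted between retries, not the signed request itself.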

As for the retry algorithm itself, my current thinking is that the exponential back-off approach outlined above is relatively straightforward to implement. To simplify this further, there are potentially two options:

Define three tiers of retry period (short, medium and long) which could be built to. However I don’t see how this would actually end up being any easier to implement than the exponential back-off algorithm.
Have the client retry every minute until the 7 days have passed. This approach may be simpler, but the whole intent of defining the exponential back-off algorithm was to reduce the number of retries being facilitated by the client and get that cost-benefit balance right.

Neither of these options, I feel, has benefits which outweigh the use of exponential back-off. If there are other angles of discussion here, please add them to the conversation.
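To put rough numbers on the difference between the two approaches, the retry counts over a 7 day period can be compared. This is a back-of-the-envelope sketch that assumes each call returns instantly and that the back-off wait is capped at one hour; the function names are illustrative.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000
MAX_WAIT_TIME_MS = 3_600_000


def count_backoff_retries(period_ms: int) -> int:
    """Number of retries whose wait completes within the period."""
    elapsed_ms = 0
    retries = 0
    while True:
        wait_ms = min((2 ** (retries + 1)) * 100, MAX_WAIT_TIME_MS)
        if elapsed_ms + wait_ms > period_ms:
            return retries
        elapsed_ms += wait_ms
        retries += 1


def count_fixed_retries(period_ms: int, interval_ms: int = 60_000) -> int:
    """Number of retries when retrying at a fixed interval."""
    return period_ms // interval_ms


backoff_retries = count_backoff_retries(SEVEN_DAYS_MS)  # 181
fixed_retries = count_fixed_retries(SEVEN_DAYS_MS)      # 10080
```

Under these assumptions, back-off makes 181 retries in 7 days (15 with doubling waits, then 166 at hourly intervals) against 10080 for a one minute fixed interval.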

@NationalAustraliaBank, @anzbankau . In terms of the November 2020 timeframe, yes we do have an opportunity to refine this consent withdrawal notification mechanism further however we need to provide technical guidance for the go-live timeframe.

CDR-Register-Stream commented 4 years ago

@spikejump , Thank you for your feedback.

To answer and comment on your questions:

Can @CDR-Register-Stream please clarify why the max retry period for DHs is 7 days but for ADRs it is indefinite? While retrying indefinitely conceptually makes sense, technically it is not ideal. What is the real outcome that needs to be addressed? Is it to ensure consent is revoked at the ADR?

This usecase is different. If an ADR received a consent withdrawal request from a consumer and it doesn’t propagate to the relevant DH, then the consent agreement still exists from the Data Holder's perspective. This means that the ADR, if acting incorrectly, is still able to retrieve consumer data from the DH as the consent agreement hasn’t been synchronised with the Data Holder.

The DH also has NFRs that they need to abide by, ensuring they are not down for significant periods of time. DHs also expose an outage endpoint so the ADR can get insight into when the DH will be next available and therefore act accordingly.

Can @CDR-Register-Stream also confirm that regardless whether DH can successfully revoke at DR that DH will mark the consent as revoked? And vice versa for DR?

Yes, this would be required. The discussion here is how we can ensure this information is synchronised in a reliable manner. However, as outlined above, if the ADR does not call the revocation endpoint on the DH successfully and does subsequently request consumer data (which would not align to the rules but could technically be feasible) the DH would not know that the consent had been withdrawn and would facilitate that request for data.

If the above is true, is it acceptable to the ACCC that the dashboards at both ends may be out of sync (for a long time, with ADRs possibly having revoked a consent while the bank says it is active)?

If the eventual outcome is that customer consent was revoked and no data sharing can occur, maybe there's no need to retry indefinitely, as long as sufficient attempts were carried out?

Is it acceptable? It’s an inevitability but something which we need to actively minimise/manage. Having this problem means the consumer is getting different messages from different participants.

spikejump commented 4 years ago

@CDR-Register-Stream Thanks for the response. Follow-up questions.

This usecase is different. If an ADR received a consent withdrawal request from a consumer and it doesn’t propagate to the relevant DH, then the consent agreement still exists from the Data Holder's perspective. This means that the ADR, if acting incorrectly, is still able to retrieve consumer data from the DH as the consent agreement hasn’t been synchronised with the Data Holder.

The DH also has NFRs that they need to abide by, ensuring they are not down for significant periods of time. DHs also expose an outage endpoint so the ADR can get insight into when the DH will be next available and therefore act accordingly.

Apologies, I am not understanding how the above addresses the reason why ADRs are required to retry indefinitely, especially given that the DH has an NFR to meet. The concern is not whether to retry but whether to retry indefinitely.

Given the concern that ADRs may be able to retrieve data from the DH after the customer revokes in this error scenario, perhaps the requirement should be for ADRs to immediately revoke the refresh token locally, regardless of whether the call to the DH succeeds. If communication with the DH fails, the ADR continues to retry for a reasonable period (not indefinitely) as a best effort to inform the DH of the revocation. This way, the data sharing concern is removed and the effort to keep the dashboards consistent is the same on both sides.
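The "revoke locally first" idea could look something like the following. It is a sketch of the suggestion only, not of any agreed behaviour: `AdrConsentStore` is a hypothetical in-memory store, and `notify_dh` is a hypothetical callable that calls the DH revocation endpoint with the token and returns True on success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AdrConsentStore:
    """Hypothetical in-memory view of an ADR's consents (illustration only)."""

    refresh_tokens: Dict[str, str] = field(default_factory=dict)
    pending_dh_notifications: List[str] = field(default_factory=list)

    def withdraw(self, consent_id: str, notify_dh: Callable[[str], bool]) -> bool:
        """Revoke locally, then make a first attempt to tell the DH.

        Returns True when no notification is left pending from this call.
        """
        # Step 1: remove the token from active use, whatever happens next.
        token = self.refresh_tokens.pop(consent_id, None)
        if token is None:
            return True  # already withdrawn locally
        # Step 2: best-effort notification to the DH.
        if notify_dh(token):
            return True
        # Kept only so the notification can be retried for a bounded period;
        # the token is no longer available for data requests.
        self.pending_dh_notifications.append(token)
        return False
```

Because the token leaves `refresh_tokens` before the DH is contacted, a DH outage cannot leave the ADR able to request data under the withdrawn consent.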

Another possible approach: when an ADR fails to revoke consent with the DH due to an outage, the ADR will know this and be able to inform the customer. It is then up to the customer to retry the revocation, possibly with a suggestion to revoke from the DH's dashboard if the failure continues. Would this be an acceptable solution? In this case, an indefinite retry by ADRs can be avoided as well.

Are any of these approaches acceptable?

commbankoss commented 4 years ago

Proposal 1 : July Timeframe

We believe there is a simple way forward to close out this issue and ensure the least build effort and lowest complexity prior to go live. CBA does not believe that either the ADR or DH should have to implement extended polling in the event of an ADR or ADH outage for the July timeframe.

Our preferred approach For Extended ADR Outage for Go Live (July Timeframe):

There will be no retry polling from DH->ADR if ADR is down.
ADRs MAY call the introspection endpoint when a customer accesses the ADR Dashboard to ensure the correct view of consent status is shown. The alternative is to accept that after an ADR outage, the ADR dashboard may be out of sync until the next time the ADR attempts to use a now revoked Refresh Token.
ADRs will trigger the data deletion / de-identification process upon:
- An error as a result of attempting to use a revoked Refresh Token, either through the token endpoint or the introspection endpoint.

CBA is broadly supportive of spikejump's proposed approach for dealing with extended DH outage.
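On the ADR side, the trigger described in this proposal might be sketched as below. It assumes the DH's token endpoint returns the standard OAuth 2.0 error response (HTTP 400 with `"error": "invalid_grant"`) when a revoked Refresh Token is presented; `start_deletion` is a hypothetical hook into the ADR's deletion / de-identification process.

```python
from typing import Callable, Mapping


def handle_token_response(
    status_code: int,
    body: Mapping[str, object],
    consent_id: str,
    start_deletion: Callable[[str], None],
) -> bool:
    """Return True if the response was treated as a revoked consent."""
    if status_code == 400 and body.get("error") == "invalid_grant":
        # The refresh token is no longer accepted by the DH.
        start_deletion(consent_id)
        return True
    return False
```

Note that `invalid_grant` also covers expired or otherwise invalid tokens, so an ADR may want to confirm the consent status before acting, for example via the introspection endpoint mentioned above.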

Pros:

Same approach is used for short (<7 days) and longer outages; the current approach only calls for polling for 7 days
No store and forward mechanisms required for go live

Cons:

Some ADRs may wish to do an extra check on load of dashboard to ensure consent status is in sync
May need to enhance error handling on call to token endpoint to handle RT having been revoked

Proposal 2 : November Timeline

For November, we recommend the addition of these changes:

ADH to provide a new endpoint - bulk revoked arrangement api. This endpoint will return a list of revoked arrangements in the last X period (can be called every hour, for example)
ADR will not host a revocation API
ADR will call the bulk revoked arrangement API
- Periodically (say 1/hr)
- When a customer loads the ADR dashboard

After Extended ADR Outage:

After an outage, an ADR should call the bulk revoked arrangement api to understand which arrangements have been revoked while they were down.
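The ADR-side reconciliation in this proposal could be as simple as the sketch below. The bulk revoked arrangement API does not exist in the standards; here its response is assumed to have already been fetched and reduced to a list of arrangement IDs, and the function name is illustrative.

```python
from typing import Iterable, Set


def reconcile_revoked(
    active_arrangement_ids: Set[str], revoked_from_dh: Iterable[str]
) -> Set[str]:
    """Remove arrangements the DH reports as revoked; return those removed.

    IDs reported by the DH that the ADR does not hold are ignored.
    """
    newly_revoked = active_arrangement_ids & set(revoked_from_dh)
    active_arrangement_ids -= newly_revoked  # updates the caller's set in place
    return newly_revoked
```

The same function would serve the hourly poll, the dashboard-load check and the catch-up after an outage, since each only differs in how far back the DH is asked to report.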

Pro:

No store and forward mechanisms required. Lower complexity and fewer edge cases.
Removal of the ADR revocation API removes the need for ADRs to host authenticated APIs
- A better security posture for the ecosystem
- Fewer components for current and future ADRs to build and manage
Compatible with upcoming consent changes
Fewer components for current and future ADHs to build and manage as no need for store and forward

Cons:

ADRs may need to do an extra check on load of dashboard to ensure consent status is in sync
DHs need to host an additional API

Can we please request both DH and ADR feedback on our proposals in order to understand if there are shortcomings we haven't considered.

CDR-Register-Stream commented 4 years ago

@commbankoss Thank you for publishing your proposal. I'd also like to encourage DHs and ADRs to add their feedback to ensure we are covering all scenarios and have a full understanding of the associated impacts.

As for the proposal for the July timeframe, one of the requirements we need to satisfy is to ensure that ADRs are able to de-identify and remove data in accordance with the rules. This needs to occur in a timely manner. Without having robust notification from the DH to the ADR, we potentially introduce a delay into this activity where the ADR is not aware of their obligations until they next request consumer data on behalf of the consumer.

Therefore, removing the responsibility from the DH to retry requests to the ADR puts the ADR in a position where they can't guarantee they are satisfying the requirements of the rules. I don't see how this requirement can be satisfied without putting an onus on the DH.

perlboy commented 4 years ago

Ok, much of the content here seems to be going into the Data Standards space and it's happening on the Register API github. What this is doing right now is simply highlighting the flaws the single, non-enriched and brittle token<->consent solution introduces. It can't be said that the DSB wasn't warned of this because, ultimately, they were, numerous times from numerous international experts and various people and organisations including myself.

Consequently for the July timeline it actually seems like there is no solution other than the current, architecturally broken one that's in place with the only variable being some exponential back-off or a fixed "expiry" time (let's say 3 months) that is sufficiently long to deploy a fix and short enough that banks don't have to put in infinity into their queuing software. Ultimately code that is going to be immediately deprecated is going to need to be produced. This was entirely predictable and perhaps the DSB should have some mea-culpa here but the ACCC with its army of lawyers in the mix is only going to make it worse.

For the November timeline though I still seriously question the "CDR Arrangement API" architecture because what ended up as the decision was never actually proposed for consultation and therefore no feedback was received. IF the DSB had asked for, or even listened to the initial feedback, of the specific CDR Arrangement API proposal (ie. after they posted a decision which described something no-one had seen) they would have received various feedback including that it is crossing security domains, has an undefined security surface, is brittle, requires infrastructure from a data recipient to be built for a single purpose and has zero vendor support. On this basis alone I currently personally discount the spec for November actually being the spec because it solves only 1 or 2 of the problems and creates a whole bunch more.

Fundamentally, if the access token (or in the non JWT case the introspection endpoint) described consents within claim data (as proposed within the Rich Authorisation Details submitted by the OIDF in its submission to DP99) then all of this discussion would be entirely moot because an ADR would be notified by the Data Holder on the next token issuance (<5 mins for JWT) or via a required scheduled process of the ADR to introspect (non JWT) by the mere fact that the arrangement id would be absent.

Writing a rule that requires an ADR to setup this sort of process is simple, would be easy to understand technically and from what I understand would be a minor addition to an arduous accreditation process that asks endless non-technical questions and not that many technical ones.

Without having robust notification from the DH to the ADR, we potentially introduce a delay into this activity where the ADR is not aware of their obligations until they next request consumer data on behalf of the consumer.

There is nothing stopping the ACCC writing a rule or the DSB setting a Standard mandating that a Data Recipient must check the status of consents (using its refresh tokens) on a scheduled basis (let's say, once per day). Realistically, with or without the consent withdrawal notification a Data Recipient should be contacting a Holder somewhat regularly anyway if only to heartbeat the refresh token (since it can be rotated before consent finishes) or to deliver their use case. If there is a use case that would involve a data recipient not attempting to at least check the status of a consent every day (or every week) I would assume that a data recipient's data minimisation requirements would kick in anyway. If not then the accreditation components asking questions around this aren't worth the paper they are written on.

With all of this, I believe the specific part of the rules this relates to is:

4.25 Withdrawal of authorisation to disclose CDR data and notification

(1) The CDR consumer who gave, to a data holder, an authorisation to disclose particular CDR data to an accredited person may withdraw the authorisation at any time:
(a) by communicating the withdrawal to the data holder in writing; or
(b) by using the data holder’s consumer dashboard.

(2) The data holder must:
(a) if the withdrawal was in accordance with paragraph (1)(a)―give effect to the withdrawal as soon as practicable, and in any case within 2 business days after receiving the communication; and
(b) in any case―notify the accredited person of the withdrawal in accordance with the data standards.

Note 1: Upon notification, an authorisation to disclose the CDR data expires: see paragraph 4.14(1)(b).
Note 2: This subrule is a civil penalty provision (see rule 9.8).

4.25 (1) is a Data Holder concern and seems out of scope for this notification discussion. 4.25 (2) specifically states to align with the data standards. This forum is NOT the Data Standards and indeed this question seems to be ACCC Rules trying to solve a technical issue through an unrelated platform (Register) in the ecosystem. Indeed, this rule and its apparent interpretation appears to be one confused by lawyers who fundamentally have no real understanding of how an operational and distributed technology architecture works.

Consequently I'm left wondering why it's being discussed by the ACCC in the first place and instead why the ACCC is not opening a question in the designated location for Standards development? Your own rules specify this is where the spec for the notification should come from.

CDR-Register-Stream commented 4 years ago

All, I have created issue #119 to track documentation updates to cater for DH retries using the exponential backoff algorithm. This leaves this ticket open to discuss the improvements to the model itself.

perlboy commented 4 years ago

@CDR-Register-Stream We are talking about improvements to the model in the wrong repository on a thread to discuss polling... I feel this is confused, how do new participants attempt to provide comment to reach alignment? Is it worthwhile closing this thread and opening a discussion in standards-maintenance?

CDR-API-Stream commented 4 years ago

Given the nature of this thread has moved to a discussion regarding data standards clarifications and proposals for changes to the data standards, please post a new issue to Standards Maintenance and reference this issue.

If you could please state the issues being faced and any proposals that would be great. If the issue is considered urgent please request it be labelled accordingly.

NationalAustraliaBank commented 4 years ago

NAB does not support the proposal for DHs to implement extended polling to cater for the ADR outage scenario. Whilst the initial issue was raised to get clarification on the ADR polling frequency / duration by the DHs, the discussion thread highlights some fundamental technical architecture questions around the current consent revocation approach.
NAB supports Proposal 2 (November Timeline) by CommBank and, in addition, would like the issue raised / discussed within the Standards Maintenance github for consideration of the broader technical / architecture implications highlighted within this thread.

CDR-Register-Stream commented 4 years ago

While we acknowledge there is benefit in further analysis of recommended behaviours for participants, both Data Holders and Data Recipients, to better cater for this scenario, no work items are being planned for the July 2021 timeframe.

This issue will remain open to track analysis on this topic.

ConsumerDataStandardsAustralia / register

ADR Unavailability polling #82