ConsumerDataStandardsAustralia / standards-maintenance

This repository houses the interactions, consultations and work management to support the maintenance of baselined components of the Consumer Data Right API Standards and Information Security profile.

Changes to Periodic Polling Model #537

Open PayPalAustralia opened 1 year ago

PayPalAustralia commented 1 year ago

Description

According to the current Consumer Data Standards, Data Holders MUST react to changes in Data Recipient and associated Software Product statuses within 5 minutes of the change occurring on the CDR Register. To achieve this, Data Holders need to periodically poll the GetDataRecipientsStatus, GetSoftwareProductsStatus and GetDataRecipients APIs to retrieve the current statuses and cache them for use during requests for Consumer Data.

After calling the CDR Register and checking the status of every Data Recipient registered with the CDR, a Data Holder must also act on the affected Data Recipients in its own authorisation server, withdrawing consents, disabling data sharing or deactivating the client where necessary.

The periodic polling model is resource-intensive and may not scale adequately within the 5-minute interval for a large volume of Data Recipients, because the API returns information about all Data Recipients, not only the ones the Data Holder is interested in.
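To illustrate the filtering step described above, here is a minimal sketch of how a Data Holder might reduce the full status list to the recipients it has actually registered. The record shape (`id`, `status`), the sample identifiers and the function name are assumptions made for illustration and are not taken from the Register API definitions.

```python
from typing import Dict, List


def relevant_status_changes(
    register_statuses: List[dict], local_cache: Dict[str, str]
) -> Dict[str, str]:
    """Return {recipient_id: new_status} for locally registered recipients
    whose status on the Register differs from the cached status."""
    changes: Dict[str, str] = {}
    for record in register_statuses:
        rid = record["id"]  # hypothetical field name
        # Skip recipients this Data Holder has never registered.
        if rid in local_cache and local_cache[rid] != record["status"]:
            changes[rid] = record["status"]
    return changes


# Hypothetical sample data: the Register returns every recipient,
# but this Data Holder only knows adr-1 and adr-2.
register_statuses = [
    {"id": "adr-1", "status": "ACTIVE"},
    {"id": "adr-2", "status": "SUSPENDED"},
    {"id": "adr-3", "status": "REVOKED"},
]
local_cache = {"adr-1": "ACTIVE", "adr-2": "ACTIVE"}

changes = relevant_status_changes(register_statuses, local_cache)
print(changes)
```

Every record in the response is still scanned on each poll, even though only the locally registered recipients can trigger follow-up work such as withdrawing consents. That is the cost this issue is concerned with.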

Area Affected

Metadata Cache Management - GetDataRecipientsStatus and GetDataRecipients APIs

Change Proposed

PayPal considers that a better solution for performance purposes would be:

  1. Cache Refresh: The CDR Register notifies Data Holders of status changes via a cache refresh request (a broadcast message) or email. A cache refresh request endpoint was proposed in previous versions of the standards. We propose to reinstate that proposal and implement the functionality, whereby the CDR Register, upon suspending, revoking or surrendering a Data Recipient or removing a Software Product, notifies Data Holders via an API call to refresh their status cache. The Data Holder then refreshes its cache and takes the necessary action within the required timeframe.
  2. Relevant Data Recipient Status: We also suggest that the Data Holder be given the flexibility to request information about only the Data Recipients it is interested in, instead of the full universe of Data Recipients.
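
As a rough sketch of how the two proposals above could combine on the Data Holder side: a notification from the Register triggers a refresh, and the refresh asks only for the recipients of interest. Everything here is hypothetical, including the `StatusCache` class, the `on_refresh_notification` method and the filtered fetch function. No such filtered Register endpoint exists in the current standards.

```python
from typing import Callable, Dict, Iterable

# Hypothetical fetch function: takes recipient IDs, returns {id: status}.
FetchFn = Callable[[Iterable[str]], Dict[str, str]]


class StatusCache:
    """Caches statuses for the recipients a Data Holder cares about."""

    def __init__(self, fetch: FetchFn, interested_ids: Iterable[str]):
        self._fetch = fetch
        self._interested = set(interested_ids)
        self.statuses: Dict[str, str] = {}

    def on_refresh_notification(self) -> Dict[str, str]:
        """Called when the Register signals a change (push, not poll).
        Returns only the entries whose status is new or has changed."""
        latest = self._fetch(self._interested)
        changed = {
            rid: status
            for rid, status in latest.items()
            if self.statuses.get(rid) != status
        }
        self.statuses.update(latest)
        return changed


# Stand-in for the Register, used only to exercise the sketch.
REGISTER = {"adr-1": "ACTIVE", "adr-2": "ACTIVE", "adr-3": "ACTIVE"}


def fake_fetch(ids: Iterable[str]) -> Dict[str, str]:
    wanted = set(ids)
    return {rid: s for rid, s in REGISTER.items() if rid in wanted}


cache = StatusCache(fake_fetch, ["adr-1", "adr-2"])
first = cache.on_refresh_notification()   # initial load of two recipients
REGISTER["adr-2"] = "SUSPENDED"
second = cache.on_refresh_notification()  # only the changed recipient
print(first, second)
```

With this shape, work is done only when the Register reports a change, and only for recipients the Data Holder has registered.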

CDR-API-Stream commented 1 year ago

Thanks @PayPalAustralia, there is provision for a Metadata Update API that Data Holders implement to receive a CDR Register initiated broadcast message. At present the ACCC has not implemented this trigger to broadcast ADR updates.

With regard to the current polling method, can you elaborate on the performance and load issues you are experiencing with respect to "system heavy" processing?

Additionally, are you able to describe the threshold number of ADRs at which your cache would become stale and the process would no longer scale? For example, if it takes you 1 second to process an ADR's metadata, 5 minutes allows you to process 300 ADR records before being out of date.
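The arithmetic in that example can be written out as a small calculation. The function name is an assumption, and the second call uses a purely illustrative processing time rather than a measured figure.

```python
def max_records_per_window(window_seconds: float, seconds_per_record: float) -> int:
    """Number of ADR records that can be processed sequentially
    before the polling window elapses."""
    return int(window_seconds // seconds_per_record)


window = 5 * 60  # the 5-minute requirement, in seconds

limit_at_1s = max_records_per_window(window, 1.0)      # the example above
limit_at_250ms = max_records_per_window(window, 0.25)  # illustrative value
print(limit_at_1s, limit_at_250ms)
```

This assumes strictly sequential processing with no time spent on the outbound Register call. Parallel processing or a slower fetch would move the threshold in either direction.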

PayPalAustralia commented 1 year ago

@CDR-API-Stream, please find below our response to the comments above.

"The current process for periodically verifying Data Recipient status is that each Data Holder makes an outbound request to get the status list for all registered Data Recipients. The Data Holder then needs to process the response against the Data Recipients registered with it and update its records accordingly (e.g. registration status, tokens issued to the Data Recipient, etc.). This process must be performed every 5 minutes. The outbound call to retrieve the list of Data Recipients takes milliseconds, but the internal checks could take from a few seconds to minutes depending on the changes required.

Although it might be some time before the volume of Data Recipients rises into the thousands, this model may not scale successfully with wider adoption of the AU CDR. We suggest revisiting the current model to avoid API performance pitfalls in future, which is why we propose enabling the broadcast of Data Recipient updates or, alternatively, allowing Data Holders to request statuses for only the Data Recipients they are interested in."