ConsumerDataStandardsAustralia / standards-maintenance

This repository houses the interactions, consultations and work management to support the maintenance of baselined components of the Consumer Data Right API Standards and Information Security profile.

Get Metrics V5 error metrics documentation #655

Open nils-work opened 3 months ago

nils-work commented 3 months ago

Description

To ensure the requirements are clear, the field descriptions in the ErrorMetricsV2 schema could indicate that errors are to be reported against each respective error code in the 4xx and 5xx series where the example additionalProperties and property1 and property2 fields currently appear.

Intention and Value of Change

Ensure compliant Get Metrics responses are provided to allow detailed analysis of ecosystem performance.

Area Affected

Get Metrics endpoint > ErrorMetricsV2 schema

Change Proposed

Make the following changes to the documentation only, to provide clarity of the existing requirement. No changes to the endpoint version or structure are proposed.

Name: »» additionalProperties
Current description: Number of errors for a specific HTTP error code. Note that the property name must be 3 digits represent the HTTP error code the error is for
Proposed description: This is a placeholder field to be substituted with each respective HTTP error code in the 4xx and 5xx range recorded by the Data Holder. It is represented by property1 and property2 in the Non-normative Examples section. Note that the property name MUST be the three-digit HTTP error code as per the adjacent 500 example. All possible property names have not been defined as the range is expected to vary across participants. Examples would include, but are not limited to: 400, 401, 403, 404, 405, 406, 415, 422, 429, 500, 503, 504.

Name: »» 500
Current description: Number of errors for HTTP error code 500. Note that this field is an example of a single entry due to the lack of OAS support [for the] JSON Schema patternProperties syntax. See the additionalProperties field in this schema for the generic property structure for error code counts
Proposed description: Reflecting the description provided in the adjacent additionalProperties field, this is an example demonstrating the structure for reporting the number of calls resulting in HTTP error code 500. Each error code recorded by the Data Holder in the 4xx and 5xx range MUST be provided in this format against the respective property name.
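
As a rough illustration of the naming rule described above, the following Python sketch flags entries whose property name is not a three-digit 4xx or 5xx code, or whose value is not a non-negative integer. The function name `invalid_error_entries` and the sample data are hypothetical, and treating NaturalNumber as "integer of zero or more" is an assumption made for this sketch rather than a quotation from the Standards.

```python
import re

# Assumption: a valid property name is exactly three digits, starting with 4 or 5.
ERROR_CODE_NAME = re.compile(r"[45][0-9]{2}")


def invalid_error_entries(error_counts):
    """Return the entries that do not look like '<4xx/5xx code>: <count>'."""
    problems = {}
    for name, count in error_counts.items():
        name_ok = isinstance(name, str) and ERROR_CODE_NAME.fullmatch(name) is not None
        # bool is a subclass of int in Python, so it is excluded explicitly.
        count_ok = isinstance(count, int) and not isinstance(count, bool) and count >= 0
        if not (name_ok and count_ok):
            problems[name] = count
    return problems


# Hypothetical sample: 'property1' is the unsubstituted placeholder name.
sample = {"500": 3, "404": 12, "property1": 0, "42": 1, "429": -1}
print(invalid_error_entries(sample))
```

Under these assumptions, a response that still carries the literal placeholder names property1 or property2 would be flagged, while entries such as 404 and 500 would pass.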

DSB Proposed Solution

The current DSB proposal for this issue is in https://github.com/ConsumerDataStandardsAustralia/standards-maintenance/issues/655#issuecomment-2484837011

perlboy commented 3 months ago

The published openapi specification does not specify these additional error codes and the reason provided isn't justification (all the codes can be listed and specified as optional). On this basis this is not a non-breaking change, will require an update to the API specification and an associated FDO.

cuctran-greatsouthernbank commented 1 month ago

Hi @nils-work, When we implemented ErrorMetricsV2, our interpretation was that we were required to report the server-side errors only. This was written in the Description for Aggregate error metric and by that, should also be applicable to the Unauthenticated and Authenticated error metrics.

If 4xx errors are required to be reported, then I suggest we also change the description of the Aggregate error metric. That would become a breaking change for GSB.


nils-work commented 1 month ago

Hi @cuctran-greatsouthernbank

The aggregate property is a continuation of the error reporting that was available prior to Get Metrics v4, which was expected to capture server-side (5xx) codes only. The earlier structure was retained as aggregate in v4 to facilitate the reporting transition for the ACCC. It is unrelated to the change proposed in this issue and will remain unaffected.

The breakdown by unauthenticated and authenticated, and per code (in the 4xx and 5xx range) in v4 was to provide greater insight.

The detail from the Decision that introduced these fields in v4 stated:

  • The ErrorMetrics model will be changed from a number to an object with the following:
    • Two fields containing objects named authenticated and unauthenticated to separate the errors for authenticated vs unauthenticated APIs
    • Each of these objects will contain objects per period containing a series of fields with the label of each field being a HTTP Status Code (e.g. 422, 500, etc) and the value being a number indicating the number of errors for the period
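
The shape described in the Decision can be sketched in Python as below. This is only a minimal illustration for a single period: the input format (pairs of an authenticated flag and a status code) and the function name `build_error_counts` are assumptions, and the per-period nesting used by the actual Get Metrics response is not reproduced here.

```python
from collections import Counter


def build_error_counts(responses):
    """Tally 4xx/5xx responses for one period, split by authenticated vs unauthenticated."""
    tallies = {"authenticated": Counter(), "unauthenticated": Counter()}
    for is_authenticated, status in responses:
        if 400 <= status <= 599:
            group = "authenticated" if is_authenticated else "unauthenticated"
            # The field label is the HTTP status code as a three-digit string.
            tallies[group][str(status)] += 1
    # Sort keys so the output is stable; only codes that actually occurred appear.
    return {group: dict(sorted(counts.items())) for group, counts in tallies.items()}


# Hypothetical request log: (was the endpoint authenticated?, HTTP status code)
log = [(True, 200), (True, 422), (True, 422), (False, 404), (True, 500), (False, 503)]
print(build_error_counts(log))
# {'authenticated': {'422': 2, '500': 1}, 'unauthenticated': {'404': 1, '503': 1}}
```

Note that this sketch records both 4xx and 5xx codes, which is the reading of the v4 Decision given in this comment; the separate aggregate property is not computed here.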

Guidance on error reporting, including some of this detail, is available in the article titled Errors.

nils-work commented 3 weeks ago

To accommodate the response above, feedback on the following options is being sought:

  1. No change to Get Metrics v5
    1. (Assumes the requirement to provide all error codes is clear.)
  2. Non-breaking change to Get Metrics v5. Options:
    1. Update the field descriptions to provide clarity on the requirement, with no FDO provided.
    2. Update the field descriptions to provide clarity on the requirement, with an FDO of Y25 #2: 2025-05-12
  3. Breaking change necessitating Get Metrics v6
    1. Update the schema to include the error codes defined in the Standards (excluding upstream specs) as mandatory fields. The additionalProperties capability will be retained to accommodate error codes associated with upstream specs or extensibility. Get Metrics v6 could have an FDO of Y25 #2: 2025-05-12

benkolera commented 3 weeks ago

Thanks for this.

Biza supports position 2.2 or 3.1. Option 2.2 feels like the best / most efficient win for the ecosystem without any undue immediate disruption to implementations. We encourage the changes mentioned in V6 when we have reason to add new metrics to the ecosystem and have a better reason to release a V6.

nils-work commented 3 days ago

The proposal is to change the field descriptions as per below. The authenticated section has two additional example error codes that are assumed to only be applicable to authenticated endpoints (401, 403), but all other detail is common.

This clarification will be applicable from the FDO Y25 #2: 2025-05-12. The Get Metrics version remains unchanged.

Unauthenticated section

Name | Type | Required | Description
»» additionalProperties | NaturalNumber | optional | This is a placeholder field to be substituted with each respective HTTP error code in the 4xx and 5xx range recorded by the Data Holder. It is represented by property1 and property2 in the Non-normative Examples section. Note that the property name MUST be the three-digit HTTP error code as per the adjacent 500 example. All possible property names have not been defined as the range is expected to vary across participants. Examples would include, but are not limited to: 400, 404, 405, 406, 415, 422, 429, 500, 503, 504.
»» 500 | NaturalNumber | optional | Reflecting the description provided in the adjacent additionalProperties field, this is an example demonstrating the structure for reporting the number of calls resulting in HTTP error code 500. Each error code recorded by the Data Holder in the 4xx and 5xx range MUST be provided in this format against the respective property name.

Authenticated section

Name | Type | Required | Description
»» additionalProperties | NaturalNumber | optional | This is a placeholder field to be substituted with each respective HTTP error code in the 4xx and 5xx range recorded by the Data Holder. It is represented by property1 and property2 in the Non-normative Examples section. Note that the property name MUST be the three-digit HTTP error code as per the adjacent 500 example. All possible property names have not been defined as the range is expected to vary across participants. Examples would include, but are not limited to: 400, 401, 403, 404, 405, 406, 415, 422, 429, 500, 503, 504.
»» 500 | NaturalNumber | optional | Reflecting the description provided in the adjacent additionalProperties field, this is an example demonstrating the structure for reporting the number of calls resulting in HTTP error code 500. Each error code recorded by the Data Holder in the 4xx and 5xx range MUST be provided in this format against the respective property name.

perlboy commented 3 days ago

Can the ACCC give some guidance here? Historical experience has been of a highly prescriptive metrics processing approach. This means the optionality of additional error codes may not actually hold in practice, including where a metric is completely absent because its count is 0 (429 would be a common one).

Clarifying that the Regulator will not seek to change schema compliant responses outside the specification, as has happened in the past, would be appreciated.
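
The absent-versus-zero question raised above can be illustrated with a small Python sketch. It shows one lenient way a consumer of the metrics could read a count, treating an omitted error code as zero. This is a hypothetical reader written for illustration, not a description of how the ACCC or any other party processes Get Metrics responses; the name `count_for` and the sample objects are assumptions.

```python
def count_for(error_counts, code):
    """Read the count for one HTTP error code, treating an omitted code as zero."""
    return error_counts.get(str(code), 0)


# Two hypothetical responses: one omits 429 entirely, the other reports it as 0.
sparse = {"500": 2}
explicit = {"429": 0, "500": 2}

print(count_for(sparse, 429), count_for(explicit, 429))  # 0 0
print("429" in sparse, "429" in explicit)  # False True
```

A reader written this way sees the two responses as equivalent, whereas a stricter processor that requires specific property names to be present would not, which is the distinction the request for guidance turns on.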