Natixar / natixar-frontend

The static front end of the Natixar SaaS platform

Definition of a Category is Wrong #49

Closed lepeuvedic closed 4 weeks ago

lepeuvedic commented 1 month ago

Problems:

Categories form hierarchies. The top-level groups are called "Scopes" by BEGESv4 and GHG Protocol, but this term has been dropped in favor of more explicit descriptions in BEGESv5. Above scopes, we define a category whose name matches the protocol query parameter to the /data/ranges endpoint when the following transforms are applied to the display name:

First Problem:

Categories are defined as direct children of "Scopes".

https://github.com/Natixar/natixar-frontend/blob/71a70f216e807ac8413108ab93324a7049ac6fb8/src/components/natixarComponents/ScopeTable/NatixarExpandableRow.tsx#L72

Note that this code preexisted identically before the addition of by @astowny .

Categories are what emissions refer to, in the "data" part of the server response, at the compressed data point index CDP_LAYOUT_CATEGORY. The difference only matters for GHG Protocol, because its "Scope 3" top-level group is divided into the "Upstream" and "Downstream" subgroups, and not directly into categories.

When a category is referenced, all its siblings (children of the same parent node in the categoryHierarchy) are also categories.

The algorithm has finished determining the categories when all the leaves of the categoryHierarchy, which might themselves be subcategories, have exactly one parent meeting the criteria to be a category.
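The rule above (referenced nodes and their siblings are categories, then check which leaves are still uncovered) could be sketched as follows. This is only an illustration: the `CategoryNode` shape, the function names and the sample GHG Protocol tree are assumptions, not the actual front-end types or server data.

```typescript
// Hypothetical node shape; the real categoryHierarchy type may differ.
interface CategoryNode {
  id: number;
  name: string;
  children: CategoryNode[];
}

// A node is a category when the data references it directly,
// or when one of its siblings is referenced.
function classifyReferencedCategories(
  root: CategoryNode,
  referencedIds: ReadonlySet<number>,
): Set<number> {
  const categories = new Set<number>();
  const visit = (node: CategoryNode): void => {
    if (node.children.some((child) => referencedIds.has(child.id))) {
      node.children.forEach((child) => categories.add(child.id));
    }
    node.children.forEach(visit);
  };
  visit(root);
  return categories;
}

// Leaves that have no category on the path from the root down to themselves.
function leavesWithoutCategory(
  root: CategoryNode,
  categories: ReadonlySet<number>,
): CategoryNode[] {
  const orphans: CategoryNode[] = [];
  const visit = (node: CategoryNode, covered: boolean): void => {
    const nowCovered = covered || categories.has(node.id);
    if (node.children.length === 0) {
      if (!nowCovered) orphans.push(node);
      return;
    }
    node.children.forEach((child) => visit(child, nowCovered));
  };
  visit(root, false);
  return orphans;
}

// Sample data only: ids and names are made up for the illustration.
const leaf = (id: number, name: string): CategoryNode => ({ id, name, children: [] });
const ghg: CategoryNode = {
  id: 1,
  name: "GHG Protocol",
  children: [
    { id: 2, name: "Scope 1", children: [leaf(10, "Stationary combustion"), leaf(11, "Mobile combustion")] },
    {
      id: 3,
      name: "Scope 3",
      children: [
        { id: 4, name: "Upstream", children: [leaf(30, "Purchased goods and services"), leaf(31, "Capital goods")] },
        { id: 5, name: "Downstream", children: [leaf(40, "Use of sold products")] },
      ],
    },
  ],
};

// Emissions reference ids 10 and 30 only; nothing references Downstream.
const categories = classifyReferencedCategories(ghg, new Set([10, 30]));
const orphans = leavesWithoutCategory(ghg, categories);
```

With this sample, the siblings of the referenced nodes become categories too, and the only leaf left uncovered sits under "Downstream", which is the situation the next paragraph deals with.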

While it's unrealistic in real operation, test cases can have no emissions in some category groups, for example no downstream emissions because the data is not being collected. This situation would cause some leaves in the categoryHierarchy to have no associated category. To find the corresponding categories for these orphaned leaves, the front-end must continue with the following algorithm:

The algorithm first descends the hierarchy from the node representing the protocol used by the already determined categories (the node at the top of the GHG Protocol, BEGESv5 or BEGES branches in the category hierarchy). It explores the tree, stopping the search when it reaches a node already classified as a category (by direct reference from the data or as a sibling), and it collects the terminal nodes (nodes without children) it reaches without encountering a category. Along each branch, it collects the code field of the node, and overwrites any code with another one found at a lower level in the hierarchy. The exploration is in depth-first order, which means that only one code needs to be memorized at any time. Every time a terminal node (leaf node) is reached, the node itself or its closest parent bearing the current code is immediately classified as a category, along with all its siblings. The exploration of the tree continues, with many branches pruned, until all the leaf nodes are associated with a category.
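A minimal sketch of this depth-first pass, under stated assumptions: the `CategoryNode` shape (with an optional `code`), the function name and the sample tree are hypothetical, and "closest parent bearing the current code" is read here as the deepest node on the current path, leaf included, that carries a `code`. When no node on the path has a code, the sketch falls back to the leaf itself.

```typescript
// Hypothetical node shape; `code` is optional because not every level carries one.
interface CategoryNode {
  id: number;
  name: string;
  code?: string;
  children: CategoryNode[];
}

type Bearer = { node: CategoryNode; parent: CategoryNode | undefined };

// Adds to `categories` the categories covering leaves that no emission references.
function classifyOrphanLeaves(protocolRoot: CategoryNode, categories: Set<number>): void {
  const visit = (
    node: CategoryNode,
    path: CategoryNode[], // ancestors of `node`, root first
    bearer: Bearer | undefined, // deepest ancestor carrying a code, if any
  ): void => {
    // Prune: this node or one of its ancestors is already a category.
    if (categories.has(node.id) || path.some((a) => categories.has(a.id))) return;

    // A code found lower in the hierarchy overwrites the one memorized so far.
    const parent = path.length > 0 ? path[path.length - 1] : undefined;
    const current: Bearer | undefined = node.code !== undefined ? { node, parent } : bearer;

    if (node.children.length === 0) {
      // Leaf reached: classify the code bearer (or the leaf itself) and its siblings.
      const target: Bearer = current ?? { node, parent };
      const group = target.parent ? target.parent.children : [target.node];
      group.forEach((n) => categories.add(n.id));
      return;
    }
    for (const child of node.children) visit(child, [...path, node], current);
  };
  visit(protocolRoot, [], undefined);
}

// Sample data only: ids, names and codes are made up for the illustration.
const leaf = (id: number, name: string, code?: string): CategoryNode => ({ id, name, code, children: [] });
const ghg: CategoryNode = {
  id: 1,
  name: "GHG Protocol",
  children: [
    { id: 2, name: "Scope 1", code: "1", children: [leaf(10, "1.1", "1-1"), leaf(11, "1.2", "1-2")] },
    {
      id: 3,
      name: "Scope 3",
      children: [
        { id: 4, name: "Upstream", children: [leaf(30, "3.1", "3-1"), leaf(31, "3.2", "3-2")] },
        {
          id: 5,
          name: "Downstream",
          children: [
            { id: 42, name: "3.11", code: "3-11", children: [leaf(420, "3.11.a"), leaf(421, "3.11.b")] },
            leaf(40, "3.9", "3-9"),
          ],
        },
      ],
    },
  ],
};

// Categories already determined from the data (direct references and siblings).
const categories = new Set<number>([10, 11, 30, 31]);
classifyOrphanLeaves(ghg, categories);
```

In the sample, leaf 420 carries no code, so its closest code bearer (node 42) and that node's sibling (node 40) are classified; leaf 421 is then pruned because its parent is already a category.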

Protocol Handling is Wrong

At the top of the Climate Change category hierarchies, we have several branches corresponding to the various "protocols" according to which the back-end can categorize emissions. The three currently defined (soon four with the addition of ISO) are very similar: BEGES, BEGESv5 and GHG Protocol.

First of all, the supported nomenclatures are hardcoded in an Enum EmissionProtocol. They are also hardcoded in EndpointEmissionProtocol type and in the formatProtocolForRangesEndpoint function.

The front end must be completely protocol-agnostic and data-driven. We are already observing bugs striking BEGESv5 and GHG Protocol in particular, because the front end makes false assumptions instead of relying on the data returned by the back-end.

A mechanism is available to dynamically detect the available category nomenclatures (also called "protocols"): when the front-end does not specify a particular protocol, the server returns an HTTP response code 300 Multiple Choices, with a JSON object describing the various possible choices for the "protocol" query parameter and the "Accept" request header. In addition, a Location header is predefined with the default protocol of the organization baked into the URI. Furthermore, the server always returns all the protocols as immediate children of the Climate Change node in the categories hierarchy. The front-end therefore has several ways to learn the available category nomenclatures.
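As an illustration of the second route (reading the protocols from the hierarchy itself), a sketch could look like this. The `CategoryNode` shape, the helper names and the lookup of the node by its display name "Climate Change" are assumptions; the structure of the 300 Multiple Choices JSON body is not shown here, so it is not covered.

```typescript
// Hypothetical node shape; the real categories hierarchy type may differ.
interface CategoryNode {
  id: number;
  name: string;
  children: CategoryNode[];
}

// Depth-first lookup of the first node with the given display name.
function findNode(root: CategoryNode, name: string): CategoryNode | undefined {
  if (root.name === name) return root;
  for (const child of root.children) {
    const hit = findNode(child, name);
    if (hit) return hit;
  }
  return undefined;
}

// The available protocols are the immediate children of the topic node.
// Nothing here names BEGES, BEGESv5 or GHG Protocol: the list comes from the data.
function listProtocols(root: CategoryNode, topicName = "Climate Change"): string[] {
  const topic = findNode(root, topicName);
  return topic ? topic.children.map((child) => child.name) : [];
}

// Sample data only.
const node = (id: number, name: string, children: CategoryNode[] = []): CategoryNode => ({ id, name, children });
const hierarchy = node(0, "Root", [
  node(1, "Climate Change", [node(2, "BEGES"), node(3, "BEGESv5"), node(4, "GHG Protocol")]),
]);
const protocols = listProtocols(hierarchy);
```

If the back-end later adds an ISO branch under the same node, `listProtocols` returns it without any front-end change, which is the point of dropping the hardcoded enum.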

The API Specification indicates the correct algorithm to derive the correct protocol parameter value from the text representation available in the categories. This algorithm must replace the switch based algorithm in the formatProtocolForRangesEndpoint function. The specified algorithm is:

Desired Behavior:

The front-end does not contain any hardcoded reference to any specific way of categorizing emissions. This includes any hardcoded reference to scopes, covered by issues #13 and #15. The front-end learns how to use ISO/TR 14069:2013 categories without any code change, once the back-end is able to return ISO categories.

lepeuvedic-natixar commented 1 month ago

Modified formatProtocolForRangesEndpoint and commented out type EndpointEmissionProtocol since the list of taxonomies must not be limited to the three that were known during the initial development phase.

lepeuvedic commented 3 weeks ago

Postponing this one implies deactivation of GHG Protocol because scope 3 categories cannot be accessed and it's a major user-visible bug. I should have been able to deactivate it from the back-end, but since it's hardcoded in the front-end I cannot.

lepeuvedic commented 3 weeks ago

The short-term measure for categories is to reindex the GHG Protocol in order to bypass the "Upstream" and "Downstream" subgroup levels. When filtering by "scope" in the scope filter (actually an era filter), the blank "" era is treated like a "U" era (because such categories will contain supported activities, which are more likely to be upstream than downstream).
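For illustration, the blank-era rule could be written as a small normalization step before the filter is applied. The function names and the idea that the filter holds a set of selected era strings are assumptions, not the current front-end code.

```typescript
// Treat a blank era like "U" (upstream), as described for the short-term measure.
function normalizeEra(era: string): string {
  return era === "" ? "U" : era;
}

// Hypothetical filter check: `selectedEras` holds the eras ticked in the filter.
function matchesEraFilter(categoryEra: string, selectedEras: ReadonlySet<string>): boolean {
  return selectedEras.has(normalizeEra(categoryEra));
}

const upstreamOnly = new Set(["U"]);
const blankEraMatches = matchesEraFilter("", upstreamOnly);
```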