Project-OMOTES / architecture-documentation

WIP: Propagating informational & error messages from workers to frontend #38

Open lfse-slafleur opened 1 month ago

lfse-slafleur commented 1 month ago

Staggered approach. First we are going to focus on transporting messages transparently from the worker to the frontend without any type of parsing by OMOTES. There is currently neither enough support for nor enough value in providing informational messages through codes. However, we may introduce codes at a later stage if it becomes a requirement.

New plan detailed below. Left this starting comment as is for when we will continue the work using codes.

Issue description

The worker expresses multiple types of information that are useful to show to a user of the frontend regarding their submitted job. The following types of information have been gathered:

Currently errors are caught and the logs of a job are forwarded to the frontend. However, this is not a user-friendly way of showing errors, mostly because it shows not only the error but also all kinds of debug information. While this debug information is useful for developers, a user is currently unable to interpret it. Therefore, we should provide a way for user-friendly error messages to be propagated from the worker to the frontend.

Also, in the near future, we expect that the workers are able to supply additional information to a user regarding the output ESDL. This is also referred to as 'Asset Feedback'. Specifically MapEditor already supports a method for showing informational messages tied to a specific asset and MapEditor requires a specific format to be used for these informational messages.

Origin of messages

While the worker will perform the job, the orchestrator may also introduce informational messages. Specifically if a job timeout or delivery limit was reached the orchestrator will provide the necessary error message.

Solution direction

Displaying the informational message(s) at the frontend

Each frontend will have their own interface on how to display the messages. Specifically localization (language in which the message is shown) may be different depending on the frontend and the user. A frontend may want to:

In order to fit all the requirements, informational messages will not use a message in a single language, but rather a message code. Currently we expect 3 different message codes for errors from the optimizer, around 3 error codes from the orchestrator and an unknown number from the simulator. This list will grow in the future. Each code will be accompanied by a number of key-value pairs that make up the dynamic parts of the error message. Example: 'The heatdemand in {{asset_id}} could not be met by {{number_of_wh}} wh' where asset_id and number_of_wh are dynamic values.

Identifiers

The identifiers to use for message codes will be of string type. This allows a message code to be descriptive and human-readable without the corresponding message. This leads to more readable source code and easier-to-understand errors during system administration, and it reduces the number of bugs compared to integer-only message codes (where it is easy to make a simple typing mistake).

Severity

The proposal is to use the same severity levels as the logging module:
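
As a minimal sketch, assuming "the logging module" refers to Python's standard `logging` module and that all five of its standard levels are adopted (the issue itself only shows `ERROR` and `DEBUG` in use), severity could be modelled as an enum. The names `Severity` and `severity_from_name` are hypothetical and not part of OMOTES.

```python
import logging
from enum import Enum


class Severity(Enum):
    """Hypothetical severity enum that reuses the numeric logging levels."""

    DEBUG = logging.DEBUG
    INFO = logging.INFO
    WARNING = logging.WARNING
    ERROR = logging.ERROR
    CRITICAL = logging.CRITICAL


def severity_from_name(name: str) -> Severity:
    """Look up a severity by its case-insensitive name, e.g. 'error'."""
    return Severity[name.upper()]
```

Reusing the numeric values keeps the ordering of the levels intact, so a frontend could filter on a minimum severity by comparing `Severity.value`.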

Categories

The proposal is to prefix message codes with a category based on the origin of the message. An example list:

This allows message codes to be divided across the origin component of the message and prevents overlap. This also allows OMOTES to use the same message code identifier as the component underneath uses by just prefixing a category.
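
A small sketch of how such prefixing might work. The category names below are assumptions derived from the components mentioned in this issue (optimizer, simulator, orchestrator); `Category`, `prefix_code` and `split_code` are hypothetical helpers, not existing OMOTES functions.

```python
from enum import Enum
from typing import Tuple


class Category(Enum):
    """Assumed origin categories; the definitive list is not fixed here."""

    OPTIMIZER = "optimizer"
    SIMULATOR = "simulator"
    ORCHESTRATOR = "orchestrator"


def prefix_code(category: Category, component_code: str) -> str:
    """Build an OMOTES-level code from the code used by the component itself."""
    return f"{category.value}_{component_code}"


def split_code(message_code: str) -> Tuple[Category, str]:
    """Split an OMOTES-level code back into its category and component code."""
    prefix, _, component_code = message_code.partition("_")
    # Raises ValueError when the prefix is not a known category.
    return Category(prefix), component_code
```

With this, the Mesido code `error_supply_temp_too_low` would become `optimizer_error_supply_temp_too_low`, and the original component code can be recovered again with `split_code`.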

Dynamic values in informational message

A message may reference a message code to denote the meaning, but to convert the message into a language, e.g. English or Dutch, the message may need a number of dynamic values. For instance, the message 'The peak of heatdemand for asset {{asset_id}} exceeds the total heatproduction by {{exceeded_by_kwh}} kwh of thermal energy.' contains 2 dynamic values: asset_id and exceeded_by_kwh. Therefore, message codes may be accompanied by a number of key-value pairs to further describe the message.
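
To illustrate, a frontend could fill in such a template roughly as follows. This is only a sketch: the `{{key}}` placeholder syntax is taken from the example above, while the `render` function and the choice to leave unknown placeholders untouched are assumptions.

```python
import re
from typing import Mapping

# Matches placeholders of the form {{key}}, allowing optional inner spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, dynamic_values: Mapping[str, object]) -> str:
    """Replace each {{key}} in the template with its dynamic value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Assumption: keep a placeholder visible when its value is missing,
        # rather than failing while displaying an error to the user.
        return str(dynamic_values.get(key, match.group(0)))

    return _PLACEHOLDER.sub(substitute, template)
```

A frontend would keep one template per message code and language, and call `render` with the key-value pairs that accompany the code.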

Datastructure

An informational message may pertain to an ESDL or a specific asset. It has a severity, a message code and a number of key-value pairs for dynamic values. The following datastructure (within OMOTES) is proposed:
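
As a rough sketch of what such a datastructure could look like in Python: the field names below are taken from the example walkthrough further down in this issue, while the class name, the types and the defaults are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class InformationalMessage:
    """Hypothetical container for one informational message."""

    severity: str  # e.g. "ERROR" or "DEBUG"
    message_code: str  # e.g. "optimizer_error_supply_temp_too_low"
    dynamic_values: Dict[str, Any] = field(default_factory=dict)
    technical_message: Optional[str] = None
    # Assumption: None means the message pertains to the ESDL as a whole.
    asset_id: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        """Convert to a plain dict, e.g. for serialization in a job result."""
        return asdict(self)
```

Keeping `asset_id` optional lets the same structure cover both asset-specific and ESDL-wide messages.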

Documentation & default template

We should extend the wiki with a 'message code' page that describes what each message code means, the expected dynamic value keys, the expected severity and what type of ESDL asset the message code may reference. We should also maintain a file with default messages for each message code in English.

Support

The perspective of 'providing support' is essential for this feature. The informational messages may be interpreted solely by the user and/or also by a support agent and/or developer. As such, the informational message targets multiple readers at once, each of whom has different use cases for the message:

Users interact with a frontend and how this frontend works is not defined from the scope of OMOTES. Therefore, how to proceed when a specific message is provided should be defined by the frontend. This ties into the requirement for a frontend to be able to define the text which is shown to the user based on a message code and dynamic values.

Architectural division of responsibilities

Proposed extra actions:

Example walkthrough at each component in case of an exception at worker

  1. Let's assume an optimization job is submitted and the optimizer-worker is currently working on it. However, the temperature in the pipes falls below the operating threshold of 30C (I am making this up here, no clue if this is a realistic example) and the optimizer throws a MesidoException.
  2. The MesidoException is caught by the optimizer-worker. This MesidoException contains the Mesido-specific message code 'error_supply_temp_too_low' with dynamic values { "asset_id": "pipe1", "operating_threshold": 30.0, "supply_low_point": 25.0, "when": "2024-04-01T01:00:00" }. The optimizer-worker recognizes this exception and converts it to a JobResult with the informational message { "severity": "ERROR", "message_code": "optimizer_error_supply_temp_too_low", "dynamic_values": <same as MesidoException>, "technical_message": "<The str(MesidoException)>", "asset_id": "pipe1" }.
  3. This JobResult is forwarded to the orchestrator with ResultType.ERROR.
  4. The orchestrator processes that this job has finished and forwards the JobResult to the frontend SDK.
  5. The SDK passes the JobResult to the OMOTES <-> frontend conversion layer (in case of MapEditor, omotes-rest-mapeditor and in case of NWN-DTK frontend, this is just part of the webapp).
  6. Assuming MapEditor is the frontend, the informational messages are converted to MapEditor specific asset feedback messages. Any DEBUG messages are removed and any messages without an asset_id ?are handled in a different way? (see question below)
  7. The user of MapEditor sees the job has finished and is displayed the informational messages and the job result.
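
Steps 2 and 6 of the walkthrough above can be sketched in code. This is illustrative only: the `MesidoException` class below is a stand-in with assumed attributes (`message_code`, `dynamic_values`), and both helper functions are hypothetical rather than existing worker or MapEditor code.

```python
from typing import Any, Dict, List


class MesidoException(Exception):
    """Stand-in for the optimizer's exception; its attributes are assumed."""

    def __init__(self, message: str, message_code: str, dynamic_values: Dict[str, Any]):
        super().__init__(message)
        self.message_code = message_code
        self.dynamic_values = dynamic_values


def to_informational_message(exc: MesidoException) -> Dict[str, Any]:
    """Step 2: the worker converts the exception to an informational message."""
    return {
        "severity": "ERROR",
        "message_code": f"optimizer_{exc.message_code}",
        "dynamic_values": dict(exc.dynamic_values),
        "technical_message": str(exc),
        "asset_id": exc.dynamic_values.get("asset_id"),
    }


def for_mapeditor(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Step 6: drop DEBUG messages before they are shown in MapEditor."""
    return [m for m in messages if m["severity"] != "DEBUG"]
```

How messages without an `asset_id` are handled is left out of this sketch, since that is still an open question.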

Arguments to not use codes but rather forward the messages as is

Currently we propose to capture messages from the optimizer/worker and convert them into standardized codes due to the arguments in the previous sections. However, @MarkTNO has made some valid points and voiced some concerns on why we shouldn't use codes but rather forward the messages as-is:

There are trade-offs here that we will want to navigate in the architecture team.

Work due to proposal

Cloud team

Optimizer team

Simulator team

TPG team

MapEditor team

Remaining questions

edwinmatthijssen commented 1 month ago

I definitely want to extend the possibilities on this topic (handling model feedback, logging, progress reporting, and so on) in the ESDL MapEditor as well, so for asset-feedback there is no work (or hardly any work), but in general this statement is not valid.

MarkTNO commented 1 month ago

To add to Arguments to not use codes but rather forward the messages as is: I would indeed prefer not to parse messages and create message codes in omotes. I think this should be done in the mesido/simulator code. Those teams have an overview of the different types of messages and can keep track of a list. It would be impractical for omotes to do this, as it would require omotes updates whenever a new message type is created.

I know that the mesido team has decided not to use message codes, which I prefer. The mapeditor will directly display the technical message. TPG might want to do a conversion by displaying a Dutch text asking to contact customer services along with the technical message for reference, or perform a regex operation (such as is currently suggested in a general form in the omotes workers) specific to their own needs.

This will mean that:

Regarding the Remaining questions: I think it is useful to have general feedback messages (also as an option in the ESDL validator). We have to discuss the best way to display these and implement it.

edwinmatthijssen commented 1 month ago

True.... adding non-asset specific feedback to the MapEditor is easy and good to add!

cwang39403 commented 1 month ago

I would prefer to forward the messages as they are from a maintenance perspective.

For the time being, if we constantly receive similar support requests from the users, I would look into having a Q&A section on the (architecture) documentation page that lists a few commonly seen technical/error messages to help users with the next steps.

lfse-slafleur commented 1 month ago

Hea Cheng-Kai!

> I would prefer to forward the messages as they are from a maintenance perspective.

> * This proposal _could_ be a nice addition to the frontend user to navigate when errors occur, but assuming most users using OMOTES are likely to have a certain technical background and probably also going through relevant training already, the additional layer of adding error codes and parsing technical message does not add that much value at this moment in my opinion.

From the MapEditor perspective it doesn't add much, but for frontends such as NWN-DTK by TPG it is a must to have some form of standardization so they can attach user-friendly messages and also provide the option to have technical messages available.

> * I am more concerned about the maintenance of this additional layer as this requires inputs from different teams. Now it is still easy to find each other during the active development phase, but it is hard to guarantee in the longer term. Also, any change request could introduce changes in different dependent components.

Definitely a concern and it seems this is also the core of Mark's worries on how to proceed here.

> For the time being, if we constantly receive similar support requests from the users, I would look into having a Q&A section on the (architecture) documentation page that lists a few commonly seen technical/error messages to help users with the next steps.

Definitely good to add! Currently they are not receiving any technical/error messages, so let's provide both some form of message and a Q&A section.

Going to reply with another message shortly to alter the proposal to one that seems to have the most support currently.

lfse-slafleur commented 1 month ago

Currently it is a step too far to use codes for messages. Conversation with TPG has shown that some form of parsing will have to happen in the future, but there is no necessity currently. There is also not enough support from the development teams to pick it up now, so we are going to move forward with sending messages transparently through OMOTES and keep the option open to use codes at a later point in time. The motivation to immediately move to codes was the expected investment needed by the optimizer and simulator teams, but the current investigation has shown that the immediate investment is very little (~10ish messages expected). Therefore, the proposed plan for now:

@MichielTukker @MarkTNO @edwinmatthijssen @KobusVanRooyen : Does this altered proposal have your support? If so, I will also check with the others before proceeding with a detailed design. Otherwise, please share your thoughts and proposed alterations.

lfse-slafleur commented 3 weeks ago

@MichielTukker @MarkTNO @edwinmatthijssen @KobusVanRooyen I haven't had a reply yet on my Q if this altered proposal fits for you all. If so, we can start work on it at the cloud side.

MarkTNO commented 3 weeks ago

This sounds good!

edwinmatthijssen commented 3 weeks ago

To me too....!

KobusVanRooyen commented 3 weeks ago

@lfse-slafleur in principle it all sounds good, except the last bullet point about documentation. But I think we can discuss this item further and it should not prevent code development work.

lfse-slafleur commented 3 weeks ago

@KobusVanRooyen Discussed through Teams. We will hold off on generating more documentation for each issue. Instead, we will see if the messages are informative enough for users.