lfse-slafleur commented 2 weeks ago

Staggered approach. First we are going to focus on transporting messages transparently from the worker to the frontend without any type of parsing by OMOTES. There is currently not enough support for, nor value in, providing informational messages through codes. However, we may introduce codes at a later stage if it becomes a requirement.

New plan detailed below. Left this starting comment as is for when we will continue the work using codes.

Issue description

The worker expresses multiple types of information that is useful to show to a user of the frontend regarding their submitted job. The following types of information have been gathered:

If a job succeeds:
- Information regarding assets. (also called 'Asset Feedback')
- Information regarding the ESDL as a whole.
If a job fails:
- Error messages why the job couldn't be processed.

Currently errors are caught and the logs of a job are forwarded to the frontend. However, this is not a user-friendly way of showing errors, mostly because the logs contain not only the error but also all kinds of debug information. While this debug information is useful for developers, a user is currently unable to interpret it. Therefore, we should provide a way for user-friendly error messages to be propagated from the worker to the frontend.

Also, in the near future, we expect that the workers are able to supply additional information to a user regarding the output ESDL. This is also referred to as 'Asset Feedback'. Specifically MapEditor already supports a method for showing informational messages tied to a specific asset and MapEditor requires a specific format to be used for these informational messages.

Origin of messages

While the worker will perform the job, the orchestrator may also introduce informational messages. Specifically if a job timeout or delivery limit was reached the orchestrator will provide the necessary error message.

Solution direction

Displaying the informational message(s) at the frontend

Each frontend will have its own interface for displaying the messages. Localization in particular (the language in which the message is shown) may differ depending on the frontend and the user. A frontend may want to:

- Use its own format and interface for informational messages instead of going through the OMOTES SDK or REST interfaces.
- Ignore localization and only receive informational messages in a single language.
- Show informational messages regarding assets but not global messages, or vice versa.
- Filter out messages with a specific severity, e.g. DEBUG.
- Show a different message than the literal meaning, depending on the error. For instance, let's assume a job failed because the job delivery limit was reached. The meaning of the informational message would be 'Job A could not be processed because the worker repeatedly failed to finish the job.' A frontend may rather choose to show the message 'Please contact our support desk with the following reference: E012.'

To meet all of these requirements, informational messages will not carry a message in a single language but rather a message code. Currently we expect 3 different message codes for errors from the optimizer, around 3 error codes from the orchestrator and an unknown number from the simulator. This list will grow in the future. The code will be accompanied by a number of key-value pairs that make up the dynamic parts of the error message. Example: 'The heatdemand in {{asset_id}} could not be met by {{number_of_wh}} wh' where asset_id and number_of_wh are dynamic values.

Identifiers

The identifiers used for message codes will be of string type. This allows a message code to be descriptive and human-readable without the corresponding message. This leads to more readable source code and to errors that are easier to understand during system administration, and it avoids the bugs that integer-only message codes invite (it is easy to make a simple typing mistake).

Severity

The proposal is to use the same severity levels as the logging module, as follows:

DEBUG <-- Useful only for developers. Should not be shown to users per default. Specifically optimizer team expects this level to be useful.
INFO <-- Shows extra information regarding the output ESDL without questioning the validity of the result.
WARNING <-- Something about the output ESDL calls its validity into question. The user should inspect whether the warning is a cause for concern.
ERROR <-- Job (optimization or simulation) could not succeed because there was something wrong in the input data or the job couldn't be processed due to an IT infrastructure issue (e.g. job timeout, job reached delivery limit). It is also possible that an ESDL is invalid but an output ESDL is still given. In that situation, a message with severity ERROR may exist even though an output ESDL is generated.
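As a minimal sketch of how these levels could be represented and filtered, assuming Python and reusing the numeric values of the standard `logging` module: the names `Severity` and `visible_to_user` are hypothetical and not part of OMOTES.

```python
import logging
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity enum that mirrors the standard logging levels."""

    DEBUG = logging.DEBUG
    INFO = logging.INFO
    WARNING = logging.WARNING
    ERROR = logging.ERROR


def visible_to_user(severity: Severity, minimum: Severity = Severity.INFO) -> bool:
    """Return True if a message of this severity should be shown by default."""
    return severity >= minimum


levels = [Severity.DEBUG, Severity.INFO, Severity.WARNING, Severity.ERROR]
shown = [level.name for level in levels if visible_to_user(level)]
print(shown)
```

Using an ordered enum keeps the "hide DEBUG per default" rule a single comparison, and a frontend could raise the minimum to WARNING without touching the messages themselves.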

Dynamic values in informational message

A message references a message code to denote its meaning, but to convert the message into a language such as English or Dutch, the message may need a number of dynamic values. For instance, the message 'The peak of heatdemand for asset {{asset_id}} exceeds the total heatproduction by {{exceeded_by_kwh}} kwh of thermal energy.' contains 2 dynamic values: asset_id and exceeded_by_kwh. Therefore, message codes may be accompanied by a number of key-value pairs to further describe the message.
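The substitution described above could be sketched as follows. This assumes the `{{key}}` placeholder syntax from the example; the message code `optimizer_heat_demand_peak_exceeded`, the `TEMPLATES` table and the `render` function are made up for illustration.

```python
import re
from typing import Union

DynamicValue = Union[str, int, float, bool]

# Hypothetical English template, keyed by an illustrative message code.
TEMPLATES = {
    "optimizer_heat_demand_peak_exceeded": (
        "The peak of heatdemand for asset {{asset_id}} exceeds the total "
        "heatproduction by {{exceeded_by_kwh}} kwh of thermal energy."
    ),
}

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(message_code: str, dynamic_values: dict[str, DynamicValue]) -> str:
    """Fill the {{key}} placeholders of the template for message_code."""
    template = TEMPLATES[message_code]

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders visible instead of raising an error.
        return str(dynamic_values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


text = render(
    "optimizer_heat_demand_peak_exceeded",
    {"asset_id": "demand1", "exceeded_by_kwh": 12.5},
)
print(text)
```

Because the numeric value arrives as a float rather than as text, a frontend could format it with its own precision before substituting it.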

Datastructure

An informational message may pertain to an ESDL as a whole or to a specific asset. It has a severity, a message code and a number of key-value pairs for dynamic values. The following datastructure (within OMOTES) is proposed:

severity: Enum[DEBUG,INFO,WARNING,ERROR]
message_code: str <-- Example simulator_unable_to_parse_esdl
dynamic_values: dict[str, union[str, int, float, bool]] <-- specifically we allow int and float as well because a frontend may want to define their own precision when converting a number to text. This dict may be empty.
optional asset_id: str <-- If this message pertains to a specific asset, the asset_id field contains the asset id. If the informational message is regarding the job or ESDL as a whole, this field is empty.
technical_message: str <-- The original message from mesido, simulator-core or orchestrator. This is useful for developers and/or support agents. Also for users in case they need to communicate directly with the developers and need to relay the errors that they received. The messages shown to users by the frontend are not necessarily the messages generated by mesido, simulator-core or orchestrator. This message, together with the message_code should provide the necessary context for support.
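For illustration, the proposed fields could be expressed as a Python dataclass. This is a sketch under the assumption that OMOTES components use Python; the class name `InformationalMessage` and the example technical message text are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Union


class Severity(Enum):
    DEBUG = "DEBUG"
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"


@dataclass(frozen=True)
class InformationalMessage:
    """Hypothetical container for the fields listed above."""

    severity: Severity
    message_code: str
    technical_message: str
    # May be empty; numbers stay numeric so a frontend can pick its own precision.
    dynamic_values: dict[str, Union[str, int, float, bool]] = field(default_factory=dict)
    # None means the message is about the job or the ESDL as a whole.
    asset_id: Optional[str] = None


message = InformationalMessage(
    severity=Severity.ERROR,
    message_code="simulator_unable_to_parse_esdl",
    technical_message="Could not parse the input ESDL.",  # illustrative text
)
print(message.asset_id is None, message.dynamic_values)
```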

Documentation & default template

We should extend the wiki with a 'message code' page that describes what each message code means, the expected dynamic value keys, the expected severity and what type of ESDL asset the message code may reference. We should also maintain a file with default messages for each message code in English.
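One way to keep the documented keys and the default English messages in sync is a small consistency check. The sketch below assumes a catalogue held as a Python dict; the structure, the `CATALOGUE` name and the template wording are hypothetical.

```python
import re

# Hypothetical catalogue entry: what the wiki page would document per code.
CATALOGUE = {
    "optimizer_error_supply_temp_too_low": {
        "severity": "ERROR",
        "expected_keys": {"asset_id", "operating_threshold", "supply_low_point", "when"},
        "default_en": (
            "The supply temperature of {{asset_id}} dropped to {{supply_low_point}} "
            "at {{when}}, below the operating threshold of {{operating_threshold}}."
        ),
    },
}


def undocumented_placeholders(catalogue):
    """Return {code: placeholders used in the template but not documented}."""
    problems = {}
    for code, entry in catalogue.items():
        used = set(re.findall(r"\{\{(\w+)\}\}", entry["default_en"]))
        extra = used - entry["expected_keys"]
        if extra:
            problems[code] = extra
    return problems


print(undocumented_placeholders(CATALOGUE))
```

A check like this could run in CI so that a default message never refers to a dynamic value the documentation does not list.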

Support

The perspective of 'providing support' is essential for this feature. The informational messages may be interpreted solely by the user and/or also by a support agent and/or developer. As such, the informational message is targeting multiple readers at once which each have different use cases for the message:

Non-expert frontend user: Understand what the message means for them and how they should proceed. --> Use the message shown by the frontend to interpret the issue and act accordingly.
Expert frontend user: Understand the underlying issue as well as what the issue means for them and how they should proceed. They will want to understand the ins and outs of the message, which may go beyond the information the message itself provides. --> Use the message code to look up in-depth information. There needs to be more public information on what a message code means.
Frontend support agent: Find a number of resolution steps to propose to a user seeking support to resolve their issue and to also explain what the issue means in their case. --> Use the message code to look up support steps based on frontend-internal documentation.
OMOTES developer: Know which specific message occurred in order to locate it in the code, and which inputs caused this message. They may also fulfil the role of 'support agent'. --> Use the message code to find the relevant code.

Users interact with a frontend, and how this frontend works is outside the scope of OMOTES. Therefore, how to proceed when a specific message is provided should be defined by the frontend. This ties into the requirement for a frontend to be able to define the text which is shown to the user based on a message code and dynamic values.

Architectural division of responsibilities

simulator-core / mesido: Either provide standard messages to the cloud team which may be captured in the workers OR provide error/message codes to the cloud team and share what they mean. Message capture may be performed by cloud team in the worker based on regexes. We would hook into the message-providing-mechanism that simulator-core or mesido provides instead of forcing a generic mechanism on the tools. These tools are also used outside of OMOTES and we should leave the design of these mechanisms at simulator-core/mesido.
Worker: Convert a simulator-core/mesido specific message to OMOTES format (using either the message code or the message capture). Add these informational messages to a JobResult (either due to a job failure based on exception or on a job success based on messages returned by mesido and/or simulator-core)
Orchestrator: Pass along the JobResult containing informational messages or create informational messages when a JobResult is created by the orchestrator (such as due to timeout or job delivery).
SDK: Pass along the informational messages in OMOTES format.
OMOTES <-> Frontend conversion layer: Proposal is to create a new, optional layer which can convert OMOTES-specific datastructures to Frontend-specific datastructures. This adapter layer uses the SDK and is maintained by the respective frontend team in collaboration with the cloud team. This layer allows the message code to be converted to a message in any format or language. MapEditor will support this adapter layer by extending their generic interface to connect with external models (if necessary), which allows us to keep omotes-rest generic and still be the way to connect MapEditor with OMOTES.
Frontend: Display the informational messages to the user in the desired format.
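To make the role of the conversion layer concrete, here is a rough sketch of such an adapter. It assumes messages arrive as plain dicts with the proposed fields; `to_frontend`, `FRONTEND_TEXTS` and all message codes and texts used here are hypothetical.

```python
from typing import Optional, TypedDict


class OmotesMessage(TypedDict):
    severity: str
    message_code: str
    technical_message: str
    asset_id: Optional[str]


# Hypothetical frontend-specific texts; codes without an entry fall back
# to the technical message.
FRONTEND_TEXTS = {
    "orchestrator_job_delivery_limit_reached": (
        "Please contact our support desk with the following reference: E012."
    ),
}


def to_frontend(messages: list[OmotesMessage]) -> dict[str, list[str]]:
    """Drop DEBUG messages and group display texts per asset.

    Messages without an asset_id are collected under the key "global".
    """
    grouped: dict[str, list[str]] = {}
    for message in messages:
        if message["severity"] == "DEBUG":
            continue
        text = FRONTEND_TEXTS.get(message["message_code"], message["technical_message"])
        grouped.setdefault(message["asset_id"] or "global", []).append(text)
    return grouped


result = to_frontend([
    {"severity": "DEBUG", "message_code": "optimizer_solver_statistics",
     "technical_message": "solver finished", "asset_id": None},
    {"severity": "ERROR", "message_code": "orchestrator_job_delivery_limit_reached",
     "technical_message": "Job delivery limit reached.", "asset_id": None},
    {"severity": "WARNING", "message_code": "optimizer_pipe_velocity_high",
     "technical_message": "Velocity in pipe1 is high.", "asset_id": "pipe1"},
])
print(result)
```

Keeping this logic in a frontend-owned layer means that filtering and wording decisions do not leak into the worker, orchestrator or SDK.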

Proposed extra actions:

~~Introduce a new architectural layer: OMOTES <-> Frontend conversion layer. This is implicitly defined when the SDK is used directly but this layer is missing when the omotes-rest interface is used.~~ Not (yet) needed for MapEditor.
Currently omotes-rest is an open-source REST API that is designed to work around MapEditor AND provide a generic REST interface for other frontends. This dual purpose creates tension in the design of the component. Now that we have proposed the new OMOTES <-> Frontend architectural layer, we would propose to rename omotes-rest to omotes-rest-mapeditor and redefine its purpose to only work with MapEditor. We could fork a version of the current omotes-rest to a new repository which keeps the name omotes-rest in case we want to provide a generic REST interface later. Specifically, this split allows us to convert the informational messages in OMOTES format to MapEditor format. Not needed due to change above regarding MapEditor.

Example walkthrough at each component in case of an exception at worker

Let's assume a job is submitted for an optimization and the optimizer-worker is currently working on it. However, the temperature in the pipes falls below the operating threshold of 30C (I am making this up here, no clue if this is a realistic example) and the optimizer throws a MesidoException.
The MesidoException is caught by the optimizer-worker. This MesidoException contains the Mesido-specific message code 'error_supply_temp_too_low' with dynamic values { "asset_id": "pipe1", "operating_threshold": 30.0, "supply_low_point": 25.0, "when": "2024-04-01T01:00:00" }. The optimizer-worker recognizes this exception and converts it to a JobResult with the informational message { "severity": "ERROR", "message_code": "optimizer_error_supply_temp_too_low", "dynamic_values": <same as MesidoException>, "technical_message": "<The str(MesidoException)>", "asset_id": "pipe1" }
This JobResult is forwarded to the orchestrator with ResultType.ERROR.
The orchestrator processes that this job has finished and forwards the JobResult to the frontend SDK.
The SDK passes the JobResult to the OMOTES <-> frontend conversion layer (in case of MapEditor, omotes-rest-mapeditor and in case of NWN-DTK frontend, this is just part of the webapp).
Assuming MapEditor is the frontend, the informational messages are converted to MapEditor specific asset feedback messages. Any DEBUG messages are removed and any messages without an asset_id ?are handled in a different way? (see question below)
The user of MapEditor sees the job has finished and is displayed the informational messages and the job result.
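The worker step of this walkthrough could look roughly like the sketch below. The `MesidoException` class here is a stand-in written for this example and not the real mesido class; `run_optimizer`, `execute_job`, the dict layout of the job result and the "SUCCEEDED" result type are assumptions as well.

```python
class MesidoException(Exception):
    """Stand-in for the exception described above; not the real mesido class."""

    def __init__(self, message, code, dynamic_values):
        super().__init__(message)
        self.code = code
        self.dynamic_values = dynamic_values


def run_optimizer():
    """Hypothetical optimizer call that fails as in the walkthrough."""
    raise MesidoException(
        "Supply temperature of pipe1 dropped to 25.0, below threshold 30.0",
        code="error_supply_temp_too_low",
        dynamic_values={
            "asset_id": "pipe1",
            "operating_threshold": 30.0,
            "supply_low_point": 25.0,
            "when": "2024-04-01T01:00:00",
        },
    )


def execute_job():
    """Run the job and convert a MesidoException into a job result dict."""
    try:
        run_optimizer()
    except MesidoException as error:
        return {
            "result_type": "ERROR",
            "messages": [{
                "severity": "ERROR",
                # Prefix the tool-specific code to obtain the OMOTES code.
                "message_code": f"optimizer_{error.code}",
                "dynamic_values": error.dynamic_values,
                "technical_message": str(error),
                "asset_id": error.dynamic_values.get("asset_id"),
            }],
        }
    return {"result_type": "SUCCEEDED", "messages": []}


job_result = execute_job()
print(job_result["messages"][0]["message_code"])
```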

Arguments to not use codes but rather forward the messages as is

Currently we propose to capture messages from the optimizer/worker and convert them into standardized codes due to the arguments in the previous sections. However, @MarkTNO has made some valid points and voiced some concerns on why we shouldn't use codes but rather forward the messages as-is:

The optimizer/simulator or any other application in the worker may not introduce any codes by themselves. Specifically the optimizer team has already voted against this as it makes less sense for them. They would have to introduce it for OMOTES and they would not see codes as an added benefit when running mesido outside of OMOTES. Therefore, we would couple any worker directly with the error management system of the application underneath. With mesido we will need to capture specific exceptions and parse the error messages for any dynamic fields as well as match the error message with a specific code. This has a couple of consequences:
- The cloud team must be informed of any new messages from mesido with every update. This will go wrong at some point and a (new) message may not have a code to match.
- Any new messages will lead to code changes within the worker, which leads to work for the cloud team.
It is also an option to transparently forward the messages to the frontend and let the frontend handle the parsing of error messages to convert them into a code or translate the message.
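The coupling concern can be illustrated with a small regex-based capture. This is a sketch only: the pattern, the message texts and the fallback code `optimizer_unknown_message` are invented for the example and do not reflect actual mesido output.

```python
import re

# Hypothetical patterns maintained by the cloud team; each maps a regex over
# the technical message to a message code.
PATTERNS = [
    (
        re.compile(
            r"Supply temperature of (?P<asset_id>\w+) dropped to (?P<supply_low_point>[\d.]+)"
        ),
        "optimizer_error_supply_temp_too_low",
    ),
]

UNKNOWN_CODE = "optimizer_unknown_message"


def classify(technical_message: str) -> tuple[str, dict[str, str]]:
    """Return (message_code, dynamic_values) for a raw message."""
    for pattern, code in PATTERNS:
        match = pattern.search(technical_message)
        if match:
            return code, match.groupdict()
    # A new or reworded message matches no pattern and loses its code.
    return UNKNOWN_CODE, {}


known = classify("Supply temperature of pipe1 dropped to 25.0")
reworded = classify("Temperature of supply side too low for pipe1")
print(known)
print(reworded)
```

The second call shows the failure mode: as soon as the underlying tool rewords a message, the worker silently falls back to the unknown code until someone updates the pattern list.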

There are trade-offs here that we will want to navigate in the architecture team.

Work due to proposal

Cloud team

Add the informational structure as part of JobResult in omotes-sdk-protocol.
Alter workers to capture informational messages from mesido & simulator-core.

Optimizer team

Decide how to handle informational messages within mesido and if they want to use message codes internally as well.
- Response: They will support a 'MesidoException' class (or similarly named) but this exception will not implement standardized codes (so only messages).

Simulator team

Currently none, we need to propose this design to them still and check if they will want to generate any informational messages.

TPG team

Currently none, except for questions below which will lead to work.

MapEditor team

Currently none, except for questions below which may lead to work.

Remaining questions

Can and/or will MapEditor display non-asset-feedback messages (no asset_id field) in a user-friendly way?
How will the NWN-DTK frontend handle this?

edwinmatthijssen commented 2 weeks ago

I definitely want to extend the possibilities on this topic (handling model feedback, logging, progress reporting, and so on) in the ESDL MapEditor as well, so for asset-feedback there is no work (or hardly any work), but in general this statement is not valid.

MarkTNO commented 1 week ago

To add to 'Arguments to not use codes but rather forward the messages as is': I would indeed prefer not to parse messages and create message codes in omotes. I think this should be done in the mesido/simulator code. They have an overview of the different types of messages and keep track of a list. It would be impractical for omotes to do this and it would require omotes updates whenever a new message type is created.

I know that the mesido team has decided not to use message codes, which I prefer. The mapeditor will directly display the technical message. TPG might want to do a conversion by displaying a Dutch text asking to contact customer services along with the technical message for reference, or perform a regex operation (such as is currently suggested in a general form in the omotes workers) specific to their own needs.

This will mean that:

when adding a new type of message no work will be required, if TPG decided to use the technical message for reference.
if TPG will perform message type specific transformations, the mesido/simulator team will have to specify new message types with the new release and the frontend doing message transformation will have to handle this new type (as in the current proposal), but no work is needed in omotes.
if a mesido team member is asked for feedback on an issue they will receive the technical message they issued, instead of a message code that they have to look up, since the mesido team decided not to work with message codes (I believe the simulator team has not yet decided on this).

Regarding the 'Remaining questions': I think it is useful to have general feedback messages (also as an option in the ESDL validator). We have to discuss the best way of displaying these and implement it.

edwinmatthijssen commented 1 week ago

True.... adding non-asset specific feedback to the MapEditor is easy and good to add!

cwang39403 commented 1 week ago

I would prefer to forward the messages as they are from a maintenance perspective.

This proposal could be a nice addition to the frontend user to navigate when errors occur, but assuming most users using OMOTES are likely to have a certain technical background and probably also going through relevant training already, the additional layer of adding error codes and parsing technical message does not add that much value at this moment in my opinion.
I am more concerned about the maintenance of this additional layer as this requires inputs from different teams. Now it is still easy to find each other during the active development phase, but it is hard to guarantee in the longer term. Also, any change request could introduce changes in different dependent components.

For the time being, if we constantly receive similar support requests from the users, I would look into the direction of having a Q&A section on the (architecture) documentation page that lists a few commonly seen technical/error messages to help users with the next steps.

lfse-slafleur commented 2 days ago

Hea Cheng-Kai!

> I would prefer to forward the messages as they are from a maintenance perspective.

> * This proposal _could_ be a nice addition to the frontend user to navigate when errors occur, but assuming most users using OMOTES are likely to have a certain technical background and probably also going through relevant training already, the additional layer of adding error codes and parsing technical message does not add that much value at this moment in my opinion.

From MapEditor perspective it doesn't add much but for frontends e.g. NWN-DTK by TPG it is a must to have some form of standardization so they can attach both user-friendly messages and also provide the option to have technical messages available.

> * I am more concerned about the maintenance of this additional layer as this requires inputs from different teams. Now it is still easy to find each other during the active development phase, but it is hard to guarantee in the longer term. Also, any change request could introduce changes in different dependent components.

Definitely a concern and it seems this is also the core of Mark's worries on how to proceed here.

> For the time being, if we constantly receive similar support requests from the users, I would look into the direction of having a Q&A section on the (architecture) documentation page that lists a few commonly seen technical/error messages to help users with the next steps.

Definitely good to add! Currently they are not receiving any technical/error messages, so let's provide both some form of message and also a Q&A section.

Going to reply with another message shortly to alter the proposal to one that seems to have the most support currently.

lfse-slafleur commented 2 days ago

Currently it is a step too far to use codes for messages. Conversation with TPG has shown that some form of parsing will have to happen in the future, but there is no necessity currently. There is also not enough support from the development teams to pick it up now, so we are going to move forward with sending messages transparently through OMOTES and keep the option open to use codes at a later point in time. The motivation to immediately move to codes was the expected investment needed by the optimizer and simulator teams, but the current investigation has shown that the immediate investment is very small (~10ish messages expected). Therefore, the proposed plan for now:

Datastructure will remain the same except the fields dynamic_values and message_code will not be used. We will continue to propose the message to be called technical_message.
asset_id will remain optional to allow for asset-specific messages and global messages.
Message is captured by the workers in whatever form the library underneath provides. In the case of mesido currently, we will capture specific exceptions and relay the message in those exceptions as error messages when an output ESDL cannot be calculated. At a later stage, mesido will also include warnings, info and debug messages if the ESDL is calculated successfully but there is more info that should be shared with a user.
Frontends can optionally show the technical_messages provided by OMOTES. The requirement to be able to localize messages is not considered for now.
We will provide a list of expected messages with their meaning in the form of a Q&A for both frontends and end-users. This is something frontends can optionally link to in order to provide end-users with more details regarding the error, warning or info message. Debug messages are considered to be seen only by developers of backend models, e.g. mesido and simulator, so these messages will not be expected on the Q&A list. We will request the optimizer and simulator teams to maintain this list within their own documentation.
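Under this altered plan, the structure and the worker-side capture shrink considerably. A minimal sketch, assuming Python; `TechnicalMessage`, `capture` and `for_end_users` are hypothetical names, and the exception text is made up:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TechnicalMessage:
    """Hypothetical reduced structure: no message_code or dynamic_values."""

    severity: str  # One of DEBUG, INFO, WARNING, ERROR.
    technical_message: str
    asset_id: Optional[str] = None  # None for a global message.


def capture(error: Exception, asset_id: Optional[str] = None) -> TechnicalMessage:
    """Relay the exception text unchanged as an ERROR message."""
    return TechnicalMessage("ERROR", str(error), asset_id)


def for_end_users(messages: list[TechnicalMessage]) -> list[TechnicalMessage]:
    """Keep everything except DEBUG, which is meant for model developers."""
    return [m for m in messages if m.severity != "DEBUG"]


captured = capture(ValueError("Supply temperature too low"), asset_id="pipe1")
visible = for_end_users([TechnicalMessage("DEBUG", "solver statistics"), captured])
print(visible)
```

Since the message text passes through untouched, adding a new message in mesido or the simulator would require no change in the worker.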

@MichielTukker @MarkTNO @edwinmatthijssen @KobusVanRooyen : Does this altered proposal have your support? If so, I will also check with the others before proceeding with a detailed design. Otherwise, please share your thoughts and proposed alterations.

Project-OMOTES / architecture-documentation

WIP: Propagating informational & error messages from workers to frontend #38
