eclipse-arrowhead / roadmap

Eclipse Public License 2.0

Request/Response models in Arrowhead #15

Open mzsilak opened 4 years ago

mzsilak commented 4 years ago

Hello,

As you might be aware, Arrowhead has an error response model that neatly converts into an exception when the provided HTTP clients are used:

{
  "errorMessage": "string",
  "errorCode": int,
  "exceptionType": "string"
}
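To make that conversion concrete, here is a minimal Java sketch of what a client library might do with the body once it has been decoded. The `ErrorResponse` record and `ErrorResponseException` class are hypothetical stand-ins, not the actual Arrowhead classes, and JSON decoding is left out.

```java
import java.util.Objects;

// Hypothetical mirror of the error body shown above (JSON decoding omitted).
record ErrorResponse(String errorMessage, int errorCode, String exceptionType) { }

// Hypothetical exception a client library could throw for an error response.
class ErrorResponseException extends RuntimeException {
    private final int errorCode;
    private final String exceptionType;

    ErrorResponseException(String message, int errorCode, String exceptionType) {
        super(message);
        this.errorCode = errorCode;
        this.exceptionType = exceptionType;
    }

    int errorCode() { return errorCode; }

    String exceptionType() { return exceptionType; }

    // Turns a decoded error body into an exception the calling code can catch.
    static ErrorResponseException from(ErrorResponse error) {
        Objects.requireNonNull(error, "error");
        return new ErrorResponseException(error.errorMessage(), error.errorCode(), error.exceptionType());
    }
}
```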

This model is not, and probably cannot be, modeled in Swagger.

Arrowhead already uses a RequestObject/ResponseObject model, and I would propose extending it: all Request/Response classes shall extend AbstractRequest/AbstractResponse classes.

The AbstractResponse classes should always include the HTTP response code (or an Arrowhead-internal response code) as well as a response message. This kind of response should be returned in all cases, even where the return type is currently "void" (e.g. delete methods).

private final Integer responseCode;
private final String responseMessage;

Furthermore, all abstract classes should contain the following fields:

private final String correlationId;
private final String transactionId;

The correlationId would be used to map requests and responses to each other, which makes it easier to identify messages under high traffic. The transactionId is similar, but it spans not just a single request/response pair but the duration of a whole transaction. For example, a client request to the Orchestration System might trigger a subsequent request from the Orchestration System to the Authorization System. This subsequent request would carry the same transactionId and a new correlationId. Such a transactionId also improves the traceability of all calls throughout the system.

More child classes could be created for specialized cases: AbstractPayloadResponse for a response with a payload, and AbstractPaginationRequest/AbstractPaginationResponse, which include the common parameters used during pagination and where the response payload is a collection of items.
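A minimal Java sketch of the proposed hierarchy follows. The class names and the four fields come from the proposal; the shared `AbstractMessage` base, the pagination fields (`page`, `itemsPerPage`, `totalItems`) and the concrete `ServiceListResponse` are assumptions added for illustration.

```java
import java.util.List;

// Common tracing identifiers; the proposal puts these on every abstract class.
abstract class AbstractMessage {
    private final String correlationId;
    private final String transactionId;

    protected AbstractMessage(String correlationId, String transactionId) {
        this.correlationId = correlationId;
        this.transactionId = transactionId;
    }

    public String getCorrelationId() { return correlationId; }
    public String getTransactionId() { return transactionId; }
}

abstract class AbstractRequest extends AbstractMessage {
    protected AbstractRequest(String correlationId, String transactionId) {
        super(correlationId, transactionId);
    }
}

// Every response carries a code and message, even where a method used to return void.
abstract class AbstractResponse extends AbstractMessage {
    private final Integer responseCode;
    private final String responseMessage;

    protected AbstractResponse(String correlationId, String transactionId,
                               Integer responseCode, String responseMessage) {
        super(correlationId, transactionId);
        this.responseCode = responseCode;
        this.responseMessage = responseMessage;
    }

    public Integer getResponseCode() { return responseCode; }
    public String getResponseMessage() { return responseMessage; }
}

// A response that also carries a payload.
abstract class AbstractPayloadResponse<T> extends AbstractResponse {
    private final T payload;

    protected AbstractPayloadResponse(String correlationId, String transactionId,
                                      Integer responseCode, String responseMessage, T payload) {
        super(correlationId, transactionId, responseCode, responseMessage);
        this.payload = payload;
    }

    public T getPayload() { return payload; }
}

// Assumed pagination parameters: zero-based page index and page size.
abstract class AbstractPaginationRequest extends AbstractRequest {
    private final int page;
    private final int itemsPerPage;

    protected AbstractPaginationRequest(String correlationId, String transactionId,
                                        int page, int itemsPerPage) {
        super(correlationId, transactionId);
        this.page = page;
        this.itemsPerPage = itemsPerPage;
    }

    public int getPage() { return page; }
    public int getItemsPerPage() { return itemsPerPage; }
}

// The payload is a collection; totalItems lets a client work out how many pages exist.
abstract class AbstractPaginationResponse<T> extends AbstractPayloadResponse<List<T>> {
    private final long totalItems;

    protected AbstractPaginationResponse(String correlationId, String transactionId,
                                         Integer responseCode, String responseMessage,
                                         List<T> items, long totalItems) {
        super(correlationId, transactionId, responseCode, responseMessage, List.copyOf(items));
        this.totalItems = totalItems;
    }

    public long getTotalItems() { return totalItems; }
}

// Hypothetical concrete response listing service definitions.
final class ServiceListResponse extends AbstractPaginationResponse<String> {
    ServiceListResponse(String correlationId, String transactionId,
                        List<String> services, long totalItems) {
        super(correlationId, transactionId, 200, "OK", services, totalItems);
    }
}
```

Putting the identifiers in one base class keeps the requirement "all abstract classes contain correlationId and transactionId" in a single place.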

Comments?

kind regards, Mario

mzsilak commented 4 years ago

Please also see the following link regarding traceability: https://www.baeldung.com/mdc-in-log4j-2-logback

emanuelpalm commented 4 years ago

@mzsilak I'm glad you are picking this one up. I've thought a bit about the problem space of message semantics, even though most of your points are new to me, and I think it should be discussed. :+1:

Error Response Objects

I guess these cannot be modeled in Swagger because Swagger only deals with request and response objects? I'm not overly familiar with Swagger.

My understanding is that in Arrowhead, services are modeled as sets of functions, each accepting zero or one argument and returning zero or one result. When implementing a service, the way of expressing those arguments and return values has to be adapted to whatever application-level transport protocol is being used, such as HTTP or JSON-RPC. My understanding is that Arrowhead currently mandates that errors be handled in whatever way is most suitable for the application-level transport protocol in question, which often means that you can only use a small set of predefined error codes. I think the solution is something along the lines of what AITIA have implemented, which is to use standardized objects for conveying error information. Those error objects could be considered "exceptions" thrown, or perhaps raised, by the "functions" of a service. The exceptions that can be thrown should be documented in the SD and IDD documents of each Arrowhead service, including string and/or number identifiers for each possible exception. In some cases the system receiving an "exception" might be able to act on it, so I think it would be appropriate if individual errors could include additional information.

A realization of this abstract "exception" object for HTTP/JSON could perhaps use shorter names than are used now, and allow any additional fields specified in the IDD of a given service, such as in:

{
  "text": "string",
  "code": "string or integer"
}

"text" should be a human-readable error description, while the "code" should correspond to a well-defined identifier associated with the Arrowhead SD/IDD of the service in question. Just including the HTTP status code in the message, as is done today, repeats the information already available in the HTTP response. The "code" field could be seen as the combination of the "errorCode" and "exceptionType" fields in the old error object.
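A hedged Java sketch of the shorter error object, assuming that IDD-specific fields are simply flattened next to "text" and "code". `ErrorObject` and its hand-rolled serializer are illustrative only; a real service would use a proper JSON library.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical error object using the short field names suggested above; `extra`
// holds any additional fields a service IDD might define for a particular error.
record ErrorObject(String text, Object code, Map<String, Object> extra) {

    ErrorObject {
        if (extra == null) {
            extra = Map.of();
        } else {
            // Copy defensively while keeping insertion order for predictable output.
            extra = Collections.unmodifiableMap(new LinkedHashMap<>(extra));
        }
    }

    // Serializes to a flat JSON object: text, code, then the extra fields in order.
    String toJson() {
        StringBuilder sb = new StringBuilder("{\"text\":").append(encode(text))
                .append(",\"code\":").append(encode(code));
        for (Map.Entry<String, Object> e : extra.entrySet()) {
            sb.append(',').append(encode(e.getKey())).append(':').append(encode(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Integers are written as JSON numbers; anything else becomes a JSON string.
    // Only quotes and backslashes are escaped, which is enough for this sketch.
    private static String encode(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return value.toString();
        }
        String s = String.valueOf(value);
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Because "code" is typed as `Object`, the same record covers both the string and the integer variants mentioned above.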

Abstract Requests and Responses

What would be the semantics of the responseCode and responseMessage you propose? Do you imagine that the information is collected from the HTTP/CoAP/MQTT/AMQP/JSON-RPC/etc. protocol header? Would there be any specific Arrowhead semantics?

I'm not sure what you are proposing with the correlationId. If using e.g. HTTP, there is no need to explicitly associate requests with responses, as it is a strict requirement for the transmission medium to be reliable (i.e. TCP must be used) and for responses to arrive in the same order as their corresponding requests were sent. See RFC 7230, Section 2.1 and Section 6.3.2. Most transport protocols have some built-in mechanism for associating requests with responses, which any Arrowhead system using them can piggyback on. CoAP, for example, requires the use of message IDs (see RFC 7252, Section 2.1), as it is intended to be used with UDP, which does not guarantee a reliable transport medium. Do you imagine that raw sockets could be used, or something else?

I'm very much convinced that transactionIds are appropriate for messages that are indeed related and should succeed or fail together (like transactions in SQL). I'm very unsure, however, whether requiring the field for every Arrowhead message, no matter whether transactional behavior is applicable, is worth the complexity and space overhead it would introduce. Do you have any particular application in mind where you want transaction IDs?

Pagination Semantics

Yes! We need this! Thank you for taking this up! :1st_place_medal: :-) An alternative to this in HTTP is range queries, which can be used both for bytes and for arbitrary objects (see MDN - HTTP range requests for information about how they can be used for bytes). Whether URI query parameters or range queries are used doesn't really matter. The point is that it is often very useful, especially for Arrowhead service "functions" that return lists of things, to reply with a subset of the elements in whatever list was requested, either because a subset was requested or to limit the memory overhead of encoding and sending large lists. Range queries have the advantage that they can be used even if the requesting system did not specify that it wanted a range, which cannot be done with query parameters.

I'm not quite sure if it would be appropriate to demand that pagination/ranges be handled in one specific way for all Arrowhead systems, as Arrowhead strives to be protocol-agnostic. I do think, however, that there should be a formalized Arrowhead semantics for range queries for different protocol/encoding combinations that can be explicitly adhered to, if deemed appropriate, by Arrowhead services.
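The server-side half of pagination can be sketched in a few lines of Java, assuming offset/limit parameters and a server-imposed maximum page size that applies even when the client asked for everything. The "items" unit in `contentRange()` is an assumption modeled on the HTTP Content-Range format (HTTP permits range units other than bytes, but this one is not standardized), and `Page`/`Paginator` are hypothetical names.

```java
import java.util.List;

// Hypothetical page of results: the returned slice plus the size of the full list.
record Page<T>(List<T> items, int offset, int total) {

    // Renders a Content-Range-style value using an assumed "items" unit.
    String contentRange() {
        if (items.isEmpty()) {
            return "items */" + total;
        }
        return "items " + offset + "-" + (offset + items.size() - 1) + "/" + total;
    }
}

final class Paginator {
    private Paginator() { }

    // Returns at most min(requestedLimit, maxLimit) items starting at offset.
    // The server-side cap applies even if the client requested the whole list.
    static <T> Page<T> slice(List<T> all, int offset, int requestedLimit, int maxLimit) {
        if (offset < 0 || requestedLimit < 0 || maxLimit <= 0) {
            throw new IllegalArgumentException("offset and requestedLimit must be >= 0, maxLimit > 0");
        }
        int limit = Math.min(requestedLimit, maxLimit);
        int from = Math.min(offset, all.size());
        int to = from + Math.min(limit, all.size() - from);   // cannot overflow
        return new Page<>(List.copyOf(all.subList(from, to)), from, all.size());
    }
}
```

Calling `slice(list, 0, Integer.MAX_VALUE, 10)` models a client that asked for everything but receives only the first ten items, together with enough information to request the rest.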

emanuelpalm commented 4 years ago

I think the same applies to "exception" semantics as to range queries. There should likely be a formalized way of conveying error information, but not a strict requirement for every service to implement it. Services that do comply should indicate so in their documentation (IDDs and SD).

Again, the reason for this is that Arrowhead strives to be protocol-agnostic. Using Arrowhead should not prevent you from using any features a particular protocol offers.

emanuelpalm commented 4 years ago

By the way, there is a standardized form of Arrowhead document that I believe is designed for these kinds of semantics standards: the SP (Semantics Profile). I'm not sure how keen people in the project would be on making Arrowhead-specific SPs, however, as SPs have up until this point only (?) been references to existing standardized semantics, such as SenML.

mzsilak commented 4 years ago

Hello @emanuelpalm ,

What would be the semantics of the responseCode and responseMessage you propose? Do you imagine that the information is collected from the HTTP/CoAP/MQTT/AMQP/JSON-RPC/etc. protocol header? Would there be any specific Arrowhead semantics?

The responseMessage would be a human-readable error message, much like the currently used errorMessage. The responseCode could be a unique Arrowhead-specific code. I can imagine using a 9-digit number and encoding a lot of information into it, e.g. the first 3 digits denote the core system, the next 3 digits the core system service, and the last 3 digits a unique error within that core system service. Some common exceptions can of course be mapped to the same last 3 digits. A database exception on system A and service X would have the number AAAXXX001, and the same database exception on system B and service Y would have the number BBBYYY001.
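The arithmetic behind the nine-digit scheme is simple; here is a minimal Java sketch, assuming each three-digit group is a plain number from 0 to 999. `ArrowheadCode` is a hypothetical name. A nine-digit value still fits in a Java `int`, and `format()` is there because a system group such as 002 would otherwise lose its leading zeros when printed as an integer.

```java
// Hypothetical encoding of the nine-digit scheme: SSS (core system), VVV (service), EEE (error).
record ArrowheadCode(int system, int service, int error) {

    ArrowheadCode {
        check("system", system);
        check("service", service);
        check("error", error);
    }

    private static void check(String name, int value) {
        if (value < 0 || value > 999) {
            throw new IllegalArgumentException(name + " must be in 0..999, got " + value);
        }
    }

    // SSSVVVEEE as one integer, e.g. system 101, service 7, error 1 -> 101007001.
    int encode() {
        return system * 1_000_000 + service * 1_000 + error;
    }

    // Splits a code back into its three groups.
    static ArrowheadCode decode(int code) {
        if (code < 0 || code > 999_999_999) {
            throw new IllegalArgumentException("not a nine-digit code: " + code);
        }
        return new ArrowheadCode(code / 1_000_000, (code / 1_000) % 1_000, code % 1_000);
    }

    // Zero-padded form that keeps leading zeros in the system group.
    String format() {
        return String.format("%03d%03d%03d", system, service, error);
    }
}
```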

I'm not sure what you are proposing with the correlationId.

I imagine that in the future some asynchronous protocols might be supported as well. Apart from that, a system might receive hundreds of messages per second. Of course this is not an issue from a computational point of view, but reading a log file is so much easier if you can identify a request and its response by their transactionId or correlationId. The difference between the two is that the transactionId is valid for all subsequent calls, while the correlationId is only used once.

mzsilak commented 4 years ago

Regarding responseCode: we could also add a digit to show the "kind" of code. Some errors are temporary (e.g. a network issue or an unreachable service), some errors require a change in the client (e.g. validation errors or missing input) and some errors require a change in the server (e.g. programming errors). I am sure we can find more kinds of errors in Arrowhead. However, the responseCode is not limited to errors. We could also create a number of "success" codes, similar to how the HTTP protocol has a number of 2xx codes.
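Prefixing a nine-digit code with a "kind" digit produces ten digits, which no longer fits in a Java `int` (its maximum is 2,147,483,647), so this hedged sketch uses `long`. The enum name, the digit assigned to each kind and the `retryable()` helper are assumptions; only the categories come from the comment above.

```java
// Hypothetical "kind" digit prefixed to a nine-digit Arrowhead code.
enum CodeKind {
    SUCCESS(1),        // handled successfully, analogous to HTTP 2xx
    TEMPORARY(2),      // e.g. network issue or unreachable service; retrying may help
    CLIENT_ERROR(3),   // e.g. validation errors or missing input; the client must change
    SERVER_ERROR(4);   // e.g. programming errors; the server must change

    final int digit;

    CodeKind(int digit) { this.digit = digit; }

    // Prepends the kind digit to a nine-digit code, giving a ten-digit value.
    long prefix(long nineDigitCode) {
        if (nineDigitCode < 0 || nineDigitCode > 999_999_999L) {
            throw new IllegalArgumentException("expected a nine-digit code: " + nineDigitCode);
        }
        return digit * 1_000_000_000L + nineDigitCode;
    }

    // Reads the kind back from the leading digit of a ten-digit code.
    static CodeKind of(long tenDigitCode) {
        int d = (int) (tenDigitCode / 1_000_000_000L);
        for (CodeKind k : values()) {
            if (k.digit == d) {
                return k;
            }
        }
        throw new IllegalArgumentException("unknown kind digit " + d);
    }

    boolean retryable() { return this == TEMPORARY; }
}
```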

emanuelpalm commented 4 years ago

So, let's see if I understand your points correctly:

Response Codes

You are proposing that all messages be associated with an Arrowhead-specific integer identifier that maps to a predefined list of statuses (both errors and other conditions), similar to how HTTP and CoAP have such lists of predefined codes. Apart from the predefined ones, there would also be room for each Arrowhead service to define its own list of custom status codes. The codes would be used no matter what concrete protocol or encoding is used for a given communication.

I have two comments to this idea, which I think is good!

  1. I do not think it is appropriate to force a single semantics on every single Arrowhead service. Even if such a list of pre-defined codes indeed would be very useful and should be adopted by most, if not all, Arrowhead-compliant systems, it should not be a requirement. The reason for this is that one of the core goals of Arrowhead is protocol agnosticism. If we are forcing what effectively amounts to a meta-protocol be used on top of other protocols, we are no longer protocol-agnostic.
  2. I think every conceivable measure should be employed to avoid duplication of data in messages. It should be a paramount objective of the Arrowhead project to make sure that messages are as small as they can realistically be expected to be, without interfering too much with their legibility. For example, if the HTTP status code in a given message maps perfectly to an Arrowhead status code, the Arrowhead one should be omitted from the message. Avoiding duplication also has the added benefit of avoiding the confusion that could arise from having two status codes in a message, potentially with slightly conflicting semantics.
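Point 2 could be turned into a simple serialization rule. In this Java sketch, a hypothetical table lists Arrowhead codes that mean exactly the same as an HTTP status; those are left out of the message body, while more specific codes are kept. The class name and the code values in the table are placeholders, not proposed assignments.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical rule: omit the Arrowhead code when the HTTP status already says the same thing.
final class StatusCodes {
    // Placeholder Arrowhead codes assumed to map one-to-one onto HTTP statuses.
    private static final Map<Integer, Integer> GENERIC_FOR_HTTP = Map.of(
            200, 1,    // generic success
            404, 2,    // generic "not found"
            500, 3);   // generic server error

    private StatusCodes() { }

    // Returns the code to put in the body, or empty if it would only duplicate the HTTP status.
    static Optional<Integer> bodyCode(int httpStatus, int arrowheadCode) {
        Integer generic = GENERIC_FOR_HTTP.get(httpStatus);
        if (generic != null && generic == arrowheadCode) {
            return Optional.empty();
        }
        return Optional.of(arrowheadCode);
    }
}
```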

Correlation Identifiers

You are proposing that a cloud-unique identifier be assigned to each message sent. That identifier can be used by a third party observing a communication to correctly determine the message flow between systems, which could be useful for debugging and monitoring purposes. As a complement to this identifier, you also want a transaction identifier, which allows systems to group related messages together, making it easier to inspect the messages.

I don't like these ideas for two reasons.

  1. Correlation identifiers provide no additional information to a message, given that either (A) a "synchronous" protocol (such as HTTP) is used or (B) an "asynchronous" protocol (such as CoAP) is used that already has message identifiers as part of the protocol. In HTTP/1, there already are identifiers in the TCP segments that allow communications to be debugged and monitored. In HTTP/2 and HTTP/3, there are stream identifiers that serve the same purpose. My understanding is that even media streaming protocols, such as those used for telephony or video streaming, typically also have message identifiers. Message identifiers are so universally useful that protocols without them are practically non-existent. If you are after some kind of system debugger, I would suggest that the debugger record all communications and assign its own identifiers to all messages.
  2. What is a "transaction"? In SQL, there is no question about it. A transaction is a set of changes that are committed or rolled back atomically. But how can such a well-defined semantics be found for all other kinds of communications? I fear that developers are going to choose their transaction identifiers according to loosely defined rules they make up themselves, which in practice is going to lead to the transaction identifiers being practically useless for most use cases. They just add to the size of messages and to the confusion about their implications.

mzsilak commented 4 years ago

Hi @emanuelpalm

Response Codes

I agree with point 1. We could at least implement them in the core systems and let other systems do what they want. As for point 2, it's a trade-off. For example, status code 404 maps to "NOT FOUND", but it doesn't tell us whether a requested entity in a GET request was not found or whether the URL/service was not found. Additionally (and this plays into the correlation identifiers), the HTTP code does not tell us whether the request failed at the invoked core system or in a nested HTTP call (e.g. from the orchestration service to the authorization service).

Correlation Identifiers

I would use the general (non-technical) meaning of "transaction", i.e. "a communicative action or activity involving two parties or things that reciprocally affect or influence each other" (merriam-webster.com). The main purpose of both identifiers would be traceability and logging. I don't know of any way to get low-level protocol identifiers into our log4j2 logs.

Having the identifiers in the messages would already be useful in high-traffic environments, but such a feature really shows its power if the client can reference such an identifier. It's just easier to open and troubleshoot a defect which says "transactionId XYZ, correlationId ABC on 2020-04-20 failed although it should have worked" than to go by timestamp (which might be slightly different on client and server) and hope that there is some uniqueness to the request.

My idea is that the transactionId and correlationId are optional on the very first (new) request. The core system would be responsible for creating a transactionId and correlationId if the request doesn't already contain them. If the request already contains the ids, they must be reused. Any request that the core system makes in order to fulfill the initial request must contain the same transactionId, and each core system must return the correlationId and transactionId in its response.

Going back to my first example with the OrchestrationService. Each time the orchestrate service gets invoked, it does a request to the AuthorizationService as well. Depending on the use case, other systems/services might be involved as well. All those requests would have the same transactionId to link them back to the original request on OrchestrationService.
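The propagation rules above can be sketched in Java as follows, assuming random UUIDs as identifiers. `TraceIds` is a hypothetical name, and the id generator is passed in as a `Supplier` so that it can be made deterministic in tests.

```java
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical tracing context carried by each request and echoed in each response.
record TraceIds(String transactionId, String correlationId) {

    // Inbound request: reuse the ids if present, otherwise create them.
    static TraceIds fromInbound(String transactionId, String correlationId, Supplier<String> newId) {
        String tx = isBlank(transactionId) ? newId.get() : transactionId;
        String corr = isBlank(correlationId) ? newId.get() : correlationId;
        return new TraceIds(tx, corr);
    }

    // Nested call made to fulfill this request, e.g. Orchestration -> Authorization:
    // same transactionId, fresh correlationId.
    TraceIds forNestedCall(Supplier<String> newId) {
        return new TraceIds(transactionId, newId.get());
    }

    // Default generator producing random UUID strings.
    static Supplier<String> randomIds() {
        return () -> UUID.randomUUID().toString();
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

The response to the inbound request would carry the `TraceIds` returned by `fromInbound`, so the client can quote both values when reporting a defect.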

emanuelpalm commented 4 years ago

Response Codes

As for point 2, it's a trade-off.

Yes, indeed. The HTTP status codes are mostly inadequate. There are more problems, such as the distinctions between the 2xx codes (you can always infer from context which one of them you should get, so there is no point in having more than one of them). What I'm saying is that we should be open to, e.g., not having a "responseCode" in successful responses, as it would convey no additional information, while always including it for, e.g., client errors.

Information about whether the current or a nested call failed is useful for debugging and should perhaps end up in a log, but I don't think such information needs to be relayed to the sender of the "first" request. I think a suitable rule of thumb for error messages is this: if the message could be accepted after the sender changes it, be specific about what is wrong; if the message cannot be accepted for any other reason, be vague towards the sender about what is wrong, but log the specific reason to make it available to system operators and developers. If the message might succeed if tried later, tell the sender to try again. Information about the internals of a system can potentially be used for security exploits or other kinds of malicious use.
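The rule of thumb can be captured in one decision function. This Java sketch assumes three failure kinds and maps them to HTTP 400, 503 and 500; the names, the reply texts and the exact status choices are illustrative, not taken from any Arrowhead specification.

```java
import java.util.function.Consumer;

// Hypothetical failure categories following the rule of thumb above.
enum FailureKind { CLIENT_FIXABLE, RETRYABLE, INTERNAL }

record ClientReply(int httpStatus, String text) { }

final class ErrorPolicy {
    private ErrorPolicy() { }

    // Decides what the sender sees; operators always get the detailed reason via `log`.
    static ClientReply reply(FailureKind kind, String detail, Consumer<String> log) {
        log.accept(kind + ": " + detail);
        return switch (kind) {
            // The sender can fix this, so say exactly what is wrong.
            case CLIENT_FIXABLE -> new ClientReply(400, detail);
            // Might succeed later; say so without revealing why.
            case RETRYABLE -> new ClientReply(503, "Service unavailable, try again later.");
            // Anything else: stay vague so internals are not leaked.
            case INTERNAL -> new ClientReply(500, "Internal server error.");
        };
    }
}
```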

Correlation Identifiers

You have convinced me that what you are proposing is useful, so you need not spend more energy on that. What I'm reluctant about is making an engineering decision today because it is easy, and then having to live with its negative implications for a long time. In my eyes, adding identifiers to messages because they are hard to get with your current libraries and tools does not sound like a justifiable long-term decision. In the long run, the libraries people use will gradually become Arrowhead libraries, rather than the RESTful libraries people use today, and that transition will ensure that the relevant data is accessible. Also, I believe that anything that makes a system more complicated should be resisted until it is certain that the added complexity is justified.

Going back to my first example with the OrchestrationService. Each time the orchestrate service gets invoked, it does a request to the AuthorizationService as well. Depending on the use case, other systems/services might be involved as well. All those requests would have the same transactionId to link them back to the original request on OrchestrationService.

So, are you saying that transaction IDs are used exclusively to identify related nested calls (i.e. messages that are sent to fulfill the requests in other messages)? That is strict enough semantics to address my What is a "transaction"? concern. :+1: I like it. Then I have a proposal for an adjustment: how about using trail identifiers?

Trail Identifiers

A trail identifier is a list of dot-separated numeric identifiers. Let's say we have the below scenario with four systems. The first system, [ A ], which is some arbitrary application system, either decides on a random root identifier or skips it entirely. In this case, it is skipped.

Application           Core               Core               Core
   [ A ]              [ B ]              [ C ]              [ D ]
     |        ?         |                  |                  |
     |----------------->|                  |                  |
     |                  |                  |                  |
     |     <No TID in request, `1` set>    |                  |
     |                  |                  |                  |
     |                  |      `1.1`       |                  |
     |                  |----------------->|       `1.2`      |
     |                  |------------------+----------------->|
     |                  |                  |       `1.2`      |
     |                  |<-----------------+------------------|
     |                  |      `1.1`       |                  |
     |      `1`         |<-----------------|                  |
     |<-----------------|                  |                  |
     |                  |                  |                  |

[ B ], which receives the message, assigns it a random trail identifier, in this case 1, and sends two other messages, to [ C ] and [ D ]. As those messages were triggered by message 1, both of them include that 1 in their trails. A random number is chosen for each of them, in this case 1 and 2, resulting in the new trail identifiers 1.1 and 1.2. The trail identifiers are also included in the responses. Let us assume that the message from [ B ] to [ D ] results in an error. [ D ], [ B ] and [ A ] then all log different errors, each including the trail identifier of the message that failed. An operator interested in message 1 can then search for it in the logs and will discover the two other log entries related to the message, which may look something like this:

2020-04-20 12:53:12   [ D ]   TID:1.2    Not enough memory to handle request. discarded.
2020-04-20 12:53:14   [ B ]   TID:1.2    Server error, try again later.
2020-04-20 12:53:21   [ A ]   TID:1      Server error, try again later.
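A trail identifier is easy to model as a list of numbers. This is a hedged Java sketch; `TrailId`, `child()` and `isWithin()` are hypothetical names, and `isWithin()` is what an operator's log search for trail 1 would rely on to also match entries such as 1.2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

// Hypothetical trail identifier: dot-separated numeric segments, root first.
record TrailId(List<Long> segments) {

    TrailId {
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("a trail needs at least one segment");
        }
        segments = List.copyOf(segments);
    }

    // Root identifier for a message that arrived without a trail.
    static TrailId root(long id) {
        return new TrailId(List.of(id));
    }

    static TrailId randomRoot() {
        return root(ThreadLocalRandom.current().nextLong(1, Long.MAX_VALUE));
    }

    // Identifier for a nested message sent while handling this one: this trail plus one segment.
    TrailId child(long id) {
        List<Long> next = new ArrayList<>(segments);
        next.add(id);
        return new TrailId(next);
    }

    static TrailId parse(String text) {
        List<Long> parts = new ArrayList<>();
        for (String p : text.split("\\.", -1)) {
            parts.add(Long.parseLong(p));
        }
        return new TrailId(parts);
    }

    // True if this trail equals the ancestor or descends from it, e.g. 1.2 is within 1.
    boolean isWithin(TrailId ancestor) {
        int n = ancestor.segments.size();
        return segments.size() >= n && segments.subList(0, n).equals(ancestor.segments);
    }

    @Override
    public String toString() {
        return segments.stream().map(String::valueOf).collect(Collectors.joining("."));
    }
}
```

In the scenario above, [ B ] would call `root(1)` and then `child(1)` and `child(2)` for the messages to [ C ] and [ D ].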

As the TID can represent request levels, there is no need to separate it into a correlation and transaction identifier. One identifier is enough. If the identifier is added as a protocol header rather than into the payload of requests, we get the added benefit of application systems not having to be aware of the identifiers at all.
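If the trail travels in a protocol header, a client library can attach it without the application payload changing at all. Here is a minimal sketch using the JDK's `java.net.http.HttpRequest` builder (Java 11+); the header name `Arrowhead-Trail-Id` and the `TrailHeader` helper are assumptions, as nothing of the kind has been standardized.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that keeps trail identifiers in a header, outside the payload.
final class TrailHeader {
    // Assumed header name; no such header has been standardized.
    static final String NAME = "Arrowhead-Trail-Id";

    private TrailHeader() { }

    // Builds a nested request whose trail is the current trail plus one new segment.
    static HttpRequest nested(URI target, String currentTrail, long segment) {
        return HttpRequest.newBuilder(target)
                .header(NAME, currentTrail + "." + segment)
                .GET()
                .build();
    }

    // Reads the trail header from a request built by this helper, or null if absent.
    static String trailOf(HttpRequest request) {
        return request.headers().firstValue(NAME).orElse(null);
    }
}
```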

@mzsilak What do you think?

mzsilak commented 4 years ago

Hi @emanuelpalm

Response Codes

What I'm saying is that we should be open to e.g. not having a "responseCode" in successful responses, as it would convey no additional information, while always including them for e.g. client errors.

I agree with that. My initial problem was Swagger, but after consulting the documentation a little more, it seems that Swagger can show the different entities that are returned depending on the HTTP status.

I also agree with your points regarding which information to reveal to a client. I have always worked in closed environments where this was not a concern, so it wasn't on my mind.

Correlation Identifiers / Trail Identifiers

Trail identifiers sound good. They can carry even more information, e.g. if system [ C ] calls another system [ E ], the result is TID:1.1.1. I like it very much.

As the TID can represent request levels, there is no need to separate it into a correlation and transaction identifier. One identifier is enough. If the identifier is added as a protocol header rather than into the payload of requests, we get the added benefit of application systems not having to be aware of the identifiers at all.

That would work for HTTP and MQTT. I don't know about CoAP or other protocols that might be used in future.

emanuelpalm commented 4 years ago

@mzsilak If we go by the "Semantics Profile" (SP) approach I described earlier you would write one description of how to use this semantics for each protocol you want to support. We would likely start writing a document only for HTTP. If a header-only approach is not enough for a given protocol, then we would have to write SPs for protocol/encoding pairs.

By the way, CoAP uses an alternative to HTTP headers called options. An option is an integer key followed by an arbitrary value that can be no longer than 65266 bytes. We would just have to e-mail IANA and tell them we want to register an option ID, and they would register one for us. We would have to provide an IANA contact person and so on. Given that Arrowhead is becoming an Eclipse project, I don't think registering an ID will be a problem at all.
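For a sense of what carrying a trail identifier as a CoAP option would involve, here is a Java sketch of the single-option encoding from RFC 7252, Section 3.1: the option number is stored as a delta from the previous option, and both the delta and the value length use a 4-bit nibble with one or two extension bytes. `CoapOption` is a hypothetical helper, and 65000 is a placeholder from the experimental range standing in for whatever number IANA would assign.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of the single-option encoding from RFC 7252, Section 3.1.
final class CoapOption {
    // Placeholder option number; a real trail option would use an IANA-assigned number.
    static final int TRAIL_OPTION = 65000;

    private CoapOption() { }

    // Encodes one option whose number is `delta` above the previous option's number.
    static byte[] encode(int delta, byte[] value) {
        int[] d = split(delta);
        int[] l = split(value.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((d[0] << 4) | l[0]);   // 4-bit delta nibble, 4-bit length nibble
        writeExtended(out, d);           // extended delta bytes, if any
        writeExtended(out, l);           // extended length bytes, if any
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // Trail identifier as an option when no other option precedes it (delta from 0).
    static byte[] encodeTrail(String trail) {
        return encode(TRAIL_OPTION, trail.getBytes(StandardCharsets.US_ASCII));
    }

    // Returns {nibble, extended value, extended byte count}: 0-12 fit in the nibble,
    // 13 adds one byte holding n - 13, 14 adds two bytes holding n - 269.
    private static int[] split(int n) {
        if (n < 0 || n > 65535 + 269) {   // largest value the two-byte form can express
            throw new IllegalArgumentException("cannot encode " + n);
        }
        if (n < 13) return new int[] {n, 0, 0};
        if (n < 269) return new int[] {13, n - 13, 1};
        return new int[] {14, n - 269, 2};
    }

    private static void writeExtended(ByteArrayOutputStream out, int[] enc) {
        if (enc[2] == 2) out.write(enc[1] >>> 8);   // high byte first (network order)
        if (enc[2] >= 1) out.write(enc[1] & 0xFF);
    }
}
```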

emanuelpalm commented 3 years ago

@mzsilak Perhaps it would be appropriate to turn this discussion into a new proposal, like I just turned #14 into #22? If so, perhaps this is something that could land in the Arrowhead 5.0 roadmap. As you initiated the discussion, could you do it?

jerkerdelsing commented 3 years ago

The discussion points to the very general interoperability question e.g.

The approach taken so far is

Since properties captured by one model very seldom will have a perfect match with another model there will be loss of properties. What we can do is, in one way or another, "flag" these losses while providing translations or adaptors.

As for SPs, I don't think we should invent yet another data model, ontology or semantics. We should use what's there and provide information on what's implemented.

emanuelpalm commented 3 years ago

@jerkerdelsing Using something that's already there requires that thing to actually exist and be possible to find. Furthermore, if there is a standard or convention, but it is not followed by anyone or is of poor quality, then that is virtually the same as there being no standard.

While I can find an open source solution for message tracing (https://opentelemetry.io/), I cannot find any such standards. I do admit, however, that we should take a good look at existing message tracing solutions before we decide on producing our own.

Regarding error messages, an alternative could be for Eclipse Arrowhead IDDs to explicitly specify the format of each possible error message (which I believe they already should?), and then we just make sure that the core services all share a smaller set of error messages (or even the same one error message structure). Then we both get the benefit of code reuse while not having to write any SP or claim to have produced our own standards.

jerkerdelsing commented 3 years ago

I think that the architecture approach should be open to supporting any widely used approaches/standards regarding e.g. protocols, metadata, error codes, etc.

emanuelpalm commented 2 years ago

@mzsilak @jerkerdelsing How do you propose that we proceed with this?

jerkerdelsing commented 2 years ago

Let's bring this to the Roadmap WG to assign a small task force to come up with a proposal. I will add it to the agenda for the next meeting.

emanuelpalm commented 2 years ago

@jerkerdelsing proposed in an e-mail to me that we look into using something like Pinpoint (https://github.com/pinpoint-apm/pinpoint) for message tracing. I guess this issue should be broken up into one issue per topic it covers. I will try to do that in the coming weeks.

tsvetlin commented 2 years ago

Trail ID will be proposed in #44. We will look into Pinpoint and whether it can be useful in our particular case.