Particular / ServiceInsight

Advanced debugging for NServiceBus
http://particular.net/serviceinsight

Proposed design: Sequence & Timeline views #220

Closed: dannycohen closed this issue 10 years ago

dannycohen commented 10 years ago
  1. Please comment on the proposed design for Sequence & Timeline views in SI: https://www.dropbox.com/sh/xd3vncebftxeh4j/Pr68R-mHPw
  2. User scenarios of this view:
    • "As Archie, I would like to easily understand how the system's endpoints interact with each other"
    • "As Archie, I would like to easily view how the application behaves over time, assisting me in identifying significant delays in delivery or processing of messages"
  3. Goals of this view design:
    • Easy to implement (UI only; no need to change SC API)
    • Same visual language and functionality (e.g. message context menu) as Message Flow and Saga in SI, and Canvas in SM

      Please provide your feedback by end of day Tuesday, Jan. 28th.

Related to https://github.com/Particular/ServiceInsight/issues/201

// FYI - @udidahan , @andreasohlund , @Particular/core-developers, @joaquinjares, @HEskandari, @esculli, @mauroservienti , @jdrat2000

dannycohen commented 10 years ago

@sergioc - FYI

udidahan commented 10 years ago

I have some issues with the proposed design.

I’d rather see a “box” representing the endpoint processing a message (UML style).

This would then conflict with the idea of a message being represented as a box.

Ergo, I’d have the name of the message appear on the lines between endpoints.

Also, if the processing of a message failed, the line shouldn’t be red. Instead the endpoint box would be red (though its size wouldn’t represent processing time anymore).

I’d also suggest using the SM visual language of commands being drawn as solid lines between endpoints and events being drawn as dashed lines.
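To make the two rendering rules above concrete, here is a minimal Python sketch. It assumes a hypothetical `MessageRecord` shape and `Intent` enum invented for illustration (they are not the ServiceControl or ServiceInsight API): commands map to solid lines, events map to dashed lines, the message name becomes the caption on the line, and a failed processing attempt colors the receiving endpoint's processing box instead of the line.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    COMMAND = "command"
    EVENT = "event"


@dataclass(frozen=True)
class MessageRecord:
    # Hypothetical shape for illustration only; not the ServiceControl API.
    name: str
    intent: Intent
    sender: str
    receiver: str
    failed: bool = False


def line_style(msg: MessageRecord) -> str:
    """Commands are drawn as solid lines, events as dashed lines (SM convention)."""
    return "solid" if msg.intent is Intent.COMMAND else "dashed"


def processing_box_color(msg: MessageRecord) -> str:
    """A failure colors the receiver's processing box red; the line itself is not colored."""
    return "red" if msg.failed else "default"


def render_hints(msg: MessageRecord) -> dict:
    """Collect what the diagram needs for one message: caption, endpoints, line style, box color."""
    return {
        "caption": msg.name,  # message name shown on the line between endpoints
        "from": msg.sender,
        "to": msg.receiver,
        "line": line_style(msg),
        "box_color": processing_box_color(msg),
    }


if __name__ == "__main__":
    print(render_hints(MessageRecord("SubmitOrder", Intent.COMMAND, "ECommerce", "Sales")))
    # "SubscriberEndpoint" is a hypothetical subscriber name.
    print(render_hints(MessageRecord("OrderPlaced", Intent.EVENT, "Sales", "SubscriberEndpoint", failed=True)))
```

Keeping the failure state on the box rather than the line matches the point above that the line represents the message, while the box represents the endpoint processing it.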

dannycohen commented 10 years ago

@udidahan -

I’d rather see a “box” representing the endpoint processing a message (UML style).

Something like this ?

This would then conflict with the idea of a message being represented as a box.

If the answer to the above is yes (i.e. somewhat like this) then I see no inherent conflict.

The change would be to de-emphasize the message, from a rectangle to a caption-like display (i.e. "message name appear on lines between endpoints" etc.). When the user hovers over the message, we can emphasize it again so that it has the same functionality we provide in all other message interactions (i.e. message context menu etc.)

if the processing of a message failed, the line shouldn’t be red. Instead the endpoint box would be red...

The whole endpoint line, from top to bottom? Or just the section of the endpoint corresponding to the failed processing attempt of the specific message?

I’d also suggest using the SM visual language of commands being drawn as solid lines between endpoints and events being drawn as dashed lines.

Agreed. @sergioc - shall we reconsider https://github.com/Particular/ServiceInsight/issues/201#issuecomment-32243491 ?

udidahan commented 10 years ago

Yes. Like that.


sergioc commented 10 years ago

Updated diagram: https://www.dropbox.com/sh/wjsyyoqukukcr4p/A4ecI5dCm2

andreasohlund commented 10 years ago

Big +1 from me, this is much better IMO!


udidahan commented 10 years ago

I agree that this is much better.

I’m not clear, though, as to why messages like ProvisionDownloadRequest would appear connected that way rather than the regular straight line.

Also, it’s important to understand that timeouts (like BuyersRemorseIsOver) originated from something else that happened in that same endpoint, so I’d like to see how that would be represented.

I don’t understand the story around the endpoint hovering (why are the machines shown?) nor do I understand the hovering on the host (Ecommerce@machine003) and what is meant to be displayed.

Can you clarify?

sergioc commented 10 years ago

I’m not clear, though, as to why messages like ProvisionDownloadRequest would appear connected that way rather than the regular straight line.

Those timestamps are all based on readings from SI on my machine: ProvisionDownloadRequest was actually sent before OrderAccepted finished processing...

Also, it’s important to understand that timeouts (like BuyersRemorseIsOver) originated from something else that happened in that same endpoint, so I’d like to see how that would be represented.

So BuyersRemorseIsOver originates in the Sales endpoint. I didn't manage to figure out what happened before that caused it. Could you elaborate on this scenario?

I don’t understand the story around the endpoint hovering (why are the machines shown?)

The idea of seeing a list of machines when hovering an endpoint is:

1) See all machines associated with that endpoint

2) Which machines are involved with the specific conversation the diagram applies to (white)

3) Which machines are causing messages to fail (red)

4) Which machines are associated with that endpoint but are not part of the conversation at hand (grey)

nor do I understand the hovering on the host (Ecommerce@machine003) and what is meant to be displayed.

When hovering on a specific host, the messages related to it (sent by, received and/or processed) are highlighted. In the example screen (no. 04), Ecommerce@machine003 is highlighted and shows which message is associated with that specific endpoint/host combination.

dannycohen commented 10 years ago

@sergioc - few clarifications:

So BuyersRemorseIsOver originates in the Sales endpoint... Could you elaborate on this scenario?

Here's how it looks in message flow. BuyersRemorseIsOver (or any other timeout message) will always be processed on the same endpoint instance as SubmitOrder (i.e. the message that defined the timeout message).

image

sergioc commented 10 years ago

Got it, I "delivered" SubmitOrder to CustomerRelations and "sent" OrderPlaced from CustomerRelations instead of to Sales and from Sales, respectively.

Correction: https://www.dropbox.com/sh/wjsyyoqukukcr4p/A4ecI5dCm2

dannycohen commented 10 years ago

@sergioc -

There's probably a need for a line connecting SubmitOrder and BuyersRemorseIsOver, to indicate the causal connection between the two (i.e. SubmitOrder created and requested / scheduled BuyersRemorseIsOver timeout message).

I would propose a line style that is somewhat different than the dotted Event (or full Command) lines.

Thoughts ?

image
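The causal line proposed here could be derived from the recorded data rather than drawn by hand. A minimal Python sketch, assuming each message record carries a hypothetical `related_to` field (the id of the message whose handler requested it) and an `is_timeout` flag; these names are illustrative, not a confirmed ServiceControl schema. Since a timeout is processed on the same endpoint that requested it, both ends of the link are expected to sit on one lifeline, and the sketch reports whether that holds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Msg:
    # Hypothetical fields for illustration; not a confirmed ServiceControl schema.
    message_id: str
    name: str
    processing_endpoint: str
    related_to: Optional[str] = None  # id of the message whose handler sent/requested this one
    is_timeout: bool = False


def timeout_links(messages: List[Msg]) -> List[Tuple[str, str, bool]]:
    """Return (requesting message, timeout message, same_lifeline) for each timeout.

    same_lifeline is True when both messages were processed on the same endpoint,
    which is the expected case for timeouts.
    """
    by_id: Dict[str, Msg] = {m.message_id: m for m in messages}
    links: List[Tuple[str, str, bool]] = []
    for m in messages:
        if not m.is_timeout or m.related_to is None:
            continue
        origin = by_id.get(m.related_to)
        if origin is None:
            continue  # requesting message is not part of this conversation's data
        links.append((origin.name, m.name, origin.processing_endpoint == m.processing_endpoint))
    return links
```

For the VideoStore case discussed above, a `BuyersRemorseIsOver` record whose `related_to` points at `SubmitOrder` (both processed by Sales) yields `("SubmitOrder", "BuyersRemorseIsOver", True)`, which is the pair the distinct timeout line would connect.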

sergioc commented 10 years ago

This?:

[Screenshot: 2014-01-28 22.48.29]

dannycohen commented 10 years ago

And here I was thinking about a mundane smaller-dots dotted line... It is a nice touch: clearly distinguishable from Event / Command connections.

@udidahan / @andreasohlund ?

johnsimons commented 10 years ago

Would it be possible to show us a never-ending saga, i.e. a very long timeline (> 9 months)? Would the timeline load as we scroll? Would the timeline have gaps?


dannycohen commented 10 years ago

Would the timeline have gaps?

Yes, to indicate long periods of inactivity. However, we're leaving this specific design for later.

Would the timeline load as we scroll ?

Nope. I would go for the gaps option.

Note that the (current) plan is to have the sequence view apply to a single conversation (which may be extremely long or include many messages).
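One way the "gaps" option could work, sketched in Python under assumptions this thread does not settle: the conversation's timestamps are sorted, any idle interval longer than a chosen `threshold` is treated as a gap, and each gap is drawn at a fixed width (`gap_width`, in axis units) instead of to scale. Both parameters are hypothetical, and the design itself was explicitly deferred.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_gaps(timestamps: List[datetime], threshold: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs for idle periods longer than threshold."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


def axis_offsets(timestamps: List[datetime], threshold: timedelta, gap_width: float) -> List[float]:
    """Map sorted timestamps to axis positions (in seconds), collapsing long idle periods.

    Intervals up to the threshold are drawn to scale; longer ones are replaced by a
    fixed-width gap marker, so a saga spanning months does not stretch the axis.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    offsets = [0.0]
    for a, b in zip(ordered, ordered[1:]):
        delta = b - a
        step = gap_width if delta > threshold else delta.total_seconds()
        offsets.append(offsets[-1] + step)
    return offsets


if __name__ == "__main__":
    start = datetime(2014, 1, 1, 12, 0, 0)
    ts = [start, start + timedelta(seconds=1), start + timedelta(days=270)]
    print(find_gaps(ts, timedelta(hours=1)))  # a single (start + 1s, start + 270 days) gap
    print(axis_offsets(ts, timedelta(hours=1), gap_width=5.0))  # [0.0, 1.0, 6.0]
```

This keeps the whole conversation loaded at once (matching the "no load-on-scroll" answer above) while still showing that a long pause happened.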

udidahan commented 10 years ago

I wouldn't have the data type passed in the timeout appear that way. Something more like this:

image

And if the user clicks the little arrow beside the timeout icon, then they could see the details about the data passed on.

udidahan commented 10 years ago

The idea of seeing a list of machines when hovering an endpoint is:

1) See all machines associated with that endpoint

2) Which machines are involved with the specific conversation the diagram applies to (white)

3) Which machines are causing messages to fail (red)

4) Which machines are associated with that endpoint but are not part of the conversation at hand (grey)

The whole “how should a scaled out endpoint be represented” question is too far reaching for us to deal with it now.

Once we get the basics of this screen working well, we can decide whether it’s worthwhile to revisit this.

When hovering on a specific host, the messages related to it (sent by, received and/or processed) are highlighted.

In the example screen (no. 04), Ecommerce@machine003 is highlighted and shows which message is associated with that specific endpoint/host combination.

Drop this as well.

sergioc commented 10 years ago

@udidahan

And if the user clicks the little arrow beside the timeout icon, then they could see the details about the data passed on.

The message BuyersRemorseIsOver is no longer visible in your suggestion. Is that on purpose?

udidahan commented 10 years ago

Yes, that is on purpose. The reason is that it is timeout data - not a message like any other.

dannycohen commented 10 years ago

Yes, that is on purpose. The reason is that it is timeout data - not a message like any other.

But the timeout message does have a type and a body and all the relevant headers.

My concern is that treating it differently on the UI level adds more noise with no benefit: It adds more visual language semantics the user needs to understand, instead of the standard message display.

I'd say we can choose a middle ground at the UI level by showing the message type and the context menu for more info, but changing the icon to a timer (like the one you proposed), e.g. something like this:

image

udidahan commented 10 years ago

Timeouts are used for fundamentally different purposes than regular command/event behaviors ergo they should be represented with different visual language.

Also, that big horizontal line looks totally out of place on the diagram (to me).

dannycohen commented 10 years ago

@udidahan -

that big horizontal line looks totally out of place on the diagram (to me).

I agree. It is a bit strange.

Timeouts are used for fundamentally different purposes than regular command/event behaviors

What are they used for? (Let's get a few scenarios into the discussion...)

SimonCropp commented 10 years ago

@udidahan

Timeouts are used for fundamentally different purposes than regular command/event behaviors

I don't agree with this. When teaching other people about NSB I have always said "just think of timeouts as normal messages". This is especially true since the change in the core to remove the "timeout callback with no message" feature.

In my mind the concept of "timeout" should be thought of as a different way of sending a message, in the same way that Send and Publish are. When the saga receives the message it treats it in the same way that it treats any other message. It does not care if the message came from a send, publish or timeout. It only cares about the message type and the data it contains.

Or have I been doing people an injustice by teaching them to think this way?

udidahan commented 10 years ago

@simoncropp While technically accurate, the important difference is context/boundaries.

When I’m sending commands/events, I’m communicating across a boundary – talking to a separate responsibility. However, timeouts are a mechanism for me to talk to myself (in the future) – same responsibility in SRP terms.

Does that make sense?

johnsimons commented 10 years ago

Well, in that case where does bus.SendLocal fit?


dannycohen commented 10 years ago

@udidahan - Can you provide a few sample scenarios for the timeout uses you see ? I'd like to discuss them.

udidahan commented 10 years ago

@johnsimons People use SendLocal to route something to the same physical endpoint - it's equivalent to sending a command but with the "shortcut" of not wanting to manage additional endpoints.

udidahan commented 10 years ago

@dannycohen The timeout scenarios revolve around sagas - in essence it's an object that is calling an internal method, but at some point in the future (with all of the nice fault-tolerance we provide).

In the past, a saga could even pass a simple type like an int or string to itself via timeouts - I'd want that to work again.

dannycohen commented 10 years ago

@udidahan - that does not answer my question... Why, and for what purpose, do users use Timeouts?

Also - the examples I have seen use classes, not simple types. Is this an unsupported usage? Is it a fringe scenario?

udidahan commented 10 years ago

Here's an example:

https://gist.github.com/udidahan/8729522

And I would have liked to have been able to use a simple int rather than having to create a "Percent" class around it.

dannycohen commented 10 years ago

@udidahan - So the scenario is that a timeout "entity" is used in the Saga to specify the logical flow of a business policy, as it changes through time.

That "entity" can be a class, that may have a complex set of properties or a simple type (which is also a type of a class).

The main open issue IMO is whether we should show the message type name like any other message or not. I have no objection to showing the curved connecting line, and I assume we all agree that a "timer" icon would be appropriate.

So the question is: Should we show the "entity" type like we do with any other message ?

I believe that the underlying question to that is: "does this entity serve a purpose or has any value in the logical flow of the process that the user is trying to understand ?"

andreasohlund commented 10 years ago

Here is the issue for supporting this: https://github.com/Particular/NServiceBus/issues/881


udidahan commented 10 years ago

@dannycohen When looking at the message flow view across endpoints, the internal data of a timeout is less relevant. When zooming in and looking at the saga view, then having the data type be visible is much more important.

dannycohen commented 10 years ago

@udidahan -

Two questions:

Q1:

  1. We will not show Timeout details (neither type nor content) in the Sequence view (I am assuming that is what you mean by "message flow view across endpoints")
  2. We will show Timeout details in Saga View
  3. We currently show Timeout details in the Message Flow (but who knows what's going to be there in the future...)

Is this correct ?

Q2:

Regarding:

When looking at the message flow view across endpoints, the internal data of a timeout is less relevant.

So what you are saying is that the content (or type) of a Timeout does not affect the logic & flow of messages (which is the purpose of Message Flow and Sequence view), and should therefore not be visible in these views.

Correct ?

udidahan commented 10 years ago
  1. We will not show Timeout details (neither type nor content) in the Sequence view

Correct.

  2. We will show Timeout details in Saga View

Good.

  3. We currently show Timeout details in the Message Flow

I’d remove it from there too.

So what you are saying is that the content (or type) of a Timeout does not affect the logic & flow of messages (which is the purpose of Message Flow and Sequence view), and should therefore not be visible in these views.

It’s not that the content/type of timeout does not affect the flow, but that it shouldn’t be considered part of the flow. Just like we do not show the state of database entities operated on by message handlers as a part of the Message Flow view, so too we should not show the timeouts.


dannycohen commented 10 years ago

@udidahan -

Regarding your answer to Q1 -

This approach suffers from the following drawbacks:

  1. It uses inconsistent visual language: Sometimes you show a timeout as an icon, sometimes as a message-like entity, sometimes not at all
  2. The displayed data in the views is partial and inconsistent:
    • For example, in VideoStore: if I see the timeout in the Message Flow (or Sequence) view without knowing that it is "BuyersRemorseIsOver", it does not help me, as the viewer, understand the logic.
    • The same problem appears in the example you supplied above (https://github.com/Particular/ServiceInsight/issues/220#issuecomment-33774551), where there is not one but two timeouts. Being able to differentiate between them and understand their meaning based on their type name (and also their content) is IMO a must for understanding the logic in Message Flow or Sequence.
  3. Given the above, if we do not show the details of the Timeout messages, we are creating a gap in the Message Flow / Sequence diagrams.
    • This gap will prevent the user from fully understanding the message / endpoint interactions, and the user will need to move back and forth, to and from Saga View, in order to fill that gap (or, even worse for the effectiveness of ServiceInsight, open the source code...)

Regarding your response to Q2

I totally agree with @SimonCropp who sees Timeout Messages like other messages. They are classes, they contain data and they affect flow & interaction logic.

Let me put it in another way:

udidahan commented 10 years ago

Let’s resolve this on a call then.

dannycohen commented 10 years ago

@indualagarsamy / @udidahan / @sergioc -

As we discussed -

We will start by implementing a minimal implementation, as follows:

  1. No timeline
    • i.e. no representation of the length of time it takes to deliver / process a message; all will be the same length
    • Timeline will be added in the future, following additional discussion / hands-on iteration
  2. Timeout messages will be represented using the Alarm clock icon (see https://github.com/Particular/ServiceInsight/issues/221#issuecomment-34064623)
  3. Timeout messages will not display the message Type (i.e. see visualization by Udi in https://github.com/Particular/ServiceInsight/issues/220#issuecomment-33587587)
  4. Timeout message interaction will be indicated by a line that is differently shaped than the Event (dotted) and Command (full) lines. Exact visual TBD, but the line visual proposed by @udidahan in https://github.com/Particular/ServiceInsight/issues/220#issuecomment-33587587 is agreed as a good starting point.
  5. The line indicating that a message was sent from another message is driven by its TimeSent property
    • If the Timeline is not displayed (as will be the case starting out), the line connecting the sent message to the processing of the message that sent it will be at the end of the processing phase, not in the middle of it. (See image below)
    • If more than one message was sent at the same time (e.g. an Event was published and it has multiple subscribers), all resulting messages (sent to different Endpoints) will have the same TimeSent value (assuming the send was transactional), and they will appear as a single line exiting the sending message and reaching every subscribing endpoint, with delivery and processing indicators extending from that single line. (See image below)
Sent Message exiting a sending message

When timeline is Off - always at the end of the processing rectangle

image

More than one message was sent at the same time

i.e. Event with 3 subscribers:

image
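The two layout rules in item 5 can be sketched in Python as follows. This is a minimal sketch that assumes a hypothetical `SentMessage` record with `sent_by` (the id of the message whose handler sent it) and `time_sent` fields; the names are illustrative, not ServiceControl's schema. Messages sharing the same sender and TimeSent collapse into one connector that fans out to every receiving endpoint, and with the timeline off the connector always leaves at the end of the sender's processing box. The timeline-on branch (placing the exit at TimeSent, clamped to the box) is an assumption, since the timeline was deferred.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SentMessage:
    # Hypothetical record for illustration; not ServiceControl's schema.
    message_id: str
    sent_by: str            # id of the message whose handler sent this one
    time_sent: datetime
    receiving_endpoint: str


@dataclass
class Connector:
    sent_by: str
    time_sent: datetime
    targets: List[str] = field(default_factory=list)
    message_ids: List[str] = field(default_factory=list)


def build_connectors(messages: List[SentMessage]) -> List[Connector]:
    """Merge messages sent by the same handler with the same TimeSent into one connector."""
    groups: Dict[Tuple[str, datetime], Connector] = {}
    for m in messages:
        key = (m.sent_by, m.time_sent)
        if key not in groups:
            groups[key] = Connector(m.sent_by, m.time_sent)
        groups[key].targets.append(m.receiving_endpoint)
        groups[key].message_ids.append(m.message_id)
    # Order by send time; sorted() is stable, so ties keep first-seen order.
    return sorted(groups.values(), key=lambda c: c.time_sent)


def exit_point(box_start: float, box_end: float, sent_offset: Optional[float], timeline_on: bool) -> float:
    """Vertical position where a connector leaves the sender's processing box."""
    if not timeline_on or sent_offset is None:
        return box_end  # timeline off: always at the end of the processing rectangle
    # Assumed timeline-on behavior: place it at TimeSent, clamped to the box.
    return min(max(sent_offset, box_start), box_end)
```

For the event-with-three-subscribers case, three records with the same `sent_by` and `time_sent` produce a single `Connector` whose `targets` lists all three endpoints, which is the single fan-out line shown in the image above.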

// CC @joaquinjares , @HEskandari

sergioc commented 10 years ago

Update: https://www.dropbox.com/sh/389j0jbfswxt6y7/OEdP8_nESq

udidahan commented 10 years ago

I feel a dashed/dotted line would be more appropriate for the timeout. Also, I thought we wanted the little drop-down arrow beside the icon for getting more information, didn’t we?

indualagarsamy commented 10 years ago

Agree with @udidahan on the dotted line for the timeout. On second thought, about the little drop-down arrow: can you not hover over or click the timeout icon itself to get the same information? What's the little drop-down going to add?

sergioc commented 10 years ago

Updated: https://www.dropbox.com/sh/389j0jbfswxt6y7/OEdP8_nESq

The arrow will indicate there's a menu accessible via the icon.

@udidahan AFAIK the timeout is a sort of command. With the dashed line it'll blur the meaning of the dashed line for events. Shouldn't we keep the line solid?

udidahan commented 10 years ago

A timeout could (kind of, but it’s something of a stretch) be thought of as a command to yourself.

The thing about the dashed line is that it indicates that there is a disconnect in time from one bit of processing to another, while a contiguous line gives the incorrect impression that something is happening all the way through IMO.

sergioc commented 10 years ago

Then perhaps a thin line:

[Image: ServiceInsight sequence diagram 5_01, sequence diagram (normal)]

The fact that the line "bends" may already be enough to indicate that there's a disconnect in time or... bend.

indualagarsamy commented 10 years ago

@sergioc - thanks, I get it. @udidahan - I think the solid line (in sequence diagrams) signifies synchronous actions. Even commands in our world are asynchronous. Should we even use a solid line for commands? Isn't that confusing? I think we already differentiate commands from events with our icons. Isn't that sufficient?

udidahan commented 10 years ago

@indualagarsamy forget about UML - the way an async call is shown there is with a "hollow arrow". I think in an earlier version of UML it was done with a "half arrow". I never really cared for either of them.

dannycohen commented 10 years ago

@udidahan / @indualagarsamy - Lets start implementing on https://github.com/Particular/ServiceInsight/issues/220#issuecomment-34810180 and iterate on the implementation.

Changing the lines is a significant decision in the sense that it needs to be done on SM as well (and message flow, and potentially Saga View as well), so whatever you decide - consider the cost.

My 2c: UML intricacies are not something we should consider as canonical gospel we must bow to. The visual language of UML is old and far from intuitive or self-explanatory. See example on async calls below:

image

indualagarsamy commented 10 years ago

Sorry about the confusion. Yeah, I agree. My original thinking was incorrect.

udidahan commented 10 years ago

I still like the dashed line more than the solid one - time in the world of programming is most closely related to an event.

indualagarsamy commented 10 years ago

Yes. The dashed line indicates a gap in time, and IMO indicates the passage of time much better than a solid line.