Particular / ServicePulse

Production monitoring for distributed systems.
https://docs.particular.net/servicepulse/

Implement Failed Message Retry and RetryAll #14

Closed dannycohen closed 11 years ago

dannycohen commented 11 years ago

As Opie, when the failed message indicator is red and after taking corrective action, I want to retry sending the failed message.

Visualization:

  1. In the Events list on the Failed Messages page (which is loaded by pressing the Failed Messages indicator), one or more alerts can be selected by clicking the checkbox in the first column of each event row.
  2. Clicking the retry button displays a verification dialog: "N failed message/s will be re-sent to their destination endpoint".
  3. After clicking "OK", the failed messages are sent for retry and no longer appear in the events list (on either the dashboard or the Event Management page).

Notes:

Demo / Acceptance tests:

Case 1:

  1. Run the Video Store sample (5 endpoints)
  2. Cause 5 messages to fail in the "Sales" endpoint, and 3 messages to fail in the "eCommerce" endpoint
  3. The Failed message indicator is red, and the number "8" appears below it
  4. Click on the Failed message indicator
  5. The Event Management page for Failed Messages is displayed and the events list is filtered to show only failed messages (i.e. 8 active failed messages events)
  6. Select 3 of the 8 failed messages (1 from the "Sales" endpoint and 2 from the "eCommerce" endpoint)
  7. Click on the "Retry" button
  8. The events list now shows only 5 failed messages
  9. The Failed message indicator is red, and the number "5" appears below it
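
Case 1 can be sketched as an automated check against an in-memory model of the failed messages list. This is a minimal sketch, not ServicePulse code: `FailedMessage`, `FailedMessageList`, `retry`, and `build_case_1` are hypothetical names, and the retry simply removes the selected entries to mirror steps 7 to 9.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FailedMessage:
    """Hypothetical stand-in for one row in the failed messages list."""
    message_id: str
    endpoint: str


@dataclass
class FailedMessageList:
    """Hypothetical in-memory model of the list behind the indicator."""
    messages: list = field(default_factory=list)

    @property
    def indicator_count(self) -> int:
        # The number shown below the Failed message indicator.
        return len(self.messages)

    @property
    def indicator_color(self) -> str:
        # Red while any failed message is listed, green otherwise.
        return "red" if self.messages else "green"

    def retry(self, selected_ids: set) -> None:
        # Retried messages no longer appear in the events list.
        self.messages = [
            m for m in self.messages if m.message_id not in selected_ids
        ]


def build_case_1() -> FailedMessageList:
    # Step 2: 5 failures in "Sales" and 3 in "eCommerce".
    sales = [FailedMessage(f"sales-{i}", "Sales") for i in range(5)]
    ecommerce = [FailedMessage(f"ecom-{i}", "eCommerce") for i in range(3)]
    return FailedMessageList(sales + ecommerce)


failed = build_case_1()
# Steps 6 and 7: select 1 "Sales" and 2 "eCommerce" messages, then retry.
failed.retry({"sales-0", "ecom-0", "ecom-1"})
print(failed.indicator_count, failed.indicator_color)  # 5 red
```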

Case 2:

  1. Run the Video Store sample (5 endpoints)
  2. Cause 1 message to fail _repeatedly_ in the "Sales" endpoint (i.e. the message will never be processed successfully)
  3. The Failed message indicator is red, and the number "1" appears below it
  4. Click on the Failed message indicator
  5. The Failed messages page is displayed
  6. Click on the "Retry all" button
  7. The events list now shows no failed message events, and the Failed message indicator is green
  8. Wait until the message fails again
  9. The Failed message indicator is red, and the number "1" appears below it
  10. Click on the Failed message indicator
  11. The events list in the Event Management page is filtered to show only the one failed message event. Note that it has the same Message Id.

dannycohen commented 11 years ago

Replaces https://github.com/Particular/ServiceControl/issues/49

indualagarsamy commented 11 years ago

@dannycohen - Is step 3 necessary? i.e. Clicking the retry button displays a verification dialog: "N failed message/s will be re-sent to their destination endpoint".

Also step 2 is no longer necessary, since we have a detailed page for failed messages.

dannycohen commented 11 years ago

> Is step 3 necessary? i.e. Clicking the retry button displays a verification dialog: "N failed message/s will be re-sent to their destination endpoint".

Necessary - no. Of value - yes. Let's leave it out for alpha.

> Also step 2 is no longer necessary, since we have a detailed page for failed messages.

Agreed.

indualagarsamy commented 11 years ago

@dannycohen - Can you please update this user story to match our last discussion, i.e. a separate detailed screen for error messages? I believe we have implemented Failed Message Retry. What do you think is still left on this user story?

dannycohen commented 11 years ago

@indualagarsamy - Updated. Can you confirm the behavior for case 2? (i.e. when does an event disappear from the failed messages list? When you click on the "Retry" button? When you receive a "Retry requested successfully" indication from SC?)

dannycohen commented 11 years ago

@indualagarsamy -

Regarding removing events which were retried from the events list, my take is as follows:

  1. Since retry is inherently asynchronous, I see no reason to wait until we receive final confirmation that the requested retry is being executed.
  2. An indication that the retry request was queued is good enough, especially for beta.
  3. Therefore, I say that when the user clicks on the "Retry" button and you make the call to the SC HTTP API and receive an HTTP 200, we treat that as the retry request being queued, and hide the events which were selected for retry.
    • As for the retried messages, hiding them is one option; another is graying them out. I'd say it's good enough for Beta (and probably after that as well) to hide the messages. SI can be used for monitoring retry requests and messages after submission of the retry request.
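
The rule in point 3 could be sketched as follows. This is only an illustration under stated assumptions: `retry_and_hide` and `send_retry_request` are hypothetical names, with `send_retry_request` standing in for the call to the SC HTTP API (the actual endpoint and payload are not specified in this thread) and assumed to return the HTTP status code. Leaving the list unchanged on any non-200 status is also an assumption.

```python
from typing import Callable, Iterable, List


def retry_and_hide(
    visible_ids: List[str],
    selected_ids: Iterable[str],
    send_retry_request: Callable[[List[str]], int],
) -> List[str]:
    """Return the message ids that should stay visible after a Retry click.

    send_retry_request is a hypothetical stand-in for the SC HTTP API call;
    it is assumed to return the HTTP status code of the response.
    """
    selected = list(selected_ids)
    status = send_retry_request(selected)
    if status == 200:
        # HTTP 200 is treated as "retry request queued": hide the selection
        # without waiting for confirmation that the retry was executed.
        hidden = set(selected)
        return [m for m in visible_ids if m not in hidden]
    # Assumption: on any other status the list stays as it was.
    return list(visible_ids)


# Usage with fake transports (no network involved).
print(retry_and_hide(["a", "b", "c"], ["b"], lambda ids: 200))  # ['a', 'c']
print(retry_and_hide(["a", "b", "c"], ["b"], lambda ids: 500))  # ['a', 'b', 'c']
```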

Make sense?

// @johnsimons

indualagarsamy commented 11 years ago

Since this feature is implemented, I'm going to close this issue. I'll open separate issues for any bugs related to this feature.