Closed dannycohen closed 8 years ago
@johnsimons please go ahead with this one
@dannycohen - what is the associated issue in SP?
This is currently a bug affecting existing functionality in SI (https://github.com/Particular/ServiceInsight/issues/28)
There are plans for a post V1 SP indicator for SLR requests, but its a low prio in the backlog. Why ?
I thought we came to the conclusion that we can't supply this information because those headers are wiped by NSB. Also those headers are an internal implementation concern that can change. On top of it SLRs can be turned off by the user or/and completely custom replaced.
On Wednesday, November 6, 2013, Danny Cohen wrote:
This is currently a bug affecting existing functionality in SI ( Particular/ServiceInsight#28https://github.com/Particular/ServiceInsight/issues/28 )
There are plans for a post V1 SP indicator for SLR requests, but its a low prio in the backlog. Why ?
— Reply to this email directly or view it on GitHubhttps://github.com/Particular/ServiceControl/issues/65#issuecomment-27866387 .
Regards John Simons NServiceBus
I thought we came to the conclusion that we can't supply this information because those headers are wiped by NSB.
I was under the impression the number of SLR's is wiped, but that we can check whether or nor SLR's occurred (i.e. a boolean parameter).
Also those headers are an internal implementation concern that can change.
The fact that retries occurr repeatedly may indicate the system is somewhat unstable. A growing trend of retries is similarly significant to Opie and team. Therefore, although it may be implemented as an internal feature, but it is of value and importance to users.
Correct, we can provide a boolean parameter
On Wed, Nov 6, 2013 at 2:20 PM, Danny Cohen notifications@github.comwrote:
I thought we came to the conclusion that we can't supply this information because those headers are wiped by NSB.
I was under the impression the number of SLR's is wipred, but that we cna check whether or nor SLR's occurred (i.e. a boolean parameter).
Also those headers are an internal implementation concern that can change.
The fact that retries occurr repeatedly may indicate the system is somewhat unstable. A growing trend of retries is similarly significant to Opie and team. Therefore, although it may be implemented as an internal feature, but it is of value and importance to users.
— Reply to this email directly or view it on GitHubhttps://github.com/Particular/ServiceControl/issues/65#issuecomment-27872361 .
@dannycohen @andreasohlund - After a discussion with @johnsimons and @SimonCropp, rather than add a header which is a boolean to say if SLR was invoked it makes more sense to add a header which provides the total number of retries that occured, and this information will be available on messages that were successfully processed. So, by looking at the audit message and looking at the number of retries, we can identify problems.
This count will be the net total (of FLR + SLR) that occured.
Will that satisfy the requirement?
This will allow the users to sort by this column in descending order and see which messages get retried most :)
@indualagarsamy / @johnsimons / @andreasohlund -
Let me know.
What we are proposing is for us to have a new header that will contain the number of times that message was retried.
What we are proposing is for us to have a new header that will contain the number of times that message was retried.
There is such a header and it is actively removed in order to fix a bug (see https://github.com/Particular/ServiceInsight/issues/20#issuecomment-24104404).
Are you saying we can un-remove it ?
No, I'm proposing a completely new header :)
On 13 November 2013 18:38, Danny Cohen notifications@github.com wrote:
What we are proposing is for us to have a new header that will contain the number of times that message was retried.
There is such a header and it is actively removed in order to fix a bug (see Particular/ServiceInsight#20 (comment)https://github.com/Particular/ServiceInsight/issues/20#issuecomment-24104404 ).
Are you saying we can un-remove it ?
— Reply to this email directly or view it on GitHubhttps://github.com/Particular/ServiceControl/issues/65#issuecomment-28372929 .
Regards John Simons NServiceBus
@johnsimons - Whatever makes you happy... :-)
@indualagarsamy / @andreasohlund / @johnsimons - guys, where are we on this ?
// CC @HEskandari
This turned out to be nontrivial, can we bump this?
On Fri, Dec 13, 2013 at 8:26 AM, Danny Cohen notifications@github.comwrote:
@indualagarsamy https://github.com/indualagarsamy / @andreasohlundhttps://github.com/andreasohlund/ @johnsimons https://github.com/johnsimons - guys, where are we on this ?
- This causes a bug in SI existing display. See Particular/ServiceInsight#28 (comment)https://github.com/Particular/ServiceInsight/issues/28#issuecomment-25346471
- We may want to hide the display to allow us to defer the retry header bug fix.
- Either way - we need to decide (once and for all) what is to be done and when.
// CC @HEskandari https://github.com/HEskandari
— Reply to this email directly or view it on GitHubhttps://github.com/Particular/ServiceControl/issues/65#issuecomment-30491317 .
@indualagarsamy https://github.com/indualagarsamy can you add the details on the issues we found
On Fri, Dec 13, 2013 at 8:42 AM, Andreas Öhlund < andreas.ohlund@particular.net> wrote:
This turned out to be nontrivial, can we bump this?
On Fri, Dec 13, 2013 at 8:26 AM, Danny Cohen notifications@github.comwrote:
@indualagarsamy https://github.com/indualagarsamy / @andreasohlundhttps://github.com/andreasohlund/ @johnsimons https://github.com/johnsimons - guys, where are we on this ?
- This causes a bug in SI existing display. See Particular/ServiceInsight#28 (comment)https://github.com/Particular/ServiceInsight/issues/28#issuecomment-25346471
- We may want to hide the display to allow us to defer the retry header bug fix.
- Either way - we need to decide (once and for all) what is to be done and when.
// CC @HEskandari https://github.com/HEskandari
— Reply to this email directly or view it on GitHubhttps://github.com/Particular/ServiceControl/issues/65#issuecomment-30491317 .
@andreasohlund -
can we bump this?
NP, but it would be good to record the diagnosis and the prognosis.
Indu has the details
In short: We need the new pipeline hooks in NServiceBus 4.4.0 to pull this off
On Fri, Dec 13, 2013 at 8:48 AM, Danny Cohen notifications@github.comwrote:
@andreasohlund https://github.com/andreasohlund -
can we bump this?
NP, but it would be good to record the diagnosis and the prognosis.
— Reply to this email directly or view it on GitHubhttps://github.com/Particular/ServiceControl/issues/65#issuecomment-30492040 .
In short: We need the new pipeline hooks in NServiceBus 4.4.0 to pull this off
I'm fine with that.
This turned out to be nontrivial, can we bump this?
What was the issue @indualagarsamy ?
Trying to add the header, we realized that there is no easy way to count the FLRs the way it stands now.
What is wrong with adding it in https://github.com/Particular/NServiceBus/blob/develop/src/NServiceBus.Core/Unicast/Transport/FirstLevelRetries.cs#L49 ?
Assigning to backlog, pending SC / SP roadmap alignment
It can be reopen if we decide to do it
There is another stakeholder here. As a salesperson for Particular Software, when talking to other departments at our larger customers, I would like to be able to tell them how many messages NServiceBus not only prevented from failing but handled those failures automatically without any manual intervention thus clarifying the value proposition versus the team using HTTP or rolling their own solution on top of a queue.
@udidahan - how would you prioritize it ? (post LM3, of course)
That's a question to be answered via the Trello board, taking into account all the other things we have for LM4.
I think the way to go here is that instead of a header we should create a plugin that hooks into the notifications, http://docs.particular.net/nservicebus/errors/subscribing-to-push-based-error-notifications. This way we can let the users set threshold for flr/slr per second and have the plugin notify SC when that happens?
Arguable this could be implemented and distributed as a custom check?
We could start by adding this as a sample to gauge interest?
@Particular/servicecontrol-maintainers I propose we close this one until we ready to tackle it?
@johnsimons if u close it how would distinguish it from all the other issues that are closed "as fixed"?
@SimonCropp no milestone associated
@johnsimons so u regularly go through the 268 closed issues with no milestone as part of your backlog pruning? https://github.com/Particular/ServiceControl/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed
Not sure what you are talking about @SimonCropp. If we close this issue and later this topic resurfaces, I am pretty sure we can find it and reopen.
I have a demo going that shows how to do this without a header and instead using a custom check to "alert" in SP when SLR rates go up. How about I get that recorded and we close this one as won't fix?
Still agree with @SimonCropp that we shouln't just close issues that we still might act on. Perhaps move it some where or add a bullet point to some more generic "crazy SC ideas" issue.
How about we discuss this in plat dev since this is relevant for all our repos?
Still think agree with @SimonCropp that we shouln't just close issues that we still might act on.
Agree.
we shouln't just close issues that we still might act on
In my view, closing a feature request does not imply that we will not be acting on, it just means that right now the focus is elsewhere. It just help us concentrate on issues/features that are currently important. Having this massive backlog of open issues IMO makes it difficult to figure out where we going. The way I see it, is open feature requests that we do not intend to work on (right now), are just clutering the system, and also sending the wrong message to user out there, the fact is that we have no idea when we are going to tackle any of these or even if we ever going to do it at all.
I have a demo going that shows how to do this without a header and instead using a custom check to "alert" in SP when SLR rates go up
@andreasohlund that would be great, looking forward to it.
@johnsimons
As @andreasohlund said let's have the discussion about do we or don't we close issues in platform dev.
yeah, let's discuss it. I don't think we should close issues, a closed issue is a lost one, IMO. If a huge backlog is putting pressure on us it's a tolling problem (and to be honest GitHUb issues suck) since we have no way (other than the waffle boards, that are not that much better) to prioritize and look only at the top N issues. Also given that we have no milestones set in stone priorities may change overtime and is very easy to lose track of closed issues that were closed because of lack of capacity or out of scope at that time.
@mauroservienti it is quite simple to keep track of feature requests that were closed. This query gives u all issues that are closed and have no associated milestone and are not bugs https://github.com/Particular/ServiceControl/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aclosed+no%3Amilestone+-label%3A%22Type%3A+Bug%22+
This is currently not prioritised to be done any time soon, I will close this one for now and if we ever go down this path we can reopen it
As Opie, I would like to be notified when there are many or increasing number of SLRs performed, so I can investigate further if this is an indication of system instability (and I will be able to take corrective actions before it deteriorates further).
Implementation proposal: SecondLevelRetriesPerformed : true|{false}
Note: In order to conserve header space, avoid sending the header when SLR were not performed (i.e. the very existence of the header is an indication of it being true; the value of false is optional).